intro
Learn how to use the REGEXP_MATCH SQL function in PostgreSQL to perform complex string matching with regular expression patterns.
The ability to search for specific strings and extract meaningful data from them is paramount when dealing with data. Regular expressions serve as indispensable tools in this process, providing a flexible and powerful way to define patterns for string matching and manipulation. Here is why PostgreSQL offers several options for that, including the REGEXP_MATCH
SQL function.
In this guide, you will delve into REGEXP_MATCH
, seeing its syntax, applications, and nuances. Become the master of string matching in PostgreSQL!
What Is the REGEXP_MATCH SQL Function in PostgreSQL?
The REGEXP_MATCH
SQL function in PostgreSQL accepts a string and a POSIX regular expression pattern. If you are not familiar with this concept, POSIX regular expressions (a.k.a. “regexps”) follow a standardized pattern-matching syntax to represent sets of strings. In particular, they enable sophisticated string matching based on patterns.
The PostgreSQL REGEXP_MATCH
function returns an array containing the substrings corresponding to the first match of a regular expression pattern to the given input string. That is not easy to get at first, so let's better understand how this function works in the next section!
How REGEXP_MATCH Works in Postgres
The REGEXP_MATCH
SQL function in PostgreSQL has the following syntax:
1
regexp_match(input_string, posix_pattern [, flags ])
Where the arguments are:
As mentioned before, REGEXP_MATCH
searches posix_pattern
in input_string
and returns a text array containing the substrings corresponding to the first match. Take a look at the example below:
1
SELECT regexp_match('transportation', 'p.*rt');
This will produce:
1
{'port'}
The reason is that the pattern 'p.*rt'
, for words like “port,” “part,” and others, matches the substring “port” in the input string “transportation.”
Specifically, the PostgreSQL REGEXP_MATCH
function returns:
Option #1: NULL
, if there is no match:
1
SELECT regexp_match('The "Air-Max" sneakers are simply the best shoes on the market', '[0-9.]+');
2
-- NULL
Option #2: A single-element text array containing the substring matching the whole pattern, if a match is found and the contains no parenthesized sub-expressions:
1
SELECT regexp_match('The price of the "Air-Max" sneakers is $499.99', '[0-9.]+');
2
-- {'499.99'}
Option #3: A text array with each element corresponding to a sub-expression match, if a match is found and the pattern contains more than one parenthesized sub-expressions:
1
SELECT regexp_match('The price of the "Air-Max" sneakers is $499.99', '"(.*)\\-(.*)"');
2
-- {'Air', 'Max'}
To see what results a regular expression produces and why, you can test it on websites like regex101 before using it in your query:
Note: If you want to get the set of text arrays containing all matching substrings and not only the array with the first match, use the PostgreSQL REGEXP_MATCHES
function instead.
REGEXP_MATCH SQL Function: Use Cases
The REGEXP_MATCH
SQL function offered by PostgreSQL is a powerful tool for performing advanced string matching and extraction in your queries. No wonder, the scenarios where this function can come to the rescue are countless.
Here, you will see the three most popular REGEXP_MATCH
use cases.
The following queries will be executed in DbVisualizer, the PostgreSQL database client with the highest user satisfaction on the market!
Prefix Phone Number Extraction
Suppose you have some text that may contain a phone number. The goal is to retrieve it from the text. This is a common scenario that generally requires application-level logic. Thanks to the REGEXP_MATCH
PostgreSQL function, you can achieve that with this POSIX regexp for North American phone numbers:
1
([0-9]{3})[-.●]?[0-9]{3}[-.●]?[0-9]{4}
Use the above regular expression to get the phone number in an array as below:
1
SELECT regexp_match('Contact us at 987-654-3210', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}') AS phone_number;
The result will be:
1
{'987-654-3210'}
Great! That is exactly the phone number you were looking for!
Now, what if the text contained more than one phone number? You might consider setting the 'g'
flag to look for all matches and not just the first one, as follows:
1
SELECT regexp_match('Contact us at 987-654-3210 or 555-872-2310', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}', 'g') AS phone_numbers;
In this case, you will get this error:
1
[Code: 0, SQL State: 22023] ERROR: regexp_match() does not support the "global" option. Hint: Use the regexp_matches function instead.
The reason is that the REGEXP_MATCH
SQL function does not support the 'g'
(“global”) option. Instead, you need to use the PostgreSQL REGEXP_MATCH
function:
1
SELECT regexp_matches('Contact us at 987-654-3210 or 555-872-2310', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}', 'g') AS phone_numbers;
This time, you will get both phone numbers as desired:
Email Retrieval
E-mail addresses are valuable information that you want to distinguish from other text information. To extract them, you can use a regexp with REGEXP_MATCH
as below:
1
SELECT regexp_match('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}') AS email;
This will return:
1
{'info@company.com'}
Keep in mind that you generally want the data to be returned as a string and not as a text array as it is now. So, you usually have to get the first element of the PostgreSQL array with the [1]
syntax:
1
SELECT regexp_match('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}')[1] AS email;
This time, you will get:
1
'info@company.com'
![Note that wrapping round parenthesis to apply "1]" correctly
Note that wrapping round parenthesis to apply "[1]" correctly
Note that the above query is equivalent to:
1
SELECT regexp_substr('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}') AS email;
The problem with the REGEXP_SUBSTR
function is that it is only available from PostgreSQL 15, while REGEXP_MATCH
is available from PostgreSQL 10. Here is why most users still use REGEXP_MATCH
.
URL Extraction
To extract the URL from a string in PostgreSQL, use REGEXP_MATCH
with the following regexp:
1
http[s]?:\\/\\/[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}
Apply it as in this example:
1
SELECT regexp_match('Visit our website at <HTTPS://EXAMPLE.COM> for more information', 'http[s]?:\\/\\/[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}' , 'i') AS url;
Notice the 'i'
flag for case-insensitive regular expression matching.
Awesome! You are now a PostgreSQL REGEXP_MATCH
expert!
PostgreSQL REGEXP_MATCH: Best Practices
This is a list of best practices for using the REGEXP_MATCH
like a pro:
Conclusion
In this guide, you understood what the REGEXP_MATCH
function is in PostgreSQL and how it works. You now know that it is a powerful tool to apply a regular expression for string matching. Thanks to the use cases shown here, you have also learned how to use the function in real-world scenarios.
REGEXP_MATCH
is a complex function that requires a powerful database client. Which is the best based on user satisfaction? DbVisualizer! This full-featured database client supports several DBMS technologies, offers advanced query optimization capabilities, and can generate ERD-like schemas. Try DbVisualizer for free today!
FAQ
What is a POSIX regular expression?
A POSIX regular expression, also known as “regexp,” is a standardized pattern-matching syntax used in computing to search for a set of strings. It empowers advanced text searching and manipulation by supporting rules for matching characters or character sequences within strings. That means POSIX regexps are a powerful tool for tasks such as string matching, extraction, and validation.
What databases support the REGEXP_MATCH SQL function?
Among popular databases, the REGEXP_MATCH
SQL function is supported only by PostgreSQL and SingleStoreDB. Since it is not part of the standard SQL specification, each database technology may implement it differently.
What is the difference between the PostgreSQL REGEXP_MATCHES and REGEXP_MATCH functions?
The main difference between the PostgreSQL REGEXP_MATCHES
and REGEXP_MATCH
functions lies in their output. REGEXP_MATCHES
returns a set of text arrays, providing all matches of a regular expression pattern in a string. In contrast, REGEXP_MATCH
returns a single-element text array for the first match, or NULL
if there is no match.
What is the difference between the PostgreSQL REGEXP_MATCH and REGEXP_LIKE functions?
The differences between the PostgreSQL REGEXP_MATCH
and REGEXP_LIKE
functions are in their purpose and syntax. REGEXP_MATCH
returns a text array of matching substrings within the first match of a regular expression pattern, allowing extraction of specific substrings. On the other hand, REGEXP_LIKE
is a boolean function that checks if a string matches a specified regular expression pattern, returning true
or false
.
Should you use the REGEXP_SUBSTR function or the REGEXP_MATCH SQL function in PostgreSQL?
The choice between REGEXP_SUBSTR
and REGEXP_MATCH
in PostgreSQL depends on the desired outcome. If you need the whole matching substring or NULL
for no match, REGEXP_SUBSTR
is preferable. However, it is only available from PostgreSQL version 15 onwards. In older versions, you can get the same result by accessing the first element of REGEXP_MATCH
's result array.