intro
Learn how to use the REGEXP_MATCH SQL function in PostgreSQL to perform complex string matching with regular expression patterns.
The ability to search for specific strings and extract meaningful data from them is paramount when dealing with data. Regular expressions serve as indispensable tools in this process, providing a flexible and powerful way to define patterns for string matching and manipulation. Here is why PostgreSQL offers several options for that, including the REGEXP_MATCH SQL function.
In this guide, you will delve into REGEXP_MATCH, seeing its syntax, applications, and nuances. Become the master of string matching in PostgreSQL!
What Is the REGEXP_MATCH SQL Function in PostgreSQL?
The REGEXP_MATCH SQL function in PostgreSQL accepts a string and a POSIX regular expression pattern. If you are not familiar with this concept, POSIX regular expressions (a.k.a. “regexps”) follow a standardized pattern-matching syntax to represent sets of strings. In particular, they enable sophisticated string matching based on patterns.
The PostgreSQL REGEXP_MATCH function returns an array containing the substrings corresponding to the first match of a regular expression pattern to the given input string. That is not easy to get at first, so let's better understand how this function works in the next section!
How REGEXP_MATCH Works in Postgres
The REGEXP_MATCH SQL function in PostgreSQL has the following syntax:
1
regexp_match(input_string, posix_pattern [, flags ])
Where the arguments are:
As mentioned before, REGEXP_MATCH searches posix_pattern in input_string and returns a text array containing the substrings corresponding to the first match. Take a look at the example below:
1
SELECT regexp_match('transportation', 'p.*rt');
This will produce:
1
{'port'}
The reason is that the pattern 'p.*rt', for words like “port,” “part,” and others, matches the substring “port” in the input string “transportation.”
Specifically, the PostgreSQL REGEXP_MATCH function returns:
Option #1: NULL, if there is no match:
1
SELECT regexp_match('The "Air-Max" sneakers are simply the best shoes on the market', '[0-9.]+');
2
-- NULL
Option #2: A single-element text array containing the substring matching the whole pattern, if a match is found and the contains no parenthesized sub-expressions:
1
SELECT regexp_match('The price of the "Air-Max" sneakers is $499.99', '[0-9.]+');
2
-- {'499.99'}
Option #3: A text array with each element corresponding to a sub-expression match, if a match is found and the pattern contains more than one parenthesized sub-expressions:
1
SELECT regexp_match('The price of the "Air-Max" sneakers is $499.99', '"(.*)\\-(.*)"');
2
-- {'Air', 'Max'}
To see what results a regular expression produces and why, you can test it on websites like regex101 before using it in your query:

Note: If you want to get the set of text arrays containing all matching substrings and not only the array with the first match, use the PostgreSQL REGEXP_MATCHES function instead.
REGEXP_MATCH SQL Function: Use Cases
The REGEXP_MATCH SQL function offered by PostgreSQL is a powerful tool for performing advanced string matching and extraction in your queries. No wonder, the scenarios where this function can come to the rescue are countless.
Here, you will see the three most popular REGEXP_MATCH use cases.
The following queries will be executed in DbVisualizer, the PostgreSQL database client with the highest user satisfaction on the market!
Prefix Phone Number Extraction
Suppose you have some text that may contain a phone number. The goal is to retrieve it from the text. This is a common scenario that generally requires application-level logic. Thanks to the REGEXP_MATCH PostgreSQL function, you can achieve that with this POSIX regexp for North American phone numbers:
1
([0-9]{3})[-.●]?[0-9]{3}[-.●]?[0-9]{4}
Use the above regular expression to get the phone number in an array as below:
1
SELECT regexp_match('Contact us at 987-654-3210', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}') AS phone_number;
The result will be:
1
{'987-654-3210'}

Great! That is exactly the phone number you were looking for!
Now, what if the text contained more than one phone number? You might consider setting the 'g' flag to look for all matches and not just the first one, as follows:
1
SELECT regexp_match('Contact us at 987-654-3210 or 555-872-2310', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}', 'g') AS phone_numbers;
In this case, you will get this error:
1
[Code: 0, SQL State: 22023] ERROR: regexp_match() does not support the "global" option. Hint: Use the regexp_matches function instead.

The reason is that the REGEXP_MATCH SQL function does not support the 'g' (“global”) option. Instead, you need to use the PostgreSQL REGEXP_MATCH function:
1
SELECT regexp_matches('Contact us at 987-654-3210 or 555-872-2310', '[0-9]{3}[-.●]?[0-9]{3}[-.●]?[0-9]{4}', 'g') AS phone_numbers;
This time, you will get both phone numbers as desired:

Email Retrieval
E-mail addresses are valuable information that you want to distinguish from other text information. To extract them, you can use a regexp with REGEXP_MATCH as below:
1
SELECT regexp_match('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}') AS email;
This will return:
1
{'info@company.com'}
Keep in mind that you generally want the data to be returned as a string and not as a text array as it is now. So, you usually have to get the first element of the PostgreSQL array with the [1] syntax:
1
SELECT regexp_match('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}')[1] AS email;
This time, you will get:
1
'info@company.com'
![Note that wrapping round parenthesis to apply "1]" correctly
Note that wrapping round parenthesis to apply "[1]" correctly
Note that the above query is equivalent to:
1
SELECT regexp_substr('Contact us at info@company.com', '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}') AS email;
The problem with the REGEXP_SUBSTR function is that it is only available from PostgreSQL 15, while REGEXP_MATCH is available from PostgreSQL 10. Here is why most users still use REGEXP_MATCH.
URL Extraction
To extract the URL from a string in PostgreSQL, use REGEXP_MATCH with the following regexp:
1
http[s]?:\\/\\/[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}
Apply it as in this example:
1
SELECT regexp_match('Visit our website at <HTTPS://EXAMPLE.COM> for more information', 'http[s]?:\\/\\/[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}' , 'i') AS url;

Notice the 'i' flag for case-insensitive regular expression matching.
Awesome! You are now a PostgreSQL REGEXP_MATCH expert!
PostgreSQL REGEXP_MATCH: Best Practices
This is a list of best practices for using the REGEXP_MATCH like a pro:
Conclusion
In this guide, you understood what the REGEXP_MATCH function is in PostgreSQL and how it works. You now know that it is a powerful tool to apply a regular expression for string matching. Thanks to the use cases shown here, you have also learned how to use the function in real-world scenarios.
REGEXP_MATCH is a complex function that requires a powerful database client. Which is the best based on user satisfaction? DbVisualizer! This full-featured database client supports several DBMS technologies, offers advanced query optimization capabilities, and can generate ERD-like schemas. Try DbVisualizer for free today!
FAQ
What is a POSIX regular expression?
A POSIX regular expression, also known as “regexp,” is a standardized pattern-matching syntax used in computing to search for a set of strings. It empowers advanced text searching and manipulation by supporting rules for matching characters or character sequences within strings. That means POSIX regexps are a powerful tool for tasks such as string matching, extraction, and validation.
What databases support the REGEXP_MATCH SQL function?
Among popular databases, the REGEXP_MATCH SQL function is supported only by PostgreSQL and SingleStoreDB. Since it is not part of the standard SQL specification, each database technology may implement it differently.
What is the difference between the PostgreSQL REGEXP_MATCHES and REGEXP_MATCH functions?
The main difference between the PostgreSQL REGEXP_MATCHES and REGEXP_MATCH functions lies in their output. REGEXP_MATCHES returns a set of text arrays, providing all matches of a regular expression pattern in a string. In contrast, REGEXP_MATCH returns a single-element text array for the first match, or NULL if there is no match.
What is the difference between the PostgreSQL REGEXP_MATCH and REGEXP_LIKE functions?
The differences between the PostgreSQL REGEXP_MATCH and REGEXP_LIKE functions are in their purpose and syntax. REGEXP_MATCH returns a text array of matching substrings within the first match of a regular expression pattern, allowing extraction of specific substrings. On the other hand, REGEXP_LIKE is a boolean function that checks if a string matches a specified regular expression pattern, returning true or false.
Should you use the REGEXP_SUBSTR function or the REGEXP_MATCH SQL function in PostgreSQL?
The choice between REGEXP_SUBSTR and REGEXP_MATCH in PostgreSQL depends on the desired outcome. If you need the whole matching substring or NULL for no match, REGEXP_SUBSTR is preferable. However, it is only available from PostgreSQL version 15 onwards. In older versions, you can get the same result by accessing the first element of REGEXP_MATCH's result array.

