intro
Want to return a substring of a string? Look no further! The SUBSTRING_INDEX function is just what you’re looking for — read this blog and find out why.
Many DBAs dealing with all kinds of database management systems find themselves in situations where they have to obtain a substring of a string—say, everything preceding or after a certain character, etc. Here’s where the SUBSTRING_INDEX
in SQL comes in—it’s a very powerful, but often overlooked function providing us with the ability to return a substring after a certain index, hence the name.
In this article, you will learn what that SQL function is, how it works, and when to use it!
How Does SUBSTRING_INDEX Work?
The [SUBSTRING_INDEX
in SQL](https://www.dbvis.com/thetable/a-complete-guide-to-substring_index-in-sql/) function is self-explanatory. It returns a substring of a string based on a given index. In other words, the SQL SUBSTRING_INDEX
function returns a part of a string after a certain character.
Take this example into account:
1
SELECT SUBSTRING_INDEX("https://dbvis.com", ".", 2);
The result will be:
So, the syntax of SUBSTRING_INDEX
is:
1
SUBSTRING_INDEX(**string**, **delimiter**, **count**);
The three parameters are:
Here’s another example:
1
SELECT SUBSTRING_INDEX("www.dbvis.com", ".", -2);
This time, the query will produce this result:
This example returns a string with a negative number of occurrences of a delimiter. In other words, in the string “www.dbvis.com”, “-1” would return “com” because “com” is after the first dot from the right-hand side. And indeed, “-2” returns the “dbvis.com” string.
It’s also worth noting that if any of the given arguments is NULL
, SUBSTRING_INDEX
in SQL will also return NULL
.
Use Cases of SUBSTRING_INDEX
Now that you understand how SUBSTRING_INDEX
works, it’s time to get your hands on some real-world use cases.
Unfortunately, the examples shown above won’t apply to 90% of the use cases in the real world because most real-world use cases will deal with one or more of the following:
Not to mention, operations would have to be performed at the same time when other queries are running without disrupting performance.
Example #1
Here’s how SUBSTRING_INDEX
would look like in the real world:
1
SELECT
2
ip,
3
SUBSTRING_INDEX(ip, ".", 1) AS ip_p1,
4
SUBSTRING_INDEX(SUBSTRING_INDEX(ip, ".", 2), ".", -1) AS ip_p2,
5
SUBSTRING_INDEX(SUBSTRING_INDEX(ip, ".", -2), ".", 1) AS ip_p3,
6
SUBSTRING_INDEX(ip, ".", -1) AS ip_p4
7
FROM logs;
In this case, we’re obtaining IP addresses in parts. In other words, such an SQL query would make the IP address “127.0.0.1” to be returned in 4 parts named “ip_p1” (IP_part1), “ip_p2”, “ip_p3”, and “ip_p4” with the results being displayed as follows:
IP Part | Number |
---|---|
ip_p1 | 127 |
ip_p2 | 0 |
ip_p3 | 0 |
ip_p4 | 1 |
Such a SUBSTRING_INDEX in SQL approach may be immensely valuable for companies with geolocation use cases, criminal data breach investigations, etc.
Example #2
1
SELECT SUBSTRING_INDEX(column, "character", -1) FROM table;
The second use case would be very useful for companies having a lot of users and performing analytics on user data — that may include registration domains (example above), locations, or other data. This query would also help companies analyze data breaches as it can be combined with a COUNT(*)
query to count how many users have been registered with a specific domain, etc.
The only downside here is that COUNT(*)
in InnoDB would likely be very slow to execute as InnoDB does not contain table metadata inside of itself as its brother MyISAM does.
Things to Keep in Mind
As you can see from the real-world examples above, we’re no longer using the SUBSTRING_INDEX
in SQL function with strings — we’re using column values instead. Doing this provides us with an upside in we can analyze data in the entire column at once.
When doing that, it’s wise to keep in mind that the bigger your table is, the slower SUBSTRING_INDEX
will take to execute. For some, a reminder of indexes would help — in this case, an index would help speed up SUBSTRING_INDEX in SQL dramatically due to the fact that the function is used together with a SELECT
query.
The same works the other way around — if we want to export our result set into a file (e.g. after analyzing a data set we may use the data for other purposes too), we may combine the SUBSTRING_INDEX
SQL function with INTO OUTFILE
so our SQL function now looks like this:
1
SELECT SUBSTRING_INDEX(column, "@", "-1") **INTO OUTFILE "/tmp/data.txt";**
Here’s a good performance-related example:
1
SELECT SUBSTRING_INDEX(email, "@", -1) AS email, COUNT(*) AS amount FROM users GROUP BY email;
What do you think this SUBSTRING_INDEX
query will do? Right — it will select the amount of times an email domain has been provided in the column and provide us with the result.
SUBSTRING_INDEX in SQL’s Brother
SUBSTRING_INDEX
in SQL also has a brother, whose name is SUBSTRING
. The functionality of SUBSTRING
in SQL is quite self-explanatory: it provides us with the ability to return a substring of a string, but without the ability to specify an index. In other words, a SUBSTRING
function in SQL would look like this:
1
SELECT SUBSTRING("This is an example", 11);
Such a substring query would return “example” (everything after the eleventh letter.)
One can also use SUBSTRING
like this:
1
SELECT SUBSTRING("Value" FROM position);
So, if we execute a query like:
1
SELECT SUBSTRING("This is an example" FROM 11);
We would also get “example.”
RIGHT()
Some database management systems including MySQL Server and SQL Server, also provide a function called RIGHT()
that works similarly. For example, a SQL query like so:
1
SELECT RIGHT("This is an example", 7);
Would return “example” too:
In this case, though, letters are counted backward (i.e., e,l,p,m,a,x,e
instead of e,x,a,m,p,l,e
—that’s why we use “7” instead of “11.”
Database Management Systems and Their Support for Functions
Not all database management systems support SUBSTRING_INDEX
and other aforementioned functions. Find the table denoting the support for functions like SUBSTRING_INDEX
in SQL below:
Database Management System | Support |
---|---|
Oracle | SUBSTR |
SQL Server | SUBSTR, SUBSTRING[_INDEX], RIGHT |
MySQL Server | SUBSTR, SUBSTRING[_INDEX], RIGHT |
PostgreSQL | SUBSTRING with pattern matching |
SQLite | SUBSTR |
The table above will help you understand what function to use in which circumstances — as you can see, SUBSTRING_INDEX
in SQL is only supported in SQL Server and MySQL, not in other database management systems. Oracle or SQLite doesn’t have the luxury of such a function, as doesn’t PostgreSQL.
With that being said, PostgreSQL does provide support for a lot of data types and, in this case, support for a SUBSTRING
function with pattern matching meaning that it can be employed similarly as SUBSTRING_INDEX
if used properly.
In PostgreSQL, the SUBSTRING
function takes two parameters - a string (column) and a pattern that returns NULL
upon no matches. A match is returned once, or if, it's found. Everything works like so:
The usage of SUBSTRING
in PostgreSQL would look like so:
Query | Result |
---|---|
SELECT SUBSTRING("foobar" FROM "o.b"); | o[anythingbecauseofthedot]b |
SELECT fname, SUBSTRING(fname,1,1) AS first_letter FROM users; | Returns the first letter of the first name of the user. e.g. “J” for “Jack”, “D” for “David”, “E” for “Emily”, etc. |
SELECT SUBSTRING("The emergency-call number to dial in case of a necessity is 1887", "([0-9]{1,4})") AS emergency_call; | Any provided number with digits 0-9 having from 1 to 4 digits. In this case, 1887. |
How Will You Use SUBSTRING_INDEX?
As you can see, SUBSTRING_INDEX
is a really powerful beast — everything not only depends on how your queries are structured, but your database management system also plays a role.
We hope that this blog helped you understand how to use SUBSTRING_INDEX in SQL and that you saw its power from first-hand examples.
Don’t forget that the power of your database management system lies not only within the functions available for you to use but also in the way you use them — in other words, your SQL client of choice.
We recommend you use DbVisualizer because it’s the #1 choice for many companies across the globe including Tesla, NASA, Volkswagen, and many others. Click here to grab your free trial and try the power of DbVisualizer and unleash the support for many database management systems, the power of the SQL client and advanced query optimization abilities, ERD-like schema generation, and more for yourself.
We’ll see you in the next blog! Bye for now.
FAQ
What is SUBSTRING_INDEX in SQL?
The SUBSTRING_INDEX
function helps us to return a part of data as necessary. In other words, it allows us to return a substring of a string after a specified index, hence the name.
When should I use SUBSTRING_INDEX in an SQL query?
SUBSTRING_INDEX
should be used whenever you need to return anything after a certain string.
What DBMS supports SUBSTRING_INDEX?
Database management systems supporting SUBSTRING_INDEX
include SQL Server and MySQL Server. Other database management systems have similar capabilities.