Many database administrators use a whole host of SQL functions and clauses available to them - these include, but are not limited to functions that count things, functions that remove spaces from a given string, functions that format a value into a specified format, etc.
With that being said, the NOT IN
clause is a little different - this SQL operator is used to select values that are not included in something and it is frequently combined with other clauses, such as AND
, WHERE
, and even SQL functions like IF
and the like.
When and How to Use NOT IN?
NOT IN
is one of those SQL functions that has many overlooked use cases. Here are two of the most popular use cases of NOT IN
:
These use cases may seem scarce, but when in the hands of an experienced DBA, NOT IN
becomes a powerful way to query our data with more complexity — you will understand why this clause can be used to solve so many issues once we dig into examples.
One of the popular use cases concerning NOT IN
would look like so:
1
SELECT * FROM company.invoices
2
WHERE issued_by NOT IN ('Jack', 'Josh', 'Matthew');
Another example can be found below:
1
SELECT username FROM demo_table
2
WHERE user_id NOT IN (SELECT id FROM demo_table2);
The code block below also depicts a popular use case:
1
SELECT * FROM purchases.suppliers
2
WHERE supplier NOT IN (
3
SELECT supplier_id
4
FROM old_purchases.suppliers
5
INNER JOIN orders
6
ON customers.customer_id = orders.customer_id
7
);
The list can go on and on — everything works the other way around by getting rid of the NOT
clause and using IN
instead too.
What Makes the SQL NOT IN Clause So Powerful?
Look at the examples above — ranging from selecting people who issued invoices and users not having IDs that match IDs from another table to using NOT IN
together with INNER JOIN
operations, the clause is indeed immensely powerful — as you can see, the devil is in the details and we can make this SQL clause perform however we desire.
The primary reason why this SQL clause is so powerful is because it can be combined with various types of other data modification operations at once. However, that also means another thing — how well the clause will perform is directly dependent on the options we’ve provided to our database management system. In other words, what makes this clause more or less powerful is our database configuration.
Database Configuration for SQL Clauses
As you now see, a simple [NOT] IN
operation can be used to perform a number of different things, but all operations should be used with care. In other words, there are a couple of performance considerations that we should keep in mind when using such a clause:
The answers to each of these questions will directly influence the path we should take when dealing with our data:
The reason we’re bringing this up is because the more complex your data becomes, the more difficult for your database it may be to keep up with the usage of this clause.
Using NOT IN
or its brother IN
does have an impact on performance, and the bigger your data set is, the wider the impact will be felt. Unfortunately, the solution to this problem has nothing to do with performance optimization or the usage of SQL clients — the problem is within the clause itself: IN
does not work well with bigger data sets because it’s built to work with small-to-mid data sets.
Alternatives to IN and Related SQL Clauses
The clauses IN
and NOT IN
perform well, but only when one statement is true - our data set is not bigger than a couple million rows.
If we have a couple million rows and above, we will need to think of different solutions — you will find a number of SQL clauses that can be used to replace the NOT IN
clause:
NOT IN Equivalent | Explanation |
---|---|
NOT EXISTS | Quite a self-explanatory clause. NOT EXISTS can be used to speed up query performance if we have subqueries too. |
Joins | Many DBAs also elect to use the left outer join to replace NOT IN - joins usually work better with data separated between multiple databases or tables. |
EXCEPT | Don’t be confused — EXCEPT does not provide us with literally the same behavior as NOT EXISTS does, but it does perform a similar task — it returns rows that derive from one or more expressions specified in a query. |
Linux Functions (e.g. sort and the like) | One can also use sort with the -u (unique) flag. The reason sort works so well with big data sets is because it’s designed to bulk-process data in text files; if we export all of our data in a manner suitable for bigger data sets (i.e. by using SELECT … INTO OUTFILE or similar functions) and use the sort function to sort items or remove duplicates, it should be way faster than using NOT IN and many other SQL queries as well. |
How we work with data is also immensely important — if our database is optimized well (check one of our previous articles for advice on how to do that), we should not have that many IN
or NOT IN
clauses to begin with. Even if we do, our SQL client of choice is usually capable of assisting us by letting us observe our data as if it was an excel sheet, providing us with ERD diagrams, and assisting us in different ways where necessary. DbVisualizer is one of such SQL clients — it’s the highest-rated SQL client on the market and it can assist us when working with a very wide variety of database management systems, too: no matter if we use MSSQL, Oracle, PostgreSQL, or ClickHouse, DbVisualizer can deal with them all. And with the biggest names in the industry using the tool, you really have nothing to lose. Especially when the first 21 days of use are on us!
Frequently Asked Questions
What is the NOT IN Clause?
The SQL IN and NOT IN clauses are one of the primary ways to tell your database management system what data should or should not be included in a result set — but it does come with a downside in that it cannot feasibly deal with bigger data sets. For that, you have to use other solutions.
When Can the SQL IN Clause Help Me?
This SQL clause will help you when you’re dealing with data and want to exclude or include certain rows from a result set.
When to use NOT IN? When to use other solutions?
NOT IN
should be used if you have less than a couple million rows. In other cases, switch to NOT EXISTS
, JOIN
clauses or even the sort
clause in Linux if necessary.
Why should I use a SQL Client?
You should use a SQL client like the one provided by DbVisualizer because a good SQL client will not only let you visualize your data simplifying your developer life, but also provide you with a whole host of other tools that will let you simplify the work with your data, no matter what database management system you may find yourself using.