MySQL
SQL

SQL NOT IN: the Good, Bad & the Ugly

intro

The SQL NOT IN clause is used frequently, yet many DBAs don’t know how it works internally - read this blog and find out!

Tools used in the tutorial
Tool Description Link
Dbvisualizer DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT
MySQL logo MySQL
THE MYSQL DATABASE

Many database administrators use a whole host of SQL functions and clauses available to them - these include, but are not limited to functions that count things, functions that remove spaces from a given string, functions that format a value into a specified format, etc.

With that being said, the NOT IN clause is a little different - this SQL operator is used to select values that are not included in something and it is frequently combined with other clauses, such as AND, WHERE, and even SQL functions like IF and the like.

When and How to Use NOT IN?

NOT IN is one of those SQL functions that has many overlooked use cases. Here are two of the most popular use cases of NOT IN:

  • Comparing a string to a list of strings (the same goes for integers);
  • Filtering out strings that are NOT to be modified, deleted, or otherwise interacted with.

These use cases may seem scarce, but when in the hands of an experienced DBA, NOT IN becomes a powerful way to query our data with more complexity — you will understand why this clause can be used to solve so many issues once we dig into examples.

One of the popular use cases concerning NOT IN would look like so:

Copy
        
1 SELECT * FROM company.invoices 2 WHERE issued_by NOT IN ('Jack', 'Josh', 'Matthew');

Another example can be found below:

Copy
        
1 SELECT username FROM demo_table 2 WHERE user_id NOT IN (SELECT id FROM demo_table2);

The code block below also depicts a popular use case:

Copy
        
1 SELECT * FROM purchases.suppliers 2 WHERE supplier NOT IN ( 3 SELECT supplier_id 4 FROM old_purchases.suppliers 5 INNER JOIN orders 6 ON customers.customer_id = orders.customer_id 7 );

The list can go on and on — everything works the other way around by getting rid of the NOT clause and using IN instead too.

What Makes the SQL NOT IN Clause So Powerful?

Look at the examples above — ranging from selecting people who issued invoices and users not having IDs that match IDs from another table to using NOT IN together with INNER JOIN operations, the clause is indeed immensely powerful — as you can see, the devil is in the details and we can make this SQL clause perform however we desire.

The primary reason why this SQL clause is so powerful is because it can be combined with various types of other data modification operations at once. However, that also means another thing — how well the clause will perform is directly dependent on the options we’ve provided to our database management system. In other words, what makes this clause more or less powerful is our database configuration.

Database Configuration for SQL Clauses

As you now see, a simple [NOT] IN operation can be used to perform a number of different things, but all operations should be used with care. In other words, there are a couple of performance considerations that we should keep in mind when using such a clause:

  1. How many records does our database have?
  2. What kind of data are we dealing with?
  3. Are we using any other functions? Are we aware of their impact to our database?
  4. What database management system is in use?
  5. Is our database optimized for reading or writing? Perhaps we optimized the database for both types of operations?

The answers to each of these questions will directly influence the path we should take when dealing with our data:

  1. The amount of records in our database will determine how fast our query is;
  2. The data types we’re dealing with will provide us with options how to modify them (i.e. string operations are a little different than numerical-based ones)
  3. Other options we may be using will impact the performance of our clause.
  4. The database management system in use may provide us more or less options that we can use in combination with or in place of NOT IN.
  5. The way we’ve optimized our database will give us insight into how best to deal with issues pertaining to this and other SQL clauses.

The reason we’re bringing this up is because the more complex your data becomes, the more difficult for your database it may be to keep up with the usage of this clause.

Using NOT IN or its brother IN does have an impact on performance, and the bigger your data set is, the wider the impact will be felt. Unfortunately, the solution to this problem has nothing to do with performance optimization or the usage of SQL clients — the problem is within the clause itself: IN does not work well with bigger data sets because it’s built to work with small-to-mid data sets.

Alternatives to IN and Related SQL Clauses

The clauses IN and NOT IN perform well, but only when one statement is true - our data set is not bigger than a couple million rows. If we have a couple million rows and above, we will need to think of different solutions — you will find a number of SQL clauses that can be used to replace the NOT IN clause:

NOT IN EquivalentExplanation
NOT EXISTSQuite a self-explanatory clause. NOT EXISTS can be used to speed up query performance if we have subqueries too.
JoinsMany DBAs also elect to use the left outer join to replace NOT IN - joins usually work better with data separated between multiple databases or tables.
EXCEPTDon’t be confused — EXCEPT does not provide us with literally the same behavior as NOT EXISTS does, but it does perform a similar task — it returns rows that derive from one or more expressions specified in a query.
Linux Functions
(e.g. sort and the like)
One can also use sort with the -u (unique) flag. The reason sort works so well with big data sets is because it’s designed to bulk-process data in text files; if we export all of our data in a manner suitable for bigger data sets (i.e. by using SELECT … INTO OUTFILE or similar functions) and use the sort function to sort items or remove duplicates, it should be way faster than using NOT IN and many other SQL queries as well.

How we work with data is also immensely important — if our database is optimized well (check one of our previous articles for advice on how to do that), we should not have that many IN or NOT IN clauses to begin with. Even if we do, our SQL client of choice is usually capable of assisting us by letting us observe our data as if it was an excel sheet, providing us with ERD diagrams, and assisting us in different ways where necessary. DbVisualizer is one of such SQL clients — it’s the highest-rated SQL client on the market and it can assist us when working with a very wide variety of database management systems, too: no matter if we use MSSQL, Oracle, PostgreSQL, or ClickHouse, DbVisualizer can deal with them all. And with the biggest names in the industry using the tool, you really have nothing to lose. Especially when the first 21 days of use are on us!

Frequently Asked Questions

What is the NOT IN Clause?

The SQL IN and NOT IN clauses are one of the primary ways to tell your database management system what data should or should not be included in a result set — but it does come with a downside in that it cannot feasibly deal with bigger data sets. For that, you have to use other solutions.

When Can the SQL IN Clause Help Me?

This SQL clause will help you when you’re dealing with data and want to exclude or include certain rows from a result set.

When to use NOT IN? When to use other solutions?

NOT IN should be used if you have less than a couple million rows. In other cases, switch to NOT EXISTS, JOIN clauses or even the sort clause in Linux if necessary.

Why should I use a SQL Client?

You should use a SQL client like the one provided by DbVisualizer because a good SQL client will not only let you visualize your data simplifying your developer life, but also provide you with a whole host of other tools that will let you simplify the work with your data, no matter what database management system you may find yourself using.

Dbvis download link img
About the author
LukasVileikisPhoto
Lukas Vileikis
Lukas Vileikis is an ethical hacker and a frequent conference speaker. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.
The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

SQL ORDER BY Clause: Definitive Guide

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 7 min 2024-04-22
title

SQL EXCEPT: Everything You Need to Know

author Leslie S. Gyamfi tags SQL 8 min 2024-04-18
title

Adding Dates in SQL: A Complete Tutorial

author Antonello Zanini tags DATE DATETIME MySQL POSTGRESQL SQL SERVER 7 min 2024-04-15
title

Glossary of the SQL Commands You Need to Know

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 12 min 2024-04-11
title

SUBSTRING_INDEX in SQL Explained: A Guide

author Lukas Vileikis tags MySQL SQL 8 min 2024-04-08
title

SQL Add to Date Operations: A Complete Guide

author Antonello Zanini tags DATE DATETIME MySQL ORACLE POSTGRESQL SQL SQL SERVER 6 min 2024-04-01
title

SQL CAST Function: Everything You Need to Know

author Antonello Zanini tags CAST MySQL ORACLE POSTGRESQL SQL SQL SERVER 7 min 2024-03-28
title

INSERT INTO SQL Clause

author Leslie S. Gyamfi tags INSERT SQL 6 min 2024-03-18
title

ALTER TABLE ADD COLUMN in SQL: A Comprehensive Guide

author TheTable tags SQL 5 min 2024-03-12
title

MySQL IFNULL – Everything You Need to Know

author Leslie S. Gyamfi tags IFNULL MySQL 6 min 2024-02-26

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.

Cookie policy

We use cookies to ensure that we give you the best experience on our website. However you can change your cookie settings at any time in your browser settings. Please find our cookie policy here ↗