intro
In this article, you will find out what SQL query optimization is and why it is so important. You will also see some tips for building fast SQL queries and learn what SQL has to offer when it comes to analyzing a query for optimization. SQL query optimization is critical to writing more efficient SQL queries. This ensures that SQL queries are executed quickly, making applications that rely on that data faster. In this article, you will dig into the concept of SQL query optimization, understand its importance, and how to apply it. You will also learn some best practices for writing efficient SQL queries. It’s time to improve your SQL query writing skills!
What is SQL Query Optimization?
SQL query optimization is the process of making SQL queries faster and more efficient. The idea behind query optimization is to identify and address performance bottlenecks. This typically involves changing the query, defining indexes, or modifying the database schema. The ultimate goal of SQL query optimization is to reduce the time and resources required to execute a query. This way, the performance of applications that rely on that query will improve accordingly.
Note that the performance of a query depends on the hardware and load of the database server, but also on how you write the query. At the same time, most popular DBMS technologies come with a query optimizer. This reorders or rewrites queries behind the scene to improve performance. Specifically, the DBMS optimizes the query execution plan devised to run the query. Thus, two different queries can be translated into the same execution plan.
Why Is SQL Query Optimization Important?
There are at least three good reasons to optimize queries to make them more efficient. Let’s see them all.
Improved performance
SQL performance tuning helps to reduce the amount of time and resources required by the database server. Since many applications may depend on the same database server, that means improving the performance of several applications.
Scaling becomes easier
Slow queries become even slower as the volume of data increases. In contrast, SQL query optimization allows you to ensure that queries continue to execute efficiently as the amount of data grows.
Cost reduction
Optimizing SQL queries can help to reduce the CPU and memory usage required to run a database server. This means saving money on server hosting.
5 Tips for SQL Performance Tuning
Let’s now see some tips and common mistakes to avoid for writing efficient SQL queries.
1. Select the exact number of columns
One of the most common mistakes is to use the * operator in SELECT statements, even when you are only interested in retrieving a few columns. Using * makes your queries easier to write, but also less efficient.
Example of query where all columns get selected:
1
SELECT * FROM users
When writing the SELECT
clause, use only the number of columns you need to select. This will speed up your query.
Example of a query where only the columns you need are selected:
1
SELECT id, name, surname FROM users
2. Avoid useless WHERE
conditions
Often, SQL queries are written without thinking too much or having the underlying data structure well in mind. For example, let’s say you want to retrieve all users with a value in the points
field. You will write the following query:
1
SELECT id, name, surname
2
FROM users
3
WHERE points IS NOT NULL
This works like a charm and would return what you expect. However, if you had inspected the users
table before writing the query, you would have noticed that points
is a non-nullable integer with 0
as the default value. In other terms, points
is always NOT NULL
. So, the WHERE
condition you added will only make your query unnecessarily slower.
This is just an example. Yet, before writing WHERE
conditions, you should always check that what you are adding is actually useful for filtering the data.
3. Avoid Negative Searches
When it comes to query optimization, you have to take into account even the small details. For example, using the NOT
operator in a WHERE
clause can make a query slower. This is because the DBMS may have to check all the rows in the table and exclude the ones that do not match the condition, which takes more time and resources than just selecting the rows that match the condition. So, positive queries are typically more efficient than equivalent negative queries.
Example of a negative query:
1
SELECT id, name, surname
2
FROM users
3
WHERE NOT name = "Jhon"
Example of the equivalent query in positive form:
1
SELECT id, name, surname
2
FROM users
3
WHERE name != "Jhon"
Not considering the low-level optimizations made by most DBMS, the second query is faster than the first one.
4. Use temporary tables
You can use temporary tables, also known as temp tables, for SQL query optimization. Specifically, temp tables are useful to store and manipulate intermediate results within a query. This can help improve performance by reducing the amount of data to be processed. Also, temp tables are automatically created when running ALTER TABLE
queries on large tables.
For example, you can use a temp table to store the results of a JOIN operation. Then, use that table in the following queries to filter some data. This can be more efficient than performing the JOIN and filtering the data in a single query. Also, you can add indexes to a temp table to improve performance.
5. Avoid the DISTINCT keyword
In SQL, the DISTINCT
keyword forces the DBMS to return unique values from a query. When used in a SELECT
statement, DISTINCT
removes duplicate rows from the result set. This requires an extra operation, which makes your queries slower. So, if you do not really need to use the DISTINCT
keyword, try to avoid it.
Example of a query with DISTINCT
:
1
SELECT DISTINCT name, surname
2
FROM users
3
WHERE country = 'USA';
This query selects all unique first and last name combinations of users living in the United States.
Let’s rewrite an equivalent query without DISTINCT
:
1
SELECT name, surname
2
FROM users
3
WHERE country = 'USA';
4
GROUP BY name, surname;
Both of these queries will return the same result set. In detail, the first query uses DISTINCT
to avoid duplicates, whereas the second query uses GROUP BY
to accomplish the same goal.
How EXPLAIN works in SQL
In SQL, the EXPLAIN
command shows you the execution plan of a query. In particular, EXPLAIN
provides information about how the query optimizer will execute the query. This includes:
Note that this info changes based on the DBMS specific implementation of EXPLAIN
.
You can run the EXPLAIN
command as follows:
1
EXPLAIN
2
<YOUR_SQL_QUERY>
Replace <YOUR_SQL_QUERY>
with the SQL code of your query, as in the example below:
1
EXPLAIN
2
SELECT "name", "matchId"
3
FROM "Events"
4
WHERE "goldenPeriodId" IS NOT NULL
5
GROUP BY "name", "matchId"
When running an EXPLAIN
query, this returns a set of rows, each of which represents a step in the execution plan.
Keep in mind that EXPLAIN
does not run the query. For this reason, If you want to get some info on the query execution time, you need to use EXPLAIN ANALYZE
.
As you can see, EXPLAIN ANALYZE
also returns info about the time spent planning and the time required to run the query. Note that only PostgreSQL supports EXECUTE ANALYZE
. However, other DBMSs may offer the same feature under different commands.
EXPLAIN
can be especially useful when you are trying to optimize the performance of a query. By examining the execution plan, you can identify any steps that may be causing poor performance, such as full table scans or inefficient JOIN operations. At the same time, analyzing and understanding the information returned by EXPLAIN
is not easy. This is where an advanced SQL client such as DbVisualizer comes into play!
Explain Plan in DbVisualizer
DbVisualizer comes with an Explain Plan feature, which allows you to visually analyze how a query is processed by the database. Specifically, the Explain Plan executes your query and records the execution plan that the database devised to perform it.
By looking at the plan, you can find out if the database is using the right indexes and joining your tables in the most efficient way. This allows you to understand what the DBMS does behind the scene and helps you optimize your queries accordingly. Note that the Explain Plan is available for both single queries and SQL scripts.
To analyze a query with the Explain Plan in DbVisualizer:
Let’s now learn how to use DbVisualizer’s Explain Plan in a complete example.
Step #1: Running the raw query
Let’s assume you want to run the following query in a PostgreSQL database:
1
SELECT * FROM "Events" E
2
WHERE E."matchId" IS NOT NULL
3
ORDER BY "matchId" ASC, "timePosition" ASC
Events
is a table with nearly 1 million records containing event data related to Judo matches.
As you can see, this query does not follow any SQL performance tuning tips presented earlier. Let’s run it in the Explain Plan.
In the “Graph View” section, you can see the execution plan devised by the DBMS as a chart. This helps you visually understand how your query will be executed. While in the “Tree View” part, you can see all the information in a more compact form. In detail, note that the Total Cost
is 223563.23
If you are not familiar with this concept, Total Cost
refers to the overall computational cost required to execute a query. This involves the amount of memory used, the number of CPU cycles required, and/or the number of disk I/O operations performed. The lower this value is, the more efficient a query is.
Step #2: Applying the optimization tips
First, let’s change the query to select only the required columns:
1
SELECT E."id", E."matchId", E."timePosition", E."name"
2
FROM "Events" E
3
WHERE E."matchId" IS NOT NULL
4
ORDER BY "matchId" ASC, "timePosition" ASC
This significantly reduces Total Cost
to 136423.13
.
Also, matchId
is defined as a non-nullable field. So, let’s remove the WHERE
condition:
1
SELECT E."id", E."matchId", E."timePosition", E."name"
2
FROM "Events" E
3
ORDER BY "matchId" ASC, "timePosition" ASC
Total Cost
falls to 136435.12
.
Step #3: Adding an index
By analyzing the result provided by the Explain Plan, we can notice that no index is used. So, let’s create an index with the following query:
1
CREATE INDEX idx_events_complete
2
ON "Events"("matchId" ASC, "timePosition" ASC);
Now, it is time to execute the query again in the Explain Plan:
1
SELECT E."id", E."matchId", E."timePosition", E."name"
2
FROM "Events" E
3
ORDER BY "matchId" ASC, "timePosition" ASC
As you can see, the execution plan now involves only a single step using the index defined above. Total Cost
is now 66522.0
and has more than halved from the previous step.
By applying the query optimization tips presented above and using the Explain Plan feature of DbVisualizer, we have been able to take a query from a Total Cost
of 223563.23
to 66522.0
. This is a threefold improvement!
Et voilà! SQL query optimization has never been easier!
Conclusion
As you learned here, SQL query optimization is a crucial aspect to make queries faster. By understanding the principles of query optimization and applying best practices, you can write efficient SQL queries. In detail, here you saw five useful tips for SQL performance tuning. Also, you understood what the EXPLAIN
command is and how it can help you analyze the execution plan devised by the DBMS to run a query.
The EXPLAIN
command is powerful but sometimes difficult to use and understand. Fortunately, DbVisualizer comes with the Explain Plan feature! This allows you to visually explore the steps taken by the DBMS to execute the query, the indexes used, and where to intervene to improve the performance. This is just one of the many features offered by DbViualizer that can help you make a query more efficient. Try DbVisualizer for free today!