SQL

SQL PARTITION BY in PostgreSQL: A Guide to Window Functions and Data Segmentation

intro

Time to look at what the SQL PARTITION BY clause is in PostgreSQL and how it works in dividing result sets into segments — let’s get right into it.

Tools used in the tutorial
Tool Description Link
Dbvisualizer DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT
PostgreSQL logo POSTGRESQL
THE POSTGRESQL DATABASE

The SQL PARTITION BY sub-clause in PostgreSQL determines how records in a database are grouped into partitions for window functions. Each record is assigned to a partition based on the specified column(s). To fully understand this concept, it is important to know window functions and a clear overview of this concept from the official PostgreSQL documentation.

In this blog, we’re going to look at PARTITION BY in detail, including its use cases using a database management system like PostgreSQL, and examples of window functions for performing segmented calculations across different groups in a dataset.

What Is the PARTITION BY Clause in SQL?

As a sub-clause of the OVER clause, the SQL PARTITION BY is used in almost all invocations of window functions like AVG()MAX(), and RANK(). In SQL, window frames define the set of rows within a partition that are considered for calculation by a window function, based on a specified range or number of preceding and following rows relative to the current row.

The PARTITION BY SQL sub-clause defines how records are grouped into partitions for window functions, allowing calculations to be performed across a set of rows. Typically used within the OVER() clause, it segments the data into distinct groups, with the window function operating independently on each group.

How to Use the SQL PARTITION BY Clause in PostgreSQL

Here is the SQL PARTITION BY syntax in PostgreSQL:

Copy
        
1 SELECT column1, column2, 2 window_function() OVER ( 3 PARTITION BY partition_column 4 ORDER BY sort_column 5 ) 6 FROM demo_table;

In the above syntax:

  • window_function() represents any SQL window function, such as ROW_NUMBER(), RANK(), SUM(), etc.
  • PARTITION BY specifies the column(s) that define the partitioning. Each unique value in the specified column(s) creates a separate partition.
  • ORDER BY is optional and determines the order of rows within each partition.
  • demo_table is the table which we’re querying data from.

Each partition created by the PARTITION BY clause acts like a smaller, independent subset of the data, where the window_function() operates only within the rows that belong to that partition, as defined by the unique values of the specified column(s).

PARTITION BY in SQL: Use Cases

When PARTITION BY in SQL usually comes into play:

  1. Ranking within groups: Calculating the rank or position of each item within a group, such as the top-performing employee by department.
  2. Cumulative calculations: Running totals, averages, or other aggregations within specific partitions (e.g., cumulative sales by region).
  3. Segregated calculations: Computing min, max, or other stats within subsets of data, like the maximum salary in each department.

Take a look at some PARTITION BY examples in PostgreSQL queries.

Note: In these examples, we are going to execute queries in DbVisualizer — the world’s leading fully-featured database client that supports multiple DBMS technologies. Although any other database client will do.

Example #1 - Cumulative Sum

Consider this Sales table that tracks sales data across regions:

The Sales table in DbVisualizer
The ‘Sales’ table in DbVisualizer

Our goal is to calculate the running total of sales within each region. This is how you can achieve it:

Copy
        
1 SELECT region, 2 sale_date, 3 sales_amount, 4 SUM(sales_amount) OVER (PARTITION BY region ORDER BY sale_date) AS cumulative_sales 5 FROM Sales;

In this example, SUM(sales_amount) calculates the running total of sales_amount within each region partition, ordered by sale_date.

Below is the query output in DbVisualizer:

Screenshot 2024-10-28 at 7.49.47 PM.png
Query output

Great! Now, take a look at what happened:

  • Arlington’s Cumulative Total: For Arlington, the cumulative_sales increases with each row: 150.4, 318.8, then 1580.4, based on the sale_date order. The running total resets for the next region (San Diego).
  • San Diego’s Cumulative Total: Similarly, San Diego’s cumulative total starts fresh at 177.2 on the first date, then adds up to 328.6 , 579.0, and then to 735.4 on the last date.

The cumulative sum is calculated separately for each region, ensuring each region has its own running total that doesn’t interfere with other regions.

Example #2 - Window Functions

Explore the usage of window functions (ROW_NUMBER(), MAX(), MIN(), SUM(), AVG()) with PARTITION BY.

Note: We’re going to employ the Sales table we used earlier in the previous example alongside this employees table:

The employees table in DbVisualizer
The employees table in DbVisualizer

ROW_NUMBER(): Row Number Within Partitions

When combined with the PARTITION BY clause, ROW_NUMBER() assigns a unique row to each number within a partition, as in the example below:

Copy
        
1 SELECT employees_id, 2 department, 3 salary, 4 ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank 5 FROM employees;

Upon the application of this query, the employees in our employees table are going to be ranked by salary (in descending order) within each department as shown in DbVisualizer here:

Employees ranked by salary
Employees ranked by salary

Employees ranked by salary

In the query result shown in the image above:

  • Finance department tops the table with a dept_rank of 1 with a single employee. The same can be said for the HR department.
  • IT department on the other hand has three employees with the salary starting from 5000.00, to 4120.00 and then 3800.00. This accounts for the dept_ranks of 1, 2 and 3 for the three employees respectively.
  • The same logic applies to other departments.

SUM() and AVG(): Calculating Running Totals and Averages

With PARTITION BY, we can calculate cumulative sales from our Sales table and the averages within each region. Check this out:

Copy
        
1 SELECT region, 2 sale_date, 3 sales_amount, 4 SUM(sales_amount) OVER (PARTITION BY region ORDER BY sale_date) AS cumulative_sales, 5 AVG(sales_amount) OVER (PARTITION BY region) AS average_sales 6 FROM Sales;

The above query would produce the following output:

Executing the query in DbVisualizer
Executing the query in DbVisualizer

This query calculates the cumulative sales in each region and the average sales amount for each region.

MAX() and MIN(): Maximum and Minimum Values in Partitions

Assuming you’re a data analyst, and you have been tasked at work to identify the highest and lowest-ranking salaries within the departments in the company. Here’s how to achieve that:

Copy
        
1 SELECT department, 2 salary, 3 employees_id, 4 MAX(salary) OVER (PARTITION BY department) AS max_salary_in_dept, 5 MIN(salary) OVER (PARTITION BY department) AS min_salary_in_dept 6 FROM employees;

The result of the query will be:

Partitioning query results based on the highest and lowest-ranking salaries.
Partitioning query results based on the highest and lowest-ranking salaries.

Bingo! It's that simple!

Keep in mind that the approaches we saw here are only some of the many ways of utilizing the PARTITION BY SQL clause. There are other advanced methods such as partitioning with multiple columns, combining partitioning with filtering, etc.

Best Practices To Consider When Using PARTITION BY in SQL

Consider these practices worthwhile when dealing with SQL PARTITION BY:

  • Specify ORDER BY in window functions when the order of the results matters. However, keep in mind that ORDER BY may slow down queries, so only use it if order matters for calculation.
  • Monitor your query performance with execution plans. Read our article on SQL EXPLAIN.
  • Always try to use appropriate frame clauses to limit window size.

Conclusion

In this blog, we've walked you through one of the most important clauses when working with result sets within your PostgreSQL infrastructure – the PARTITION BY SQL clause. You've learned what it is, how it works internally with other window functions such as SUM(), AVG(), ROW_NUMBER(), etc. in PostgreSQL, and how it can help you achieve your goals within the database space. All demonstrations were done in DbVisualizer—the database client with the highest user satisfaction in the market.

Before closing, follow our blog for more updates, and we’ll see you in the next one.

FAQ

What is the PARTITION BY clause in SQL?

The PARTITION BY clause in SQL is used to divide a dataset into subsets, or "partitions," based on unique values in one or more columns. It’s commonly used with window functions (like SUM, AVG, RANK, etc.) to perform calculations within each subset independently.

When Should I Use the SQL PARTITION BY Clause?

You should use PARTITION BY when you need calculations like running totals, ranks, or averages to reset based on certain groups within the data, such as regions, departments, or product categories.

What are some examples of window functions that work with PARTITION BY?

Window functions that often use PARTITION BY include:

  • SUM() for cumulative totals within each partition.
  • AVG() for calculating running averages.
  • RANK() or ROW_NUMBER() for assigning ranks or row numbers within each partition.
  • MIN() and MAX() for calculating rolling minimums and maximums.
  • COUNT() for cumulative row counts.

Each of these functions can be used with PARTITION BY to calculate results independently within each subset of the data.

Where can I grab a free trial for DbVisualizer?

For first-time users, DbVisualizer is free for 21 days only. Grab our free trial here.

Dbvis download link img
About the author
Leslie S. Gyamfi.
Leslie S. Gyamfi
Leslie Gyamfi is a mobile/web app developer with a passion for creating innovative solutions. He is dedicated to delivering high-quality products and technical articles. You can connect with him on LinkedIn
The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

SQL Derived Table: Everything You Need to Know

author Antonello Zanini tags MySQL POSTGRESQL SQL SQL SERVER 9 min 2025-02-06
title

Understanding MVCC in MySQL

author Lukas Vileikis tags DbVisualizer MySQL SQL 7 min 2025-02-05
title

Postgres Create Array of Numbers: Complete Guide

author Lukas Vileikis tags POSTGRESQL SQL 7 min 2025-02-03
title

SQL Injection Cheat Sheet: SQLi 101

author Lukas Vileikis tags MySQL SECURITY SQL 10 min 2025-01-29
title

A Complete Guide to the ALTER TABLE DROP COLUMN Statement

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 8 min 2025-01-27
title

How to Use a Recursive CTE in SQL

author Antonello Zanini tags MySQL POSTGRESQL SQL SQL SERVER 5 min 2025-01-22
title

Guidelines for a Proper SQL Script: ACID, CRUD, and Other Things You Need to Know

author Lukas Vileikis tags DbVisualizer MySQL OPTIMIZATION SQL 6 min 2025-01-21
title

How to Use MERGE in SQL Query Statements: Complete Guide

author Antonello Zanini tags ORACLE POSTGRESQL SQL SQL SERVER 8 min 2025-01-20
title

Automating SQL Queries with SQL CLI and SQL Job Scheduling

author Bonnie tags AUTOMATION SQL 6 min 2025-01-14
title

Understanding the SQL UNIQUE Constraint

author Leslie S. Gyamfi tags DbVisualizer SQL 6 min 2025-01-07

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.