JOIN

Understanding Self Joins in SQL

intro

In a database, tables can be related to each other through different types of relationships, such as One-to-One Relationships, One-to-Many Relationships, and Many-to-Many Relationships. Based on these relationships, SQL joins are used to extract meaningful information from the data in the tables. However, retrieving meaningful insights from data in the same table with a parent-child relationship can be challenging. In this case, a Self Join is used to establish the parent-child relationships between different rows in the same table and extract meaningful insights.

Tools used in the tutorial
Tool Description Link
Dbvisualizer DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT

What Is a Self Join in SQL?

A Self Join is a type of a JOIN query used to compare rows within the same table. Unlike other SQL JOIN queries that join two or more tables, a self join joins a table to itself. To use a self join, a table must have a unique identifier column, a parent column, and a child column.

For example, a table can have a primary key column, all employees in a company column, and managers that each employee in the company reports to column. Since all managers in the company are also employees, the managers' column in the table holds the primary key that represents each manager as an employee.

In this case, a Self Join can be used to get the names of all employees and the names of managers they report to in the company. Then the results can be presented in a table with a primary key, employee_name, and manager_name columns.

Self Join Syntax and Table Aliases

A self join syntax often looks like so:

Copy
        
1 SELECT table1.column1, table2.column2 2 FROM table_name table1 3 JOIN table_name table2 ON table1.column = table2.column;

In the syntax above, table_name is the name of the table that self join is used to join to itself. Table1 and table2 are table aliases used to represent rows being compared.

Column1, and column2 are used to represent rows being compared to each other in the resulting table. The table1.column and table2.column are columns used to establish the relationship between two rows in a table.

We’ll begin by demonstrating how to use a self-join query on a PostgreSQL database instance.

Using a Self Join on Relationships In a Table

To illustrate how a self join can be used on relationships in a table, consider a table called Employees.

Employees Table in DbVisualizer.
Employees Table in DbVisualizer.

In the Employees table, the manager_id column represents the relationship between each employee and their manager. Based on the Employee-Manager relationship in the Employees table, we can use a self join to get each employee's name and the name of their respective manager using the query below.

Copy
        
1 SELECT emp.employee_name AS employee, mng.employee_name AS manager 2 FROM Employees emp 3 JOIN Employees mng ON emp.manager_id = mng.employee_id

In the query above, emp and mng represent table aliases where emp is for the employee while mng is for the manager. Then the table is joined on the condition that the manager’s ID (mng.employee_id) matches the employee’s manager ID (emp.manager_id).

When you run the query, it will result in a table that includes an employee's name and their respective manager's name.

The Result of a Self-Join Query in DbVisualizer.
The Result of a Self-Join Query in DbVisualizer

From the resulting table above, we can see the relationship or hierarchy between employees and their managers. Jane Smith reports to John Doe, while Isabel Archer reports to Jane Smith.

Recursive Self Join

A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships. To illustrate how a recursive self join can be used on nested relationships in a table, consider a table called companyemployees.

The Company Employees Table in DbVisualizer.
The Company Employees Table in DbVisualizer.

From the Company Employees table, we can use a recursive self join to determine the hierarchy of employees in the company from top to bottom based on their relationships with their managers using the query below.

Copy
        
1 WITH RECURSIVE EmployeeHierarchy AS ( 2 SELECT employee_id, employee_name, manager_id, 0 AS level 3 FROM companyemployees 4 WHERE manager_id IS NULL 5 UNION ALL 6 SELECT emp.employee_id, emp.employee_name, emp.manager_id, eh.level + 1 7 FROM companyemployees emp 8 JOIN EmployeeHierarchy eh ON emp.manager_id = eh.employee_id 9 ) 10 SELECT employee_id, employee_name, level 11 FROM EmployeeHierarchy;

In the query above, a common table expression (CTE) called EmployeeHierarchy is used together with the WITH RECURSIVE syntax. Inside the CTE, the query has two parts.

The first part of the query selects the topmost manager and assigns them a level of 0, assuming the topmost manager has NULL as their manager_id. The second part of the query is defined using the UNION ALL clause to loop through each employee to determine their hierarchy level based on previous levels of managers they report to.

When you run the query, it will result to a table that includes an employee’s name and their hierarchy level in the company management. The 0 represents the topmost manager, while the 5 represents the bottom-most manager.

Recursive Self Join Query Results in DbVisualizer.
Recursive Self Join Query Results in DbVisualizer.

From the results above, John Smith is the topmost manager at level 0. John Doe is at level 1, reporting to John Smith, while Jane Smith is at level 2, reporting to John Doe. Joseph Reid is the bottom-most manager at level 5, reporting to Nick Adams - no one reports to Joseph.

Applications of Self Joins

Self joins have various applications in database management and querying. Here are some common scenarios where self joins can be helpful:

  • Managing hierarchical data: Self joins are often used to represent hierarchical relationships, such as organizational charts, product categories, or file systems. By joining a table to itself, you can retrieve parent-child relationships and navigate through the hierarchy.
  • Employee-Manager relationships: In an employee database, self joins can be used to retrieve information about employees and their managers. By comparing the employee ID with the manager ID within the same table, you can identify the reporting structure and gather details about managers and their subordinates.
  • Network analysis: Self joins can be employed in network analysis scenarios. For example, if you have a table representing connections between users in a social network, a self join can help identify common connections, mutual friends, or indirect relationships between users.
  • Comparing related records: Self joins also enable the comparison of related records within a table. For example, you can compare sales records of customers with the same postal code or find customers who have made multiple purchases over a specific period.
  • Analyzing time-based data: Self joins can be used to analyze time-series data within a table. By joining the table with itself based on time intervals, you can compare records from different time periods, calculate changes or trends, and identify patterns.

Conclusion

In conclusion, a Self Join is used to establish the parent-child or hierarchy relationships between different rows in the same table and extract meaningful insights. A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships. 

We hope you’ve found this blog useful - make sure to follow our blog for updates and news in the database space, until next time.

FAQs

What is a self join in SQL?

A self join is a type of join operation in a relational database where a table is joined with itself. It allows you to combine rows from the same table based on a related condition.

What is the difference between full, inner, and self join?

A full join returns all tables' rows, including matching and non-matching rows. An inner join returns only the matching rows between two tables based on a specified join condition. A self-join is a specific type of join where a table is joined with itself.

Dbvis download link img
About the author
Bonnie
Bonnie
Bonnie is a web developer and technical writer creating easy-to-understand technical articles.
The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

Outer Join in SQL: A Comprehensive Guide

author Ochuko Onojakpor tags JOIN 11 min 2024-02-19
title

Inner Join in SQL: A Comprehensive Guide

author Ochuko Onojakpor tags JOIN 10 MINS 2023-11-30
title

SQL Cheat Sheet: The Ultimate Guide to All Types of SQL JOINS

author Leslie S. Gyamfi tags JOIN 12 MINS 2023-07-31
title

How to Join Three or More Tables with SQL

author Leslie S. Gyamfi tags JOIN 7 MINS 2023-07-24
title

How to join your tables using ERD

author Scott A. Adams tags ERD JOIN 8 MINS 2021-12-21
title

The Definitive Guide to the NULL SQL Server Value

author Antonello Zanini tags SQL SERVER 7 min 2024-09-12
title

PostgreSQL NULLIF: Conditional Logic Made Easier

author Antonello Zanini tags NULLIF POSTGRESQL 6 min 2024-09-11
title

The Power of the SQL Server Indexed View

author Ochuko Onojakpor tags Indexed view 9 min 2024-09-10
title

Distributing Data in a Database: A Guide to Database Sharding

author Lukas Vileikis tags MySQL OPTIMIZATION SQL 7 min 2024-09-09
title

The Postgres UPDATE Statement: A Deep Dive

author Leslie S. Gyamfi tags POSTGRESQL 4 min 2024-09-05

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.

Cookie policy

We use cookies to ensure that we give you the best experience on our website. However you can change your cookie settings at any time in your browser settings. Please find our cookie policy here ↗