intro
In a database, tables can be related to each other through different types of relationships, such as One-to-One Relationships, One-to-Many Relationships, and Many-to-Many Relationships. Based on these relationships, SQL joins are used to extract meaningful information from the data in the tables. However, retrieving meaningful insights from data in the same table with a parent-child relationship can be challenging. In this case, a Self Join is used to establish the parent-child relationships between different rows in the same table and extract meaningful insights.
What Is a Self Join in SQL?
A Self Join is a type of a JOIN
query used to compare rows within the same table. Unlike other SQL JOIN
queries that join two or more tables, a self join joins a table to itself. To use a self join, a table must have a unique identifier column, a parent column, and a child column.
For example, a table can have a primary key column, all employees in a company column, and managers that each employee in the company reports to column. Since all managers in the company are also employees, the managers' column in the table holds the primary key that represents each manager as an employee.
In this case, a Self Join can be used to get the names of all employees and the names of managers they report to in the company. Then the results can be presented in a table with a primary key, employee_name, and manager_name columns.
Self Join Syntax and Table Aliases
A self join syntax often looks like so:
1
SELECT table1.column1, table2.column2
2
FROM table_name table1
3
JOIN table_name table2 ON table1.column = table2.column;
In the syntax above, table_name is the name of the table that self join is used to join to itself. Table1 and table2 are table aliases used to represent rows being compared.
Column1, and column2 are used to represent rows being compared to each other in the resulting table. The table1.column and table2.column are columns used to establish the relationship between two rows in a table.
We’ll begin by demonstrating how to use a self-join query on a PostgreSQL database instance.
Using a Self Join on Relationships In a Table
To illustrate how a self join can be used on relationships in a table, consider a table called Employees.
In the Employees table, the manager_id column represents the relationship between each employee and their manager. Based on the Employee-Manager relationship in the Employees table, we can use a self join to get each employee's name and the name of their respective manager using the query below.
1
SELECT emp.employee_name AS employee, mng.employee_name AS manager
2
FROM Employees emp
3
JOIN Employees mng ON emp.manager_id = mng.employee_id
In the query above, emp and mng represent table aliases where emp is for the employee while mng is for the manager. Then the table is joined on the condition that the manager’s ID (mng.employee_id) matches the employee’s manager ID (emp.manager_id).
When you run the query, it will result in a table that includes an employee's name and their respective manager's name.
From the resulting table above, we can see the relationship or hierarchy between employees and their managers. Jane Smith reports to John Doe, while Isabel Archer reports to Jane Smith.
Recursive Self Join
A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships. To illustrate how a recursive self join can be used on nested relationships in a table, consider a table called companyemployees.
From the Company Employees table, we can use a recursive self join to determine the hierarchy of employees in the company from top to bottom based on their relationships with their managers using the query below.
1
WITH RECURSIVE EmployeeHierarchy AS (
2
SELECT employee_id, employee_name, manager_id, 0 AS level
3
FROM companyemployees
4
WHERE manager_id IS NULL
5
UNION ALL
6
SELECT emp.employee_id, emp.employee_name, emp.manager_id, eh.level + 1
7
FROM companyemployees emp
8
JOIN EmployeeHierarchy eh ON emp.manager_id = eh.employee_id
9
)
10
SELECT employee_id, employee_name, level
11
FROM EmployeeHierarchy;
In the query above, a common table expression (CTE) called EmployeeHierarchy
is used together with the WITH RECURSIVE
syntax. Inside the CTE, the query has two parts.
The first part of the query selects the topmost manager and assigns them a level of 0, assuming the topmost manager has NULL
as their manager_id
. The second part of the query is defined using the UNION ALL
clause to loop through each employee to determine their hierarchy level based on previous levels of managers they report to.
When you run the query, it will result to a table that includes an employee’s name and their hierarchy level in the company management. The 0 represents the topmost manager, while the 5 represents the bottom-most manager.
From the results above, John Smith is the topmost manager at level 0. John Doe is at level 1, reporting to John Smith, while Jane Smith is at level 2, reporting to John Doe. Joseph Reid is the bottom-most manager at level 5, reporting to Nick Adams - no one reports to Joseph.
Applications of Self Joins
Self joins have various applications in database management and querying. Here are some common scenarios where self joins can be helpful:
Conclusion
In conclusion, a Self Join is used to establish the parent-child or hierarchy relationships between different rows in the same table and extract meaningful insights. A recursive self join is an extension of a self join that joins a table to itself repetitively to extract meaningful insights from the rows with nested hierarchies or relationships.
We hope you’ve found this blog useful - make sure to follow our blog for updates and news in the database space, until next time.
FAQs
What is a self join in SQL?
A self join is a type of join operation in a relational database where a table is joined with itself. It allows you to combine rows from the same table based on a related condition.
What is the difference between full, inner, and self join?
A full join returns all tables' rows, including matching and non-matching rows. An inner join returns only the matching rows between two tables based on a specified join condition. A self-join is a specific type of join where a table is joined with itself.