MySQL

How to Display All Duplicate Records in a MySQL Table

intro

Let’s explore different techniques to display all duplicate records in a MySQL table, ranging from using a GROUP BY query to more modern and lesser-known methods.

Tools used in the tutorial
Tool Description Link
Dbvisualizer DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT

Duplicated records can significantly hinder your data operations, from slowing down your MySQL database to leading to misleading results in aggregate queries, data integrity issues, and inefficient storage usage. The first step in addressing duplicates is identifying them, which can be achieved through various display all duplicate records MySQL methods.

In this article, you will learn the definition of duplicate records and discover how to efficiently retrieve and manage them like a pro.

Let’s dive in!

What Is a Duplicate Record in MySQL?

In MySQL, a duplicate record is a row in a table that shares identical values across all or a significant subset of columns with other rows. To be more precise, in most cases, duplicate records do not have the exact same values in every column, as at least the value of the primary key column is different.

Duplicate records often result from data entry errors or data import issues, leading to inaccurate query results, inefficient storage, and data integrity problems. Generally, it is best to prevent duplicates with unique constraints or to periodically remove them—keeping only one of the many duplicates.

Techniques to Display All Duplicate Records in MySQL

Time to explore some display all duplicate records MySQL approaches!

The sample queries to find duplicates will be run on the following products table:

The products table in DbVisualizer
The products table in DbVisualizer

As you can see, it contains some duplicate records.

Note: The queries in the sections below will be executed on DbVisualizer, a top-rated database client with full support for MySQL. Still, any other database client will do.

1. Using GROUP BY

The most common approach to displaying all duplicate records in MySQL is to group rows by one or more columns and select only groups with a count greater than one. This is possible with a GROUP BY query with the COUNT(*) > 1 HAVING clause:

Copy
        
1 SELECT name, description, price, currency, COUNT(*) 2 FROM products 3 GROUP BY name, description, price, currency 4 HAVING COUNT(*) > 1;

In this example, all non-primary key columns are used to create groups. While this is the most effective approach, grouping by many columns can slow down your query, especially on large datasets.

Often, grouping by one or two columns is sufficient. For example, grouping by product name alone might be enough to detect duplicate records. In real-world scenarios, adjust the GROUP BY clause to include only the columns relevant to your duplication criteria.

Execute the query, and you will get a result that looks as follows:

These are exactly the duplicate records
These are exactly the duplicate records

Note that the result set also indicates the count of occurrences for each duplicate record.

2. Using a SELF JOIN

A self JOIN is a JOIN operation performed on the same table. In other words, it means to merge a table with itself using a JOIN. This is useful in a variety of scenarios, including finding duplicates. Learn more in our guide on self JOINs in SQL.

You can retrieve all duplicates from a table in MySQL using a self-join as follows:

Copy
        
1 SELECT DISTINCT p1.name, p1.description, p1.price, p1.currency 2 FROM products p1 3 JOIN products p2 ON p1.name = p2.name 4 AND p1.description = p2.description 5 AND p1.price = p2.price 6 AND p1.price = p2.price 7 AND p1.id <> p2.id;

In this case, we performed the JOIN on all non-primary key columns in products. In most cases, using a limited subset of columns is enough to determine duplicate criteria.

Note that the DISTINCT keyword is required to ensure that the result set contains only unique combinations of the selected columns. Without DISTINCT, the result set would contain multiple rows for each duplicate entry, as you can see below:

Note that the duplicate records appear several times
Note that the duplicate records appear several times

3. Using the ROW_NUMBER Function

ROW_NUMBER() is a MySQL 8+ function that returns the number of the current row within a partition. Rows numbers range from 1 to the number of partition rows.

That function can be used to assign row numbers to duplicate entries in a CTE (Common Table Expression) and then display them:

Copy
        
1 WITH duplicated_products AS ( 2 SELECT *, 3 ROW_NUMBER() OVER(PARTITION BY name, description, price, currency ORDER BY id) AS row_num 4 FROM products 5 ) 6 SELECT DISTINCT name, description, price, currency 7 FROM duplicated_products 8 WHERE row_num > 1;

The above query assigns a row number to each duplicate record in a partition, based on name, description, price, and currency. Then, it selects only those with row numbers greater than one. Bear in mind that for this query to be successfully executed, you’d have to have an id column existing on the table.

Best Practices For Dealing With Duplicate Records

Below are some best practices for managing duplicate records in MySQL:

  • Implement unique constraints: Enforce UNIQUE constraints on critical columns to prevent duplicate entries at the database level.
  • Perform regular data audits: Conduct regular audits of your data to identify and address duplicates promptly.
  • Data validation rules: Establish robust data validation rules during data entry to minimize the chances of duplicates.
  • Use standardized formats: Enforce standardized data formats (e.g., date formats, case sensitivity, etc.) to improve consistency and reduce duplicates.
  • Merge duplicate records: Use queries or tools to merge duplicate records, retaining the most accurate and relevant information.
  • Educate users: Train users on best practices for data entry to minimize human errors that lead to duplicates.

Conclusion

In this guide, you learned what duplicate records are in MySQL, why they pose a problem, and how to display them using different approaches. You now know how to detect and select duplicate rows in a MySQL table.

As shown here, DbVisualizer simplifies query execution and data manipulation in databases. This powerful visual database client supports various DBMS technologies and offers advanced features such as query optimization, SQL formatting, and ERD-like schema generation. Try DbVisualizer for free today!

FAQ

What criteria should be considered when choosing a specific record over its duplicates?

When selecting a specific record from duplicates, these are some possible options:

  • Keep the latest record based on creation or update timestamp
  • Keep the record with the highest or lowest ID
  • Choose records with more complete data (in the case where two records are duplicated based on the value of some but not all columns)

How to delete all duplicate records in a MySQL table?

To delete all duplicate records in a MySQL table while keeping one instance of each duplicate, you can use a subquery as follows:

Copy
        
1 DELETE FROM products 2 WHERE id NOT IN ( 3 SELECT MIN(id) 4 FROM products 5 GROUP BY name, description, price, currency 6 );

This query retains the row with the minimum id for each unique combination of name, description, price, and currency, deleting the rest.

How to avoid duplicate records in MySQL?

To avoid duplicate records in MySQL, consider the following strategies:

  • Use primary keys: Define a primary key for your table, ensuring each row has a unique identifier
  • Unique constraints: Apply unique constraints on columns where duplicates are not allowed
  • Data validation: Implement validation checks in your application to prevent duplicate data entry
  • Normalize your database: Structure your database to reduce redundancy and maintain data integrity through normalization

What is the best display all duplicate records MySQL approach for large datasets?

One of the most effective ways to display duplicate records in MySQL for large datasets is to use a GROUP BY query combined with the HAVING clause to filter for duplicates. To optimize performance, limit the columns in the GROUP BY clause to the minimum necessary for detecting duplicates. Additionally, consider applying a specific index to the columns used in the GROUP BY clause to enhance query efficiency. See how to quickly define an efficient SQL index for GROUP BY queries.

Why use an SQL client?

Using an SQL client like DbVisualizer enables you to visualize your data, making development tasks more manageable. A robust SQL client provides a range of tools that simplify data management and analysis, no matter which database management system you are using. These features enhance productivity, streamline workflows, and help you gain insights from your data more effectively.

Dbvis download link img
About the author
Antonello Zanini

Antonello is a software engineer, and often refers to himself as a technology bishop. His mission is to spread knowledge through writing.

The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

The Most Interesting Functions in MySQL – A Guide

author Lukas Vileikis tags MySQL 5 min 2024-12-27
title

MySQL Primary Key: What It Is, How It Works, and When to Use It

author Lukas Vileikis tags Data Visualization Tools DbVisualizer MySQL 8 min 2024-12-23
title

SQL CHECK Constraint: Definitive Guide With Examples

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 8 min 2024-12-16
title

A Guide To the SQL DECLARE Statement

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SERVER 7 min 2024-12-12
title

SQL Upsert: Inserting a Record If It Does Not Exist

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 8 min 2024-12-10
title

A Complete Guide to the MySQL COUNT Function

author Antonello Zanini tags MySQL 9 min 2024-12-09
title

MySQL Nullable Columns: Everything You Need to Know

author Antonello Zanini tags MySQL 6 min 2024-12-03
title

A Complete Guide to the MySQL GROUP_CONCAT Function

author Antonello Zanini tags Data aggregation MySQL 8 min 2024-12-02
title

How and Why to Add an Index in SQL

author Lukas Vileikis tags MySQL POSTGRESQL Search SQL SQL SERVER 8 min 2024-11-28
title

MySQL 9.0 and the New EXPLAIN Features

author Lukas Vileikis tags MySQL SQL 9 min 2024-11-27

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.