EXCEL
EXPORT

Excel to SQL Basics

intro

If you know Excel you can easily start learning SQL.

Tools used in the tutorial
Tool Description Link
DBVISUALIZER
TOP RATED DATABASE MANAGEMENT TOOL AND SQL CLIENT

SQL is the programming language used to explore, analyze, and extract data from relational databases, which are common data storage mechanisms. We also know that many individuals who want to break into a data career are often unsure of how to start learning SQL, as they have never touched a relational database. Fortunately, many of these same individuals have used Excel (or similar spreadsheet software) and understand some of the fundamentals of working with data. Fortunately, conducting similar data tasks in SQL is fairly straightforward and the examples below will show you how.

All materials can be downloaded here. These examples use the New York City Airbnb Open Data, which is directly downloadable from Kaggle.

For more information on establishing a connection to a SQLite database in DbVisualizer, see our previous post Why Can’t We Just Have a Single Table? Navigating Relationships in Relational Databases.

View Your Data

Excel

In Excel we can simply open a file and the data will appear.

Excel

In SQL we use SELECT statements to output data. The example below uses SELECT * FROM AB_NYC_2019 which is an instruction to select (SELECT) all columns (*) from (FROM) the table named AB_NYC_2019. The media below shows the execution of this command in DbVisualizer.

Filter

Excel

Filtering on one or more values in a column can be accomplished with the Sort & Filter tool in the Home menu.

SQL

The WHERE clause is used to filter on values in SQL. Note that the WHERE clause must be written after the initial SELECT statement. The code below selects all columns from AB_NYC_2019 where the neighborhood_group column has a value of “Brooklyn”. Be aware that column values which are text rather than numbers, as is the case with values in neighborhood_group, must be contained between quotation marks.

Copy
        
1 SELECT * FROM AB_NYC_2019 2 WHERE neighbourhood_group = "Brooklyn";

SQL

Mutliple filter conditions can be used in a WHERE clause with the OR operator.

Copy
        
1 SELECT * FROM AB_NYC_2019 2 WHERE neighbourhood_group = "Brooklyn" 3 OR neighbourhood_group = "Manhattan";

However, if we wanted to filter to all records except those records in Brooklyn it is more efficient to use the <> operator, which means “not equal”.

Copy
        
1 SELECT * FROM AB_NYC_2019 2 WHERE neighbourhood_group <> "Brooklyn",;

!= also works as a not equal operator in SQL.

Copy
        
1 SELECT * FROM AB_NYC_2019 2 WHERE neighbourhood_group != "Brooklyn";

Basic Aggregate Functions

The syntax for basic aggregate functions is quite similar between Excel and SQL.

ExcelSQL
MeanAVERAGE()AVG()
SumSum()Sum()
MinMin()Min()
MaxMax()Max()
CountCount()Count()

For instance, if we want the obtain the mean for a given column in Excel, say price in column J, we write the following formula in an empty cell: =AVERAGE(J:J), which returns a value of 152.720687.

In SQL, running SELECT AVG(price) FROM AB_NYC_2019 also returns a value of 152.720687 (when rounded).

Aggregate Functions by Group

The previous section showed how to use basic aggregate functions to return statistical measures on the entire dataset. However, there are cases where we will want to conduct an aggregate function like the mean within specific groups in the data. In the current example, let’s say that we want to see the mean price by neighborhood groups.

Excel

A straightforward way to conduct aggregations by groups in Excel is to use pivot tables. We can set up a pivot table in a new sheet by navigating to the Insert menu and clicking on Insert Pivot Table and From Table/Range. By default, the entire sheet should be selected.

The screen below shows how to create a table of the mean prices by neighbourhood group once the pivot table opens in a new sheet.

SQL

We are going to add some more code to our previous SQL query, SELECT AVG(price) FROM AB_NYC_2019. First, let’s add neighbourhood_group as an additional column to select, and be sure to separate selected columns with commas. Now, if you execute SELECT neighbourhood_group, AVG(price) FROM AB_NYC_2019 the command will run without errors, but the output will not be the desired output.

Why is there only one neighbourhood group value? In short, because the SQL query above provides instructions to take the mean of price across the entire table, which produces a single value. As such, the resulting output can only have a single row, so the neighbourhood_group value for the first record in the data table is returned in the neighbourhood_group column in the output.

Is there a solution to obtain the desired output? Yes! We need to add a GROUP BY statement after the SELECT statement, which provides an instruction to perform any aggregations by values of the column following GROUP BY. In our example, the code is as follows.

Copy
        
1 SELECT neighbourhood_group, AVG(price) FROM AB_NYC_2019 2 GROUP BY neighbourhood_group,;

You may notice that the resulting output does not include the Grand Total row that was output by default in the Excel pivot table. However, the mean price for each neighbourhood group is returned with the SQL query above.

Dbvis download link img

Wrapping Up

With the examples above, you now have a reference for conducting basic data analytic tasks in both Excel and SQL. So go on and try these examples out on your own.

About the author
Scott A. Adams

Scott is a data scientist, social scientist, and educator who cannot think of anything witty to put here.

The Table Icon
Sign up to receive The Table's roundup
More from the table
Title Author Tags Length Published
title

SQL REPLACE Function: A Comprehensive Guide

author TheTable tags SQL 5 min 2024-04-25
title

SQL ORDER BY Clause: Definitive Guide

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 7 min 2024-04-22
title

SQL EXCEPT: Everything You Need to Know

author Leslie S. Gyamfi tags SQL 8 min 2024-04-18
title

Adding Dates in SQL: A Complete Tutorial

author Antonello Zanini tags DATE DATETIME MySQL POSTGRESQL SQL SERVER 7 min 2024-04-15
title

Glossary of the SQL Commands You Need to Know

author Antonello Zanini tags MySQL ORACLE POSTGRESQL SQL SQL SERVER 12 min 2024-04-11
title

SUBSTRING_INDEX in SQL Explained: A Guide

author Lukas Vileikis tags MySQL SQL 8 min 2024-04-08
title

SQL NOT IN: the Good, Bad & the Ugly

author Lukas Vileikis tags MySQL SQL 7 min 2024-04-04
title

SQL Add to Date Operations: A Complete Guide

author Antonello Zanini tags DATE DATETIME MySQL ORACLE POSTGRESQL SQL SQL SERVER 6 min 2024-04-01
title

SQL CAST Function: Everything You Need to Know

author Antonello Zanini tags CAST MySQL ORACLE POSTGRESQL SQL SQL SERVER 7 min 2024-03-28
title

PostgreSQL Upsert: INSERT ON CONFLICT Guide

author Antonello Zanini tags POSTGRESQL UPSERT 7 min 2024-03-25

The content provided on dbvis.com/thetable, including but not limited to code and examples, is intended for educational and informational purposes only. We do not make any warranties or representations of any kind. Read more here.

Cookie policy

We use cookies to ensure that we give you the best experience on our website. However you can change your cookie settings at any time in your browser settings. Please find our cookie policy here ↗