10 SQL Statements for 90% of Your Data Science Tasks đŸ§Ș

Farhan Faiyaz
10 min readAug 11, 2023

Structured Query Language (SQL) is a programming language for storing and processing information in a relational database. SQL is pretty intuitive, and anyone can pick it up even without prior programming knowledge. In fact, SQL is my first programming language (C doesn’t count since I had no idea what I was doing), and I was able to cross the Valley of Disappointment since it was relatively easy to grasp.

SQL is immensely powerful when it comes to data manipulation, such as filtering, sorting, aggregating, and grouping data. In 10 minutes, you’ll learn 10 Essential SQL statements to get you up and running with this programming language.

1. SELECT

The SELECT statement extracts data from the database. It can retrieve data from one or more tables. Here’s an example of a SELECT statement:

Players Table
SELECT Name, Position
FROM Players;
Output

In this example, Name and Position are the columns you want to retrieve from the “Players” table. To retrieve all the records from the Players table, the following statement is used:

SELECT *
FROM Players;
Output

The output is basically the entire “Players” table.

2. WHERE

A WHERE clause filters records based on one or more conditions. A where clause enables you to:

  • Filter unnecessary data.
  • Data analysis. Using the WHERE clause can allow you to calculate aggregates such as SUM, COUNT, AVG, etc., or identify data anomalies.
  • Condition combination. You can use multiple conditions in the WHERE clause, utilizing logical operators (AND, OR, NOT) to form complex filters.

Here’s an example of a WHERE clause used to filter values for the “Destinations” table:

Destinations
SELECT DestinationName, Continent
FROM Destinations
WHERE Continent = 'Europe';

The WHERE clause above returns all attributes (columns) after the SELECT statement (DestinationName, Continent), where the Continent was “Europe”.

Output

3. JOIN

A JOIN clause combines rows from two or more tables based on a related column between them. There are several types of JOINs in SQL.

Image from W3Schools

For the next four types of JOINs, let's use a database with four tables- “books”, “authors”, “editors”, and “translators” (taken from LearnSQL).

books
authors
editors
translators

INNER JOIN

This type of join returns records that have matching values in both tables. Here’s an example:

SELECT b.id, b.title, a.first_name, a.last_name
FROM books b
INNER JOIN authors a
ON b.author_id = a.id
ORDER BY b.id;
Output

In this example, we want to show the titles and the corresponding author. The SQL query joins the “books” table with the “authors” table by matching the author_id in both tables. The SELECT statement indicates which columns are to be displayed. The FROM clause specifies that the first table to join (left table) is the “books” table, and the INNER JOIN specifies the second table (right table) to join, which is the “authors” table.

To make the code more concise, aliasing is done (b for “books” and a for “authors”).

LEFT (OUTER) JOIN

This type of join returns all records from the left table and the matched records from the right table. The result will contain NULL values if there are no matches in the right table. Here’s an example:

SELECT b.id, b.title, e.last_name AS editor
FROM books b
LEFT JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
Output

In this example, the objective is to show the basic book information (ID and title) along with the last names of the respective editors. The LEFT JOIN is done on the editor_id of the “books” table with the id on the “editors” table. Since there is no editor for “My Last Book” in the database, the result for id 8 contains a NULL value.

RIGHT (OUTER) JOIN

This type of join returns all records from the right table and the matched records from the left table. The result will contain NULL values if no matches are in the left table. Here’s an example:

SELECT b.id, b.title, e.last_name AS editor
FROM books b
RIGHT JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
Output

Let’s repeat the previous example, but this time keep all the records from the “editors” table. It’s the same query; we just replace LEFT JOIN with RIGHT JOIN. We can see that Jones and Smith are editors, but there are no corresponding books they’ve edited, hence the NULL values.

In practice, explicit right outer joins are rarely used since they can always be replaced with left outer joins and provide no additional functionality.

FULL (OUTER) JOIN

This type of join returns all records, even the unmatched ones. Here’s an example:

SELECT b.id, b.title, e.last_name AS editor
FROM books b
FULL JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
Output

In the example above, all the books are kept even without matching editors. In addition, editors that don't have any corresponding books are also present in the output.

4. GROUP BY

The GROUP BY clause in SQL is used for grouping rows from a table based on specified columns. It is often used in conjunction with aggregate functions (such as SUM, COUNT, AVG, MAX, and MIN) to perform computations on groups of rows sharing a similar value in one or more columns.

Below is a general syntax for a query using the GROUP BY clause:

SELECT column1, column2, ..., aggregate_function(column)
FROM table
GROUP BY column1, column2, ...;
Order

Let’s consider the “Order” table above. We want to find the average price and total quantity for each category. Using the GROUP BY clause, this can be achieved.

SELECT Category, SUM(Quantity) AS TotalQuantity, AVG(Price) AS AvgPrice
FROM Orders
GROUP BY Category;
Output

In this example, the GROUP BY clause groups the rows by Category, and we also formed two new columns using aggregate functions (SUM and AVG) to calculate the total quantity and average price in each category.

5. HAVING

The HAVING clause in SQL is often used with the GROUP BY clause to filter groups based on a specified list of conditions. You might be wondering why we can't use the WHERE clause instead. WHERE clauses cannot be used with aggregate functions; thus, we resort to the HAVING clause.

Let’s consider the “movies” table below. We’ll use the HAVING clause to find the average rating of movie genres with an average rating greater than or equal to 9.

movies
SELECT Genre, AVG(Rating) AS AvgRating
FROM Movies
GROUP BY Genre
HAVING AVG(Rating) >= 9;
Output

Here, the GROUP BY clause groups the movies by their Genre and the AVG function calculates the average value of each genre. Finally, the HAVING clause filters out the genres with an average rating equal to or greater than 9.

6. UNION

The UNION operator combines the data from the results of two or more SELECT command queries into a single, distinct result set. This operator removes any duplicates present in the results being combined.

Let’s say that we want to find people living in the same suburb in Sydney from different company departments. David, who’s new in Randwick, wants to make friends who live in the same area, and he does not care if he/she is from another department. Let's see how we can do this using the UNION operator:

HRDepartment
ITDepartment
FinanceDepartment
SELECT EmployeeID, EmployeeName, Suburb
FROM HRDepartment
WHERE Suburb = 'Randwick'
UNION
SELECT EmployeeID, EmployeeName, Suburb
FROM ITDepartment
WHERE Suburb = 'Randwick'
UNION
SELECT EmployeeID, EmployeeName, Suburb
FROM FinanceDepartment
WHERE Suburb = 'Randwick';

The UNION statement combines employee records from all three tables, narrowing the employees down to the ones living in “Randwick” which is obtained using the WHERE clause.

It is crucial, however, that the SELECT statements have the same number of columns and that the columns have compatible data types.

Output

Now David can invite John and Sarah for a housewarming party!

7. CREATE

The CREATE keyword in SQL can be used to create new database objects such as tables, views, indexes, or other data structures. A few examples of the CREATE keywords in SQL are:

  • CREATE TABLE
  • CREATE INDEX
  • CREATE VIEW
  • CREATE DATABASE
  • CREATE PROCEDURE/ FUNCTION

Let’s dive into an example to create a new table called “TopChartSongs”.

CREATE TABLE TopChartSongs (
SongID INT PRIMARY KEY,
SongTitle VARCHAR(100),
Artist VARCHAR(100),
ReleaseDate DATE,
Rank INT
);

In the code above, the CREATE TABLE statement creates a new table named “TopChartSongs” with five columns (fields/attributes). Each column consists of a particular data type, and the first column, i.e., SongID is the unique identifier or the primary key for the table. The SongTitle column is defined as a string (VARCHAR) with a maximum of 100 characters, and so on.

8. INSERT

The INSERT statement is used to insert data into a database table. Let’s use the table built in the previous example and insert some values:

INSERT INTO TopChartSongs (SongID, SongTitle, Artist, ReleaseDate, Rank)
VALUES
(101, 'Last Night', 'Morgan Wallen', '2023-02-06', 1),
(102, 'Fast Car', 'Luke Combs', '2023-03-24', 2),
(103, 'Meltdown', 'Travis Scott Featuring Drake', '2023-07-28', 3),
(104, 'Cruel Summer', 'Taylor Swift', '2019-08-23', 4),
(105, 'FE!N', 'Travis Scott Featuring Playboi Carti', '2023-07-28', 5);

The query above inserts rows into the “TopChartSongs” table, and then data can be retrieved from the table as per requirement. For example:

SELECT *
FROM TopChartSongs
WHERE Artist LIKE '%Travis Scott%';

For the code above, we are retrieving the rows where Travis Scott is present as an Artist. Here we use the LIKE command (bonus) to search for a specified pattern in the column. The “%” symbol is a wildcard that matches any sequence of characters. So, %Travis Scott% will match any artist name that contains "Travis Scott" anywhere in it. The output is shown below:

Output

It seems like UTOPIA is killing it!

9. UPDATE

The UPDATE statement is used to modify existing records in a table. Let’s consider the “WorldCup” table before the 2022 FIFA World Cup.

WorldCup

But we all know that Argentina secured their third FIFA championship title under the leadership of the 🐐 . Let’s use the UPDATE statement to change this:

UPDATE WorldCup
SET WorldCupsWon = 3
WHERE TeamID = 4;
WorldCup (after using UPDATE)

The UPDATE statement modified the existing records in the “WorldCup” table. The SET clause specified the columns and new values to update; in this case, the WorldCupsWon column is to be updated, and the new value will be 3 WHERE the TeamID is 4.

P.s. taking nothing away from the other 🐐, I’ve been a Ronaldo fanboy since I was a kid.

10. DELETE

The DELETE statement deletes one or more rows from a table. Let’s consider the “GymMembership” table below:

GymMembership

Suppose Michael gets his heart broken, and instead of hitting the gym, he decides to stop going and spend his lonely days snacking on Doritos and Coke. To remove his information from the current database, the DELETE statement can be used as shown:

DELETE FROM GymMemberships
WHERE MemberID = 3;
GymMembership (after using DELETE)

The DELETE FROM statement specifies which table to delete the values from, and the WHERE statement specifies which row to delete, indicating the primary key (unique to each record).

Man up Michael! Canceling the gym membership is even sadder than the breakup!

TL;DR

  • The SELECT statement is used to select data from a database.
  • The WHERE clause is used to filter records.
  • A JOIN clause combines rows from two or more tables based on a related column between them.
  • The GROUP BY statement groups rows that have the same values into summary rows. The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result set by one or more columns.
  • The HAVING clause was added to SQL because the WHERE keyword cannot be used with aggregate functions.
  • The UNION operator is used to combine the result set of two or more SELECT statements.
  • The CREATEkeyword in SQL can be used to create new database objects such as tables, views, indexes, or other data structures.
  • The INSERT INTO statement is used to insert new records into a table.
  • The UPDATE statement is used to modify the existing records in a table.
  • The DELETE statement is used to delete existing records in a table.
Thank you for making it to the end. Image is made using Midjourney

--

--