10 SQL Statements for 90% of Your Data Science Tasks đ§Ș
Structured Query Language (SQL) is a programming language for storing and processing information in a relational database. SQL is pretty intuitive, and anyone can pick it up even without prior programming knowledge. In fact, SQL is my first programming language (C doesnât count since I had no idea what I was doing), and I was able to cross the Valley of Disappointment since it was relatively easy to grasp.
SQL is immensely powerful when it comes to data manipulation, such as filtering, sorting, aggregating, and grouping data. In 10 minutes, youâll learn 10 Essential SQL statements to get you up and running with this programming language.
1. SELECT
The SELECT statement extracts data from the database. It can retrieve data from one or more tables. Hereâs an example of a SELECT statement:
SELECT Name, Position
FROM Players;
In this example, Name and Position are the columns you want to retrieve from the âPlayersâ table. To retrieve all the records from the Players table, the following statement is used:
SELECT *
FROM Players;
The output is basically the entire âPlayersâ table.
2. WHERE
A WHERE clause filters records based on one or more conditions. A where clause enables you to:
- Filter unnecessary data.
- Data analysis. Using the WHERE clause can allow you to calculate aggregates such as SUM, COUNT, AVG, etc., or identify data anomalies.
- Condition combination. You can use multiple conditions in the WHERE clause, utilizing logical operators (AND, OR, NOT) to form complex filters.
Hereâs an example of a WHERE clause used to filter values for the âDestinationsâ table:
SELECT DestinationName, Continent
FROM Destinations
WHERE Continent = 'Europe';
The WHERE clause above returns all attributes (columns) after the SELECT statement (DestinationName, Continent), where the Continent was âEuropeâ.
3. JOIN
A JOIN clause combines rows from two or more tables based on a related column between them. There are several types of JOINs in SQL.
For the next four types of JOINs, let's use a database with four tables- âbooksâ, âauthorsâ, âeditorsâ, and âtranslatorsâ (taken from LearnSQL).
INNER JOIN
This type of join returns records that have matching values in both tables. Hereâs an example:
SELECT b.id, b.title, a.first_name, a.last_name
FROM books b
INNER JOIN authors a
ON b.author_id = a.id
ORDER BY b.id;
In this example, we want to show the titles and the corresponding author. The SQL query joins the âbooksâ table with the âauthorsâ table by matching the author_id in both tables. The SELECT statement indicates which columns are to be displayed. The FROM clause specifies that the first table to join (left table) is the âbooksâ table, and the INNER JOIN specifies the second table (right table) to join, which is the âauthorsâ table.
To make the code more concise, aliasing is done (b for âbooksâ and a for âauthorsâ).
LEFT (OUTER) JOIN
This type of join returns all records from the left table and the matched records from the right table. The result will contain NULL values if there are no matches in the right table. Hereâs an example:
SELECT b.id, b.title, e.last_name AS editor
FROM books b
LEFT JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
In this example, the objective is to show the basic book information (ID and title) along with the last names of the respective editors. The LEFT JOIN is done on the editor_id of the âbooksâ table with the id on the âeditorsâ table. Since there is no editor for âMy Last Bookâ in the database, the result for id 8 contains a NULL value.
RIGHT (OUTER) JOIN
This type of join returns all records from the right table and the matched records from the left table. The result will contain NULL values if no matches are in the left table. Hereâs an example:
SELECT b.id, b.title, e.last_name AS editor
FROM books b
RIGHT JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
Letâs repeat the previous example, but this time keep all the records from the âeditorsâ table. Itâs the same query; we just replace LEFT JOIN with RIGHT JOIN. We can see that Jones and Smith are editors, but there are no corresponding books theyâve edited, hence the NULL values.
In practice, explicit right outer joins are rarely used since they can always be replaced with left outer joins and provide no additional functionality.
FULL (OUTER) JOIN
This type of join returns all records, even the unmatched ones. Hereâs an example:
SELECT b.id, b.title, e.last_name AS editor
FROM books b
FULL JOIN editors e
ON b.editor_id = e.id
ORDER BY b.id;
In the example above, all the books are kept even without matching editors. In addition, editors that don't have any corresponding books are also present in the output.
4. GROUP BY
The GROUP BY clause in SQL is used for grouping rows from a table based on specified columns. It is often used in conjunction with aggregate functions (such as SUM, COUNT, AVG, MAX, and MIN) to perform computations on groups of rows sharing a similar value in one or more columns.
Below is a general syntax for a query using the GROUP BY clause:
SELECT column1, column2, ..., aggregate_function(column)
FROM table
GROUP BY column1, column2, ...;
Letâs consider the âOrderâ table above. We want to find the average price and total quantity for each category. Using the GROUP BY clause, this can be achieved.
SELECT Category, SUM(Quantity) AS TotalQuantity, AVG(Price) AS AvgPrice
FROM Orders
GROUP BY Category;
In this example, the GROUP BY clause groups the rows by Category, and we also formed two new columns using aggregate functions (SUM and AVG) to calculate the total quantity and average price in each category.
5. HAVING
The HAVING clause in SQL is often used with the GROUP BY clause to filter groups based on a specified list of conditions. You might be wondering why we can't use the WHERE clause instead. WHERE clauses cannot be used with aggregate functions; thus, we resort to the HAVING clause.
Letâs consider the âmoviesâ table below. Weâll use the HAVING clause to find the average rating of movie genres with an average rating greater than or equal to 9.
SELECT Genre, AVG(Rating) AS AvgRating
FROM Movies
GROUP BY Genre
HAVING AVG(Rating) >= 9;
Here, the GROUP BY clause groups the movies by their Genre and the AVG function calculates the average value of each genre. Finally, the HAVING clause filters out the genres with an average rating equal to or greater than 9.
6. UNION
The UNION operator combines the data from the results of two or more SELECT command queries into a single, distinct result set. This operator removes any duplicates present in the results being combined.
Letâs say that we want to find people living in the same suburb in Sydney from different company departments. David, whoâs new in Randwick, wants to make friends who live in the same area, and he does not care if he/she is from another department. Let's see how we can do this using the UNION operator:
SELECT EmployeeID, EmployeeName, Suburb
FROM HRDepartment
WHERE Suburb = 'Randwick'
UNION
SELECT EmployeeID, EmployeeName, Suburb
FROM ITDepartment
WHERE Suburb = 'Randwick'
UNION
SELECT EmployeeID, EmployeeName, Suburb
FROM FinanceDepartment
WHERE Suburb = 'Randwick';
The UNION statement combines employee records from all three tables, narrowing the employees down to the ones living in âRandwickâ which is obtained using the WHERE clause.
It is crucial, however, that the SELECT statements have the same number of columns and that the columns have compatible data types.
Now David can invite John and Sarah for a housewarming party!
7. CREATE
The CREATE keyword in SQL can be used to create new database objects such as tables, views, indexes, or other data structures. A few examples of the CREATE keywords in SQL are:
- CREATE TABLE
- CREATE INDEX
- CREATE VIEW
- CREATE DATABASE
- CREATE PROCEDURE/ FUNCTION
Letâs dive into an example to create a new table called âTopChartSongsâ.
CREATE TABLE TopChartSongs (
SongID INT PRIMARY KEY,
SongTitle VARCHAR(100),
Artist VARCHAR(100),
ReleaseDate DATE,
Rank INT
);
In the code above, the CREATE TABLE statement creates a new table named âTopChartSongsâ with five columns (fields/attributes). Each column consists of a particular data type, and the first column, i.e., SongID is the unique identifier or the primary key for the table. The SongTitle column is defined as a string (VARCHAR) with a maximum of 100 characters, and so on.
8. INSERT
The INSERT statement is used to insert data into a database table. Letâs use the table built in the previous example and insert some values:
INSERT INTO TopChartSongs (SongID, SongTitle, Artist, ReleaseDate, Rank)
VALUES
(101, 'Last Night', 'Morgan Wallen', '2023-02-06', 1),
(102, 'Fast Car', 'Luke Combs', '2023-03-24', 2),
(103, 'Meltdown', 'Travis Scott Featuring Drake', '2023-07-28', 3),
(104, 'Cruel Summer', 'Taylor Swift', '2019-08-23', 4),
(105, 'FE!N', 'Travis Scott Featuring Playboi Carti', '2023-07-28', 5);
The query above inserts rows into the âTopChartSongsâ table, and then data can be retrieved from the table as per requirement. For example:
SELECT *
FROM TopChartSongs
WHERE Artist LIKE '%Travis Scott%';
For the code above, we are retrieving the rows where Travis Scott is present as an Artist. Here we use the LIKE command (bonus) to search for a specified pattern in the column. The â%â symbol is a wildcard that matches any sequence of characters. So, %Travis Scott% will match any artist name that contains "Travis Scott" anywhere in it. The output is shown below:
It seems like UTOPIA is killing it!
9. UPDATE
The UPDATE statement is used to modify existing records in a table. Letâs consider the âWorldCupâ table before the 2022 FIFA World Cup.
But we all know that Argentina secured their third FIFA championship title under the leadership of the đ . Letâs use the UPDATE statement to change this:
UPDATE WorldCup
SET WorldCupsWon = 3
WHERE TeamID = 4;
The UPDATE statement modified the existing records in the âWorldCupâ table. The SET clause specified the columns and new values to update; in this case, the WorldCupsWon column is to be updated, and the new value will be 3 WHERE the TeamID is 4.
P.s. taking nothing away from the other đ, Iâve been a Ronaldo fanboy since I was a kid.
10. DELETE
The DELETE statement deletes one or more rows from a table. Letâs consider the âGymMembershipâ table below:
Suppose Michael gets his heart broken, and instead of hitting the gym, he decides to stop going and spend his lonely days snacking on Doritos and Coke. To remove his information from the current database, the DELETE statement can be used as shown:
DELETE FROM GymMemberships
WHERE MemberID = 3;
The DELETE FROM statement specifies which table to delete the values from, and the WHERE statement specifies which row to delete, indicating the primary key (unique to each record).
Man up Michael! Canceling the gym membership is even sadder than the breakup!
TL;DR
- The
SELECT
statement is used to select data from a database. - The
WHERE
clause is used to filter records. - A
JOIN
clause combines rows from two or more tables based on a related column between them. - The
GROUP BY
statement groups rows that have the same values into summary rows. TheGROUP BY
statement is often used with aggregate functions (COUNT()
,MAX()
,MIN()
,SUM()
,AVG()
) to group the result set by one or more columns. - The
HAVING
clause was added to SQL because theWHERE
keyword cannot be used with aggregate functions. - The
UNION
operator is used to combine the result set of two or moreSELECT
statements. - The
CREATE
keyword in SQL can be used to create new database objects such as tables, views, indexes, or other data structures. - The
INSERT INTO
statement is used to insert new records into a table. - The
UPDATE
statement is used to modify the existing records in a table. - The
DELETE
statement is used to delete existing records in a table.