Understanding SQL Joins and Aggregate Functions

Joining Tables in SQL and Using Aggregate Functions

Introduction to SQL Joins

Before we dive into the specifics of joining tables in SQL, let’s take a step back and understand what joins are. In relational databases, data is stored in multiple tables that contain related information. To retrieve data from these tables, you need to join them based on common columns.

There are several types of SQL joins, including:

  • Inner join: Returns records that have matching values in both tables.
  • Left join (or left outer join): Returns all the records from the left table and the matched records from the right table. If there’s no match, the result will contain null values for the right table columns.
  • Right join (or right outer join): Similar to a left join, but returns all the records from the right table and the matched records from the left table.
  • Full outer join: Returns all the records from both tables, with null values in the columns where there’s no match.

Joining Tables

Let’s focus on joining two tables. We’ll use an inner join as an example, but we can apply the same logic to other types of joins.

Suppose we have two tables: Table1 and Table2. Both tables contain information about employees, with some common columns between them.

-- Table schema
CREATE TABLE Table1 (
    id INT,
    num VARCHAR(255),
    amt DECIMAL(10, 2)
);

CREATE TABLE Table2 (
    id INT,
    location VARCHAR(255)
);

Now, let’s create sample data:

-- Insert data into tables
INSERT INTO Table1 (id, num, amt)
VALUES
(1, '12345', 1000.00),
(2, '67890', 500.00),
(3, '11111', 2000.00);

INSERT INTO Table2 (id, location)
VALUES
(1, 'New York'),
(2, 'Los Angeles'),
(3, 'Chicago');

We want to join these tables based on the id column and retrieve the num, amt, and location columns.

-- Join tables using an inner join
SELECT 
    Table1.num,
    Table1.amt,
    Table2.location
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id;

This query will return the following result:

numamtlocation
123451000.00New York
67890500.00Los Angeles
111112000.00Chicago

Using Aggregate Functions

Now that we’ve joined the tables, let’s use an aggregate function to calculate the sum of the amt column for each id.

The SUM function is used to add up a series of numbers.

-- Use the SUM function on the amt column
SELECT 
    Table1.num,
    Table2.location,
    SUM(Table1.amt) AS total_amt
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id
GROUP BY 
    Table1.num, Table2.location;

This query will return the following result:

numlocationtotal_amt
12345New York1000.00
67890Los Angeles500.00
11111Chicago2000.00

Notice that we’ve grouped the results by both num and location. This means that for each group, the SUM function will calculate the total amount.

Handling Null Values

When using aggregate functions, it’s essential to handle null values. If you don’t handle them correctly, your query might produce unexpected results or errors.

Let’s assume we have an employee who has no salary (i.e., a null value in the amt column).

-- Employee with no salary
INSERT INTO Table1 (id, num, amt)
VALUES
(4, '98765', NULL);

If we join this table with Table2 and use an aggregate function on the amt column without handling null values, our query will produce incorrect results.

-- Incorrect query without handling null values
SELECT 
    Table1.num,
    Table2.location,
    SUM(Table1.amt) AS total_amt
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id
GROUP BY 
    Table1.num, Table2.location;

To handle this situation, we can use the COALESCE function to replace null values with a default value (e.g., 0).

-- Correct query using COALESCE
SELECT 
    Table1.num,
    Table2.location,
    SUM(COALESCE(Table1.amt, 0)) AS total_amt
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id
GROUP BY 
    Table1.num, Table2.location;

Alternatively, we can use the ISNULL function (if our database management system supports it) or modify our query to exclude rows with null values.

Handling Multiple Columns in Aggregate Functions

When using aggregate functions like SUM, AVG, MAX, and MIN, you often need to specify multiple columns to operate on.

Let’s say we want to calculate the average salary for each department, as well as the sum of salaries.

-- Calculate average salary by department
SELECT 
    Table2.location,
    AVG(Table1.amt) AS avg_salary
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id
GROUP BY 
    Table2.location;

This query will return the following result:

locationavg_salary
New York1500.00
Los Angeles500.00
Chicago2000.00

Notice that we’ve grouped by the location column only.

Joining Multiple Tables

When working with multiple tables, you might need to join them using different join types (e.g., inner join, left join).

Let’s say we have three tables: Table1, Table2, and Table3. We want to join these tables based on their common columns.

-- Table schema
CREATE TABLE Table1 (
    id INT,
    num VARCHAR(255),
    amt DECIMAL(10, 2)
);

CREATE TABLE Table2 (
    id INT,
    location VARCHAR(255)
);

CREATE TABLE Table3 (
    id INT,
    dept VARCHAR(255)
);

We can join these tables using an inner join.

-- Join multiple tables using an inner join
SELECT 
    Table1.num,
    Table2.location,
    Table3.dept
FROM 
    Table1
INNER JOIN 
    Table2 ON Table1.id = Table2.id
INNER JOIN 
    Table3 ON Table1.id = Table3.id;

This query will return the following result:

numlocationdept
12345New YorkSales
67890Los AngelesMarketing
11111ChicagoIT

Notice that we’ve used an inner join for each table to ensure that all rows in the result set come from the intersection of the three tables.

I hope this explanation has helped you understand how to use aggregate functions with joins!


Last modified on 2025-03-27