http://www.developer.com/db/choosing-the-right-sql-join.html

Back to article

3 SQL Join Concepts to Help You Choose the Right Join


February 14, 2011

What do SQL joins and the "teach a man to fish" Chinese proverb have in common? SQL joins, like regular expressions, are one of those commonplace programming tasks in which true success is entirely dependent upon your ability to conceptualize the outcome. Fail to do so and you'll likely wind up spending a few hours in a frustrating round of trial and error. Like regular expressions, the proliferation of online examples has actually contributed to the frustration, providing the equivalent of a day's worth of fish rather than the proverbial fishing pool.

So in this tutorial I thought I'd take a somewhat different approach, and attempt to shed some insight into this heady topic by presenting it in terms of concepts rather than simply laying out some specific examples. Of course, providing concepts without accompanying real-world insight wouldn't be particularly practical, and so I'll illustrate the different join types using a simple online store example.

The Project Requiring SQL Joins

Suppose you created an online store that allowed customers to refer a friend in order to earn a special discount off their next order. The customer base rapidly increases in the weeks following the referral program announcement, leading you to believe the program is a success. But without simple analytics at your disposal it's not really possible to truly measure success, and so you set out to use SQL joins to shed some light not only on the success of this new program, but also on your online operation in general.

A simplified version of the relevant MySQL tables follows:

CREATE TABLE customers (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
referral_id INTEGER UNSIGNED NOT NULL,
FOREIGN KEY (referral_id) REFERENCES customers(id)
);

CREATE TABLE orders (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
ordered_on TIMESTAMP NOT NULL,
customer_id INTEGER UNSIGNED NOT NULL,
FOREIGN KEY(customer_id) REFERENCES customers(id)
);

SQL Join Concept #1: The INNER JOIN

Prospective customers may create an account but wind up leaving the website before completing a purchase. It would be beneficial to know more about just how many purchases each customer made, which among other things would be useful for following up with VIP customers for reason of providing them with special offers. Suppose the customers and orders tables look like this:

mysql> select * from customers;
+----+---------------+-------------+
| id | name | referral_id |
+----+---------------+-------------+
| 1 | John Smith | 0 |
| 2 | Josey Wales | 1 |
| 3 | Bob McClain | 1 |
| 4 | Peter Sellers | 3 |
+----+---------------+-------------+

mysql> select * from orders;
+----+---------------------+-------------+
| id | ordered_on | customer_id |
+----+---------------------+-------------+
| 1 | 2011-02-11 11:27:59 | 1 |
| 2 | 2011-02-11 11:28:09 | 1 |
| 3 | 2011-02-11 11:28:12 | 2 |
| 4 | 2011-02-11 11:28:15 | 1 |
| 5 | 2011-02-11 11:28:19 | 4 |
| 6 | 2011-02-11 11:28:21 | 4 |
| 7 | 2011-02-11 11:28:23 | 2 |
+----+---------------------+-------------+

A SQL join type known as the INNER JOIN is the appropriate approach, because the goal is to return matches found in both tables. Because we also want to tally up the number of purchases made by each customer, and order the customers according to the number of purchases they made, the count() function, GROUP BY clause, and ORDER BY clause will also be used:

SELECT COUNT(o.id) AS 'orders', c.name 
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
GROUP BY o.customer_id
ORDER BY orders desc;

Executing this query produces the following outcome:

+--------+---------------+
| orders | name |
+--------+---------------+
| 3 | John Smith |
| 2 | Josey Wales |
| 2 | Peter Sellers |
+--------+---------------+

Remember: the INNER JOIN is useful when the need arises to return information about records appearing in both tables.

SQL Join Concept #2: The SELF JOIN

It would be useful to have a look at which customers were responsible for referring a new customer. Because the referrer's referral ID is stored in the very same table where customer information is stored, we need to execute a SELF JOIN. The SELF JOIN is actually a variation of the INNER JOIN in which both tables happen to be identical.

To illustrate the example, here is a partial listing of the customers table:

+----+---------------+-------------+
| id | name | referral_id |
+----+---------------+-------------+
| 1 | John Smith | 0 |
| 2 | Josey Wales | 1 |
| 3 | Bob McClain | 1 |
| 4 | Peter Sellers | 3 |
+----+---------------+-------------+

Therefore to retrieve both the customer's name and the associated referrer's name, the join query must be constructed in such a way that we're retrieving the referrer's name based on a comparison of the id and the referral_id columns.

SELECT c.name as 'Customer', r.name as 'Referrer'
FROM customers c
INNER JOIN customers r
ON c.referral_id = r.id;

Executing this query produces the following results:

+---------------+-------------+
| Customer | Referrer |
+---------------+-------------+
| Josey Wales | John Smith |
| Bob McClain | John Smith |
| Peter Sellers | Bob McClain |
+---------------+-------------+

Remember: you should use a SELF JOIN when you need to compare a table's primary key to a foreign key, which happens to reside in the same table.

SQL Join Concept #3: The OUTER JOIN

In your drive to create a successful business, it's crucial to maximize the potential of your most valuable resource: the customer. Word of mouth has been proven time and again to be the most effective way to attract new business, and therefore encouraging every customer to make a referral is crucial. One way to make sure every customer refers at least one friend is to further incentivize those customers who have yet to refer somebody. This is the opposite tack taken in the previous example, because this time we want to know only about customers whose primary key is not associated with a foreign key.

A LEFT OUTER JOIN is useful for learning more about records that do not appear in another table, because it will return all records from the table even if they do not appear in the other table. Those that do not appear in the other table will return NULL values within the desired column. Consider the following example and associated output:

SELECT c.name, o.customer_id 
FROM customers c
LEFT OUTER JOIN orders o
ON c.id = o.customer_id;

+---------------+-------------+
| name | customer_id |
+---------------+-------------+
| John Smith | 1 |
| John Smith | 1 |
| John Smith | 1 |
| Josey Wales | 2 |
| Josey Wales | 2 |
| Bob McClain | NULL |
| Peter Sellers | 4 |
| Peter Sellers | 4 |
+---------------+-------------+

Notice the NULL value associated with Bob McClain. This means that Bob has yet to make a purchase. Of course, he's the only prospective customer we're interested in, so you can modify the previous query to filter according to which customer_id values are NULL:

SELECT c.name FROM customers c 
LEFT OUTERJOIN orders o
ON c.id = o.customer_id
WHERE o.customer_id IS NULL;

Executing this query returns:

+-------------+
| name |
+-------------+
| Bob McClain |
+-------------+

Choosing the Right SQL Join

When it's time to start taking advantage of SQL joins, take the time to consider the query's expected behavior--especially the desired data's relationship with the tables in question. Being able to conceptualize this relationship is going to help you to quickly determine which type of join should be embraced, saving you a great deal of time and tears!

Sitemap | Contact Us