Removing subqueries from 'negative joins' in SQL

September 2, 2011 Pivotal Labs

I recently worked on a project that had Users who were members of Groups,
and we had to figure out how to get a list of all Groups that a User was not a part of. In this post I’ll describe two ways of accomplishing this: one with a subquery, and one without.

Removing subqueries may, in some cases, increase the performance of the query. If you have a subquery in an SQL statement and the query optimizer is telling you that the subquery is expensive, you can probably accomplish the same thing with regular joins and improve things a bit. The only way to tell is to run them on realistic datasets.

The data model

First, create the Users, Groups and Memberships tables, and add just enough data to test the queries, like so (using PostgreSQL syntax):

create table users (id serial, name varchar);
create table groups (id serial, name varchar);
create table memberships (id serial, user_id int, group_id int);
insert into users (name) values ('Dexter'), ('Lumen');
insert into groups (name) values ('Dark Passengers'), ('Vigilantes'), ('Normal People');
insert into memberships (user_id, group_id) values (1,1), (1,2), (2,2);

The users table should look like:

select * from users;

 id |  name
----+--------
  1 | Dexter
  2 | Lumen

The groups table should look like:

select * from groups;

 id |     name
----+---------------
  1 | Dark Passengers
  2 | Vigilantes
  3 | Normal People

And the memberships table should look like:

select users.name, groups.name
from memberships
inner join users on users.id = memberships.user_id
inner join groups on groups.id = memberships.group_id;

  name  |      name
--------+-----------------
 Dexter | Dark Passengers
 Dexter | Vigilantes
 Lumen  | Vigilantes

Notice how Dexter is a member 2 groups, and the ‘Normal People’ group has no members.

Example 1: Using a subquery

Finding the groups that Dexter is not a member of is easy with a subquery, and looks something like this:

select name
from groups
where id not in (
  select group_id from memberships where user_id = 1
  );

    name
---------------
 Normal People

As you can see, the dataset returned is correct, and the query is relatively simple to read.

Example 2: Grouping and counting members

NOTE: see tk’s comment below for a simpler version.

Depending on your database setup, it might be more performant to eliminate that subquery. Here’s one way to do it:

select groups.name
from groups
left join memberships on memberships.group_id = groups.id
left join memberships filtered_memberships on filtered_memberships.id = memberships.id
  and filtered_memberships.user_id != 1
group by groups.id, groups.name
having count(memberships.user_id) = count(filtered_memberships.user_id);

     name
---------------
 Normal People

Let’s break that down a bit. The query above is doing this:

  • Getting all groups
  • Counting how many members are in each group
  • Counting how many non-Dexter members are in each group
  • If the two counts are the same, we know that Dexter must not be in that group

Here’s another way of looking at it, without the group by clause:

select groups.name,
  memberships.user_id as member_id,
  filtered_memberships.user_id as filtered_member_id
from groups
left join memberships on memberships.group_id = groups.id
left join memberships filtered_memberships on filtered_memberships.id = memberships.id
  and filtered_memberships.user_id != 1;

      name       | member_id | filtered_member_id
-----------------+-----------+--------------------
 Dark Passengers |         1 |
 Vigilantes      |         1 |
 Vigilantes      |         2 |                  2
 Normal People   |           |

To figure out which groups Dexter is not a part of, count the number of entries in each column, and if they match, it means that Dexter is not a part of that group.

At first this may look odd, because the query joins to memberships twice: once for all members, once for just non-Dexter members.
It’s important that the filter (user_id != 1) be applied to the join clause, and not in a where clause, which would end up improperly removing rows from the result set.

Thanks to Charles LeRose for coming up with the idea of comparing the counts of the memberships.

Can you think of a better way to get the same result set without a subquery? If so, post it in the comments below!

About the Author

Biography

Previous
MongoDB Redux
MongoDB Redux

Chris Westin from 10gen discusses more in-depth uses of mongoDB and answers questions from the audience.

Next
Ruby + Salesforce = Happy Developers
Ruby + Salesforce = Happy Developers

Pivotal is proud to announce the release of the databasedotcom gem. Developed in partnership with Heroku a...