I recently worked on a project that had Users who were members of Groups,
and we had to figure out how to get a list of all Groups that a User was not a part of. In this post I’ll describe two ways of accomplishing this: one with a subquery, and one without.
Removing subqueries may, in some cases, increase the performance of the query. If you have a subquery in an SQL statement and the query optimizer is telling you that the subquery is expensive, you can probably accomplish the same thing with regular joins and improve things a bit. The only way to tell is to run them on realistic datasets.
The data model
First, create the Users, Groups and Memberships tables, and add just enough data to test the queries, like so (using PostgreSQL syntax):
create table users (id serial, name varchar); create table groups (id serial, name varchar); create table memberships (id serial, user_id int, group_id int); insert into users (name) values ('Dexter'), ('Lumen'); insert into groups (name) values ('Dark Passengers'), ('Vigilantes'), ('Normal People'); insert into memberships (user_id, group_id) values (1,1), (1,2), (2,2);
The users table should look like:
select * from users; id | name ----+-------- 1 | Dexter 2 | Lumen
The groups table should look like:
select * from groups; id | name ----+--------------- 1 | Dark Passengers 2 | Vigilantes 3 | Normal People
And the memberships table should look like:
select users.name, groups.name from memberships inner join users on users.id = memberships.user_id inner join groups on groups.id = memberships.group_id; name | name --------+----------------- Dexter | Dark Passengers Dexter | Vigilantes Lumen | Vigilantes
Notice how Dexter is a member 2 groups, and the ‘Normal People’ group has no members.
Example 1: Using a subquery
Finding the groups that Dexter is not a member of is easy with a subquery, and looks something like this:
select name from groups where id not in ( select group_id from memberships where user_id = 1 ); name --------------- Normal People
As you can see, the dataset returned is correct, and the query is relatively simple to read.
Example 2: Grouping and counting members
NOTE: see tk’s comment below for a simpler version.
Depending on your database setup, it might be more performant to eliminate that subquery. Here’s one way to do it:
select groups.name from groups left join memberships on memberships.group_id = groups.id left join memberships filtered_memberships on filtered_memberships.id = memberships.id and filtered_memberships.user_id != 1 group by groups.id, groups.name having count(memberships.user_id) = count(filtered_memberships.user_id); name --------------- Normal People
Let’s break that down a bit. The query above is doing this:
- Getting all groups
- Counting how many members are in each group
- Counting how many non-Dexter members are in each group
- If the two counts are the same, we know that Dexter must not be in that group
Here’s another way of looking at it, without the
group by clause:
select groups.name, memberships.user_id as member_id, filtered_memberships.user_id as filtered_member_id from groups left join memberships on memberships.group_id = groups.id left join memberships filtered_memberships on filtered_memberships.id = memberships.id and filtered_memberships.user_id != 1; name | member_id | filtered_member_id -----------------+-----------+-------------------- Dark Passengers | 1 | Vigilantes | 1 | Vigilantes | 2 | 2 Normal People | |
To figure out which groups Dexter is not a part of, count the number of entries in each column, and if they match, it means that Dexter is not a part of that group.
At first this may look odd, because the query joins to memberships twice: once for all members, once for just non-Dexter members.
It’s important that the filter (
user_id != 1) be applied to the
join clause, and not in a
where clause, which would end up improperly removing rows from the result set.
Thanks to Charles LeRose for coming up with the idea of comparing the counts of the memberships.
Can you think of a better way to get the same result set without a subquery? If so, post it in the comments below!
About the AuthorMore Content by Pivotal Labs