Not In != Not Exists

So, in SQL, what is the distinction between NOT IN and NOT EXISTS?

This can be a bit confusing, and is a bit of a gotcha. Because SQL is a declarative language, the idea is that you declare what you want to query. But sometimes, a normal English declaration of intent does not always map to the correct SQL construct. And the NOT IN vs. NOT EXISTS mixup is a good case in point.

I’ll demonstrate this by way of example. Say we have the following database schema (run the script CreateCampusDB.sql to create and populate the database):

As you can see, a Student may also be a lecturer. This is often the case where phd graduates are also lecturers.

Now, we want to know which lecturers are not also students. If we verbalise this, we might say “Tell me all of the lecturers which are not students”. That starts to look like, “Tell me all the lecturers which are not in the list of all the students.” So, we may form the query:

select * from dbo.Lecturer l where l.StudentId not in (select st.StudentId from dbo.Student st)

The logic seems correct. If the foreign key of Lecturer does not show up in the list of primary keys in the Student table, then the Lecturer will not be a Student. Wrong! Fail!

Taking a good look at that query, you will never get the information that you seek. The subquery brings back a finite list of primary keys, not including nulls. A lecturer will be a “not-Student” where it has a null value in the StudentId foreign key column of the Lecturer table. A comparison of a value (primary key) to null will return no rows at all, regardles of whether it is negated or not (* see further discussion of this point below).

The correct way to get our result would be to use NOT EXISTS and a correlated subquery:

select * from dbo.Lecturer l where not exists (select * from dbo.Student st where l.StudentId  = st.StudentId )

Here, we have a row by row comparison and the null values are taken into account in the evaluation.

Or, you could use an outer join as so:

select * 
from dbo.Student st right outer join dbo.Lecturer l 
on st.StudentId = l.StudentId
where l.StudentId is null

(I only showed that as a proof of the query above which uses NOT EXISTS.)

* This post has been prevaricated on the assumption that the query is running with ANSI_NULLS on. That is the default behavour i.e. comparing a value to null will return no rows at all. This makes sense, as null is an unknown. It usually means that it is unknown whether there is a value or not. You can change that default behaviour by setting ANSI_NULLS to OFF. If you do this and run the 1st query, you will get the same anwser as the correlated subquery and the outer join:

select * from dbo.Lecturer l where l.StudentId not in (select st.StudentId from dbo.Student st)

Leave a Comment

NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>