Tag Archives: SQL

Using Stuff to Clean Up String Concatenation

In my last post, I covered the simple, yet handy, STUFF function. It’ll take a string and “stuff it right in there”. Today, I want to cover a real work scenario where I use it quite often. I’ll use the AdventureWorks database (“AWD”) to demonstrate.

Joins create rows. If you run the following query against the AWD, you’ll see the SalesOrderId repeated for each SalesOrderDetail row:

	Sales.SalesOrderHeader soh
	Sales.SalesOrderDetail sod on sod.SalesOrderID = soh.SalesOrderID

But there are times where you want only 1 row for each SalesOrderID. And so we need a mechanism which can effectively concatenate all the SalesOrderDetailIDs for each SalesOrderID and return it as a single cell, adjacent to the corresponding SalesOrderID.

Consider the following query:

	,(SELECT CHAR(44) + CAST(sod.SalesOrderDetailID AS NVARCHAR(MAX))
			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'), ROOT('result'),TYPE) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

which yields the following resultset:

Figure 1

You can see that via the use of the XML query syntax and a correlated subquery, I have gone a long way to achieving our goal. All the SalesOrderDetailIDs are in a cell such that there is only 1 row for each SalesOrderID. But we are not interested in well-formed XML, having the XML type (which was done by using the TYPE directive and specifying a ROOT element). We need to peel back the XML elements and extract the value within. We can use the xQuery API to return an NVARCHAR(MAX), which has effectively returned the contents of each XML “row” element.

So now our query looks like this:

	,((SELECT CHAR(44) + CAST(sod.SalesOrderDetailID AS NVARCHAR(MAX))
			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'),TYPE, ROOT('result')).value('/result[1]','varchar(max)')) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

But we’re not quite there. We have 1, small, pesky comma at the beginning of the NVARCHAR, an example of which looks like this:


This is where STUFF comes in. If we were to deal with this in isolation, it would look like this:

SELECT STUFF(',1,2,3,4,5,6,7,8,9,10,11,12', 1,1, N'')

We are telling STUFF to go to the first character, delete 1 character and replace it with an empty string. That’s how we remove the leading comma. Incorporating that into the main query, we have our final code:

			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'),TYPE, ROOT('result')).value('/result[1]','varchar(max)'),1,1,N'')) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

And our resultset:

Figure 2

Note: Rob Farley, in his post Handling special characters with FOR XML PATH, notes that you can:

  1. Remove the ROOT
  2. Change the xQuery to .value('.','varchar(max)')

Rob also makes the point that “it’s always felt strange to me to be returning something which I declare to be XML (using TYPE) without it being well-formed”. I agree with this. It’s a bit weird and makes it harder to see what is going on. But I wanted to point that out for completeness.

Otherwise, that’s a real world example usage of STUFF.

Parsing a Comma Delimited List in SQL Server Scripts – Cleverly

There’s an external guy we use for really complex SQL. We have a report which has been the ruin of many a developer, and after many failures, it became apparent that developers could not do this. That’s when you need to get in a proper SQL merchant. Someone who thinks in terms of sets, rather than while loops.

But I digress. In one of his Scripts, I saw something which I found very interesting. He wrote a few short lines of code which parse a comma-delimited list (passed in as a parameter). Now, I’ve seen many a function which has been created for this, and they all seemed to fit about what I would do if I went to write it. But this guy… This is how he wrote it (inserting the result into a table variable called @HospitalIDTable):

		declare @HospitalIds CHAR (13) = '1,2,3,4,5,6,9' -- pretend this is a parameter
		declare			@XML as xml
		declare			@Delimiter as char(1) =','

		set				@XML = CAST(('<X>'+REPLACE(@HospitalIds ,@Delimiter ,'</X><X>')+'</X>') as xml)
		declare			@HospitalIDTable table (HospitalId int)
		insert into		@HospitalIDTable
		select			N.value('.', 'INT') AS HospitalId 
		from			@XML.nodes('X') AS T(N)
                -- done

If you output the @XML variable, you will see:


Now, the final SELECT statement – I don’t even understand. I know N is to make a string unicode, but value??? I need to research that. Back in a tick.

OK. So the value method is the XML data type method for XQuery. I should have known that, because I’ve done that before. Calling it off the N prefix is what threw me, and I still haven’t totally grokked what is going on there. But it is as cool as heck and if you do a SELECT on the @HospitalIDTable table variable, you get your list as a column of data.

Very concise and cool.

Experts like this guy make light work of complex SQL. We’ve had several dev’s look at his scripts and everyone agrees that their relationship with SQL is not on the same level as this guys. We’re front-end/middle-tiers devs. What did you expect?!

SQL Server Version

If you want to know details about which version of SQL Server is running on a machine, you can run the following query:

SELECT @@version as [Verbose Version Info]
    , SERVERPROPERTY('productversion') as 'Product Version' 
    , SERVERPROPERTY('productlevel') as 'Patch Level'  
    , SERVERPROPERTY('edition') as 'Product Edition'
    , SERVERPROPERTY('buildclrversion') as 'CLR Version'
    , SERVERPROPERTY('collation') as 'Default Collation'
    , SERVERPROPERTY('instancename') as 'Instance'
    , SERVERPROPERTY('lcid') as 'LCID'
    , SERVERPROPERTY('servername') as 'Server Name'

Also, for the specific version of the databases:

SELECT name, version 
FROM sys.sysdatabases

When would you run this? You get error messages that specify a particular version of SQL Server must be installed for a particular database to be attached/restored (as the case may be).

Not In != Not Exists

So, in SQL, what is the distinction between NOT IN and NOT EXISTS?

This can be a bit confusing, and is a bit of a gotcha. Because SQL is a declarative language, the idea is that you declare what you want to query. But sometimes, a normal English declaration of intent does not always map to the correct SQL construct. And the NOT IN vs. NOT EXISTS mixup is a good case in point.

I’ll demonstrate this by way of example. Say we have the following database schema (run the script CreateCampusDB.sql to create and populate the database):

As you can see, a Student may also be a lecturer. This is often the case where phd graduates are also lecturers.

Now, we want to know which lecturers are not also students. If we verbalise this, we might say “Tell me all of the lecturers which are not students”. That starts to look like, “Tell me all the lecturers which are not in the list of all the students.” So, we may form the query:

select * from dbo.Lecturer l where l.StudentId not in (select st.StudentId from dbo.Student st)

The logic seems correct. If the foreign key of Lecturer does not show up in the list of primary keys in the Student table, then the Lecturer will not be a Student. Wrong! Fail!

Taking a good look at that query, you will never get the information that you seek. The subquery brings back a finite list of primary keys, not including nulls. A lecturer will be a “not-Student” where it has a null value in the StudentId foreign key column of the Lecturer table. A comparison of a value (primary key) to null will return no rows at all, regardles of whether it is negated or not (* see further discussion of this point below).

The correct way to get our result would be to use NOT EXISTS and a correlated subquery:

select * from dbo.Lecturer l where not exists (select * from dbo.Student st where l.StudentId  = st.StudentId )

Here, we have a row by row comparison and the null values are taken into account in the evaluation.

Or, you could use an outer join as so:

select * 
from dbo.Student st right outer join dbo.Lecturer l 
on st.StudentId = l.StudentId
where l.StudentId is null

(I only showed that as a proof of the query above which uses NOT EXISTS.)

* This post has been prevaricated on the assumption that the query is running with ANSI_NULLS on. That is the default behavour i.e. comparing a value to null will return no rows at all. This makes sense, as null is an unknown. It usually means that it is unknown whether there is a value or not. You can change that default behaviour by setting ANSI_NULLS to OFF. If you do this and run the 1st query, you will get the same anwser as the correlated subquery and the outer join:

select * from dbo.Lecturer l where l.StudentId not in (select st.StudentId from dbo.Student st)

Using a From Clause in an Update Statement

So, lets say you have a table of data as follows:

And lets say you have a destination table with the same schema. But the data is not necessarily in the same order:
We’re told that the combination of first name and last name are unique in this schema. Now, the idea is to copy all of the primary keys in the top table to fill the nulls in the bottom table. Of course, this is easily done with a handful of update statements here. But what if both tables have thousands of rows? That’s where we can use SQL Server’s UPDATE statement with a FROM clause.

UPDATE ContactTableNew
SET id = cs.id
FROM ContactTableNew cn INNER JOIN ContactTableSource cs ON cs.fname = cn.fname and cs.lname = cn.lname

OK. So exactly what have we done? It’s kinda like a join. Well, it is a join. We have taken data from one table and updated the new table with that data.