Category Archives: Programming

Using Stuff to Clean Up String Concatenation

In my last post, I covered the simple, yet handy, STUFF function. It’ll take a string and “stuff it right in there”. Today, I want to cover a real work scenario where I use it quite often. I’ll use the AdventureWorks database (“AWD”) to demonstrate.

Joins create rows. If you run the following query against the AWD, you’ll see the SalesOrderId repeated for each SalesOrderDetail row:

SELECT TOP 150
	soh.SalesOrderID
	,sod.SalesOrderDetailID
FROM 
	Sales.SalesOrderHeader soh
INNER JOIN 
	Sales.SalesOrderDetail sod on sod.SalesOrderID = soh.SalesOrderID

But there are times where you want only 1 row for each SalesOrderID. And so we need a mechanism which can effectively concatenate all the SalesOrderDetailIDs for each SalesOrderID and return it as a single cell, adjacent to the corresponding SalesOrderID.

Consider the following query:

SELECT TOP 10
	soh.SalesOrderID
	,(SELECT CHAR(44) + CAST(sod.SalesOrderDetailID AS NVARCHAR(MAX))
			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'), ROOT('result'),TYPE) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

which yields the following resultset:

Figure 1


You can see that via the use of the XML query syntax and a correlated subquery, I have gone a long way to achieving our goal. All the SalesOrderDetailIDs are in a cell such that there is only 1 row for each SalesOrderID. But we are not interested in well-formed XML, having the XML type (which was done by using the TYPE directive and specifying a ROOT element). We need to peel back the XML elements and extract the value within. We can use the xQuery API to return an NVARCHAR(MAX), which has effectively returned the contents of each XML “row” element.

So now our query looks like this:

SELECT 
	soh.SalesOrderID
	,((SELECT CHAR(44) + CAST(sod.SalesOrderDetailID AS NVARCHAR(MAX))
			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'),TYPE, ROOT('result')).value('/result[1]','varchar(max)')) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

But we’re not quite there. We have 1, small, pesky comma at the beginning of the NVARCHAR, an example of which looks like this:

,1,2,3,4,5,6,7,8,9,10,11,12

This is where STUFF comes in. If we were to deal with this in isolation, it would look like this:

SELECT STUFF(',1,2,3,4,5,6,7,8,9,10,11,12', 1,1, N'')

We are telling STUFF to go to the first character, delete 1 character and replace it with an empty string. That’s how we remove the leading comma. Incorporating that into the main query, we have our final code:

SELECT 
	soh.SalesOrderID
	,(SELECT STUFF((SELECT CHAR(44) + CAST(sod.SalesOrderDetailID AS NVARCHAR(MAX))
			 FROM Sales.SalesOrderDetail sod
			 WHERE soh.SalesOrderID = sod.SalesOrderID
			 FOR XML PATH('row'),TYPE, ROOT('result')).value('/result[1]','varchar(max)'),1,1,N'')) AS SalesOrderDetailIDs
FROM Sales.SalesOrderHeader soh
ORDER BY soh.SalesOrderID

And our resultset:

Figure 2

Note: Rob Farley, in his post Handling special characters with FOR XML PATH, notes that you can:

  1. Remove the ROOT
  2. Change the xQuery to .value('.','varchar(max)')

Rob also makes the point that “it’s always felt strange to me to be returning something which I declare to be XML (using TYPE) without it being well-formed”. I agree with this. It’s a bit weird and makes it harder to see what is going on. But I wanted to point that out for completeness.

Otherwise, that’s a real world example usage of STUFF.

Stuff – The SQL Server Function

I want to dedicate a small post to STUFF. It’s a pretty handy little function in SQL Server which deletes a sequence of characters from a source string and then inserts another sequence of characters into the source string, starting at a specified position. The best way to show how it works is with examples.


Lets turn the word inimical into inimitable:

SELECT STUFF('inimical', 6, 3, 'table')

outputs: inimitable

STUFF is not 0-indexed. So, it goes to the 6th character in the string, deletes 3 characters and replaces that with the string in the last parameter. Now we have inimitable.


Add a colon into a time:

SELECT STUFF('1159', 3, 0, ':') 

outputs: 11:59


Mask a credit card number:

DECLARE @CreditCardNumber VARCHAR(20)
SET @CreditCardNumber = '9876-5432-0987-6543'

SELECT STUFF(@CreditCardNumber, 1, LEN(@CreditCardNumber) - 4, REPLICATE('X', LEN(@CreditCardNumber) - 4)) AS [Output]

outputs: XXXXXXXXXXXXXXX6543

So, STUFF is pretty cool. But there is something which I use it for quite regularly in my work. As it is quite involved, I’ll do a separate post to walk through it (this post is really just “setting the table” for that one).

FluentValidation and Emails with White Space

Quite often, users will copy and paste an email address into an email text input. And that email address will contain trailing white space. If that gets past your client-side validation code, you need to trim it at the server. That’s usually not a problem. But, if you are using the email validator in FluentValidation, such an address will fail validation and the message sent back to the user will be a generic one that the email address is not valid. It won’t mention a trailing space. And for most users, it can take quite a bit of time to figure out that the email address has some trailing whitespace. If our help-desk is anything to go by, many users didn’t figure that out at all and so it was down to me to address this.

After a lengthy discussion with our domain experts, we decided that we did not want an email address to be considered invalid, just because of a trailing space. Technically it is invalid. But it is easy enough to trim it in the Services layer. But we must first get past server-side validation. So, my existing validator, which was returning an error message about an invalid email address looked something like this:

RuleFor(m => m.NewEmail)
	.EmailAddress()	
	.WithMessage(ValidationConstants.SymbolIsNotAValidEmailAddress, x => x.NewEmail)
	.WithName("NewEmail"); 

As I mentioned above, the EmailAddress() validator will fail the validation of an email address with a trailing space. Before I move onto the custom validator I ended up writing, I just want to quickly note something we tried, but which turned out to be a bad idea. First, I tried the following:

RuleFor(m => m.NewEmail.Trim())
	.EmailAddress()
	.When(m => m.NewEmail != null)
	.WithMessage(ValidationConstants.SymbolIsNotAValidEmailAddress, x => x.NewEmail)
	.WithName("NewEmail"); 

Trimming the email property (in the 1st line) immediately felt wrong. We also ran into a few problems with that approach, which are beyond the scope of this post. But most importantly, mutating the property in the RuleFor lambda is not the way FluentValidation is meant to be used.

The right approach, the way we saw it, was to write a custom validator which would basically validate the email in the same way as the normal EmailAddress validator does, except permitting trailing whitespace. Step 1 in this process was to grab the source of the EmailAddress validator, which can be found here. Step 2 was to create the new validator itself, based on the EmailValidator referred to in Step 1, which I creatively named NonTrimmedEmailValidator:

    public class NonTrimmedEmailValidator : PropertyValidator, IRegularExpressionValidator, IEmailValidator
    {
        private readonly Regex regex;

        const string expression = @"^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-||_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+([a-z]+|\d|-|\.{0,1}|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])?([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))$";

        public NonTrimmedEmailValidator()
            : base(new LanguageStringSource("NonTrimmedEmailValidator"))
        {
            regex = new Regex(expression, RegexOptions.IgnoreCase);
        }


        protected override bool IsValid(PropertyValidatorContext context)
        {
            if (context.PropertyValue == null) return true;

            // only interested in whether the trimmed string is a valid email address.
            var trimmedEmail = ((string)context.PropertyValue).Trim();

            if (!regex.IsMatch(trimmedEmail))
            {
                return false;
            }

            return true;
        }

        public string Expression
        {
            get { return expression; }
        }
    }
}

Step 3 was to create an extension method to make it usable like the other fluent validators:

public static class FluentValidationExtensions
{
    public static IRuleBuilderOptions<T, TProperty> NonTrimmedStringIsValidEmailAddressWhenTrimmed<T, TProperty>(this IRuleBuilder<T, TProperty> ruleBuilder)
    {
        return ruleBuilder.SetValidator(new NonTrimmedEmailValidator());
    }
}

Now, we can validate objects which have properties that are email addresses, where the email has a trailing space.

RuleFor(m => m.NewEmail)
	.NonTrimmedStringIsValidEmailAddressWhenTrimmed()
	.WithMessage(ValidationConstants.SymbolIsNotAValidEmailAddress, x => x.NewEmail)
	.WithName("NewEmail");

And, like I am doing anyway, trim the string in the services layer. Now, the user does not get a validation failure when they copy and paste an email address from somewhere that happens to grab a bit of white space on the end.

Parsing a Comma Delimited List in SQL Server Scripts – Cleverly

There’s an external guy we use for really complex SQL. We have a report which has been the ruin of many a developer, and after many failures, it became apparent that developers could not do this. That’s when you need to get in a proper SQL merchant. Someone who thinks in terms of sets, rather than while loops.

But I digress. In one of his Scripts, I saw something which I found very interesting. He wrote a few short lines of code which parse a comma-delimited list (passed in as a parameter). Now, I’ve seen many a function which has been created for this, and they all seemed to fit about what I would do if I went to write it. But this guy… This is how he wrote it (inserting the result into a table variable called @HospitalIDTable):

		declare @HospitalIds CHAR (13) = '1,2,3,4,5,6,9' -- pretend this is a parameter
		declare			@XML as xml
		declare			@Delimiter as char(1) =','

		set				@XML = CAST(('<X>'+REPLACE(@HospitalIds ,@Delimiter ,'</X><X>')+'</X>') as xml)
		declare			@HospitalIDTable table (HospitalId int)
		insert into		@HospitalIDTable
		select			N.value('.', 'INT') AS HospitalId 
		from			@XML.nodes('X') AS T(N)
                -- done

If you output the @XML variable, you will see:

<X>1</X><X>2</X><X>3</X><X>4</X><X>5</X><X>6</X><X>9</X>

Now, the final SELECT statement – I don’t even understand. I know N is to make a string unicode, but value??? I need to research that. Back in a tick.

OK. So the value method is the XML data type method for XQuery. I should have known that, because I’ve done that before. Calling it off the N prefix is what threw me, and I still haven’t totally grokked what is going on there. But it is as cool as heck and if you do a SELECT on the @HospitalIDTable table variable, you get your list as a column of data.

Very concise and cool.

Experts like this guy make light work of complex SQL. We’ve had several dev’s look at his scripts and everyone agrees that their relationship with SQL is not on the same level as this guys. We’re front-end/middle-tiers devs. What did you expect?!

Why Cat?

Sometimes, when new-comers come to functional programming, they see some things which may seem strange to them. Things which don’t make sense from an object-oriented perspective. I’m going to take an example of this and show you what I mean. Lets say we take the following Javascript function cat, which takes multiple arrays and merges them all into one (I am using the Ramda library to assist. Anything with the R prefix is a Ramda function):

var cat = function () {
  var head = R.head(arguments);
  if (head)
    return head.concat.apply(head, R.tail(arguments));
  else
    return [];
};

The question may be posed, why bother with that function when Javascript has concat built in as a method on the Array prototype? The big difference is that concat is called on the object (the instance array), whereas cat operates on whatever is passed to it.:

// concat
[2, 1].concat([7, 6, 3, 4], [9, 6, 8]);
// returns [2, 1, 7, 6, 3, 4, 9, 6, 8]

// cat
cat([2, 1], [7, 6, 3, 4], [9, 6, 8])
// also returns [2, 1, 7, 6, 3, 4, 9, 6, 8]

It’s all about composability. Functional programming is a very “composable” paradigm. Functions (like cat) operate on whatever is passed to them as parameters. They do not operate on objects upon which they are methods. They are “free-standing” in nature. Lets see cat get composed as part of a bigger pipeline. Here, we have a function called splat. splat will basically flatten the arguments object of a function, similar to LINQ’s SelectMany:

var splat = function (fun) {
	return function(array) {
		return fun.apply(null, array);
	};
};

Now, lets say we want to use cat to merge some arrays. But the arrays are buried in the item of a larger array. Here’s how we could solve that:

var flatCat = splat(cat);
var arraysToMerge = [[7, 6, 3, 4], [2, 1],[77, 2, 45]]; // the arrays are in the 1st and only index of this larger array
var result = flatCat(arraysToMerge);
// result will be [7, 6, 3, 4, 2, 1, 77, 2, 45];

This works nicely with cat, but you cannot pass Array.prototype.concat into splat. It just won’t work. It’s object-focused nature is its undoing. Run it yourself with concat, instead of cat, to see it bomb out and you’ll understand why it failed.

So, whilst our first glance at cat seemed to size it up as a pointless refactoring to achieve the same thing as Array.prototype.concat, I hope I’ve demonstrated that by tweaking the concept into a function which stands alone (separate from any “owning” object), we can use it as a composable part in a larger operation. Just one cog in the functional pipeline.

Just remember, cat operates on whatever is passed to it.