SEO Friendly String Sanitizer

Tags: dot-net, seo, nuget

A common need when developing public-facing web sites is creating SEO-friendly URLs. Such a URL usually contains the name of the entity displayed on the particular page, and this is where the .NET Framework lacks decent support for creating a URL-friendly version of such a string. Still, we can make our own solution.

The problem of converting an arbitrary string to a URL-friendly representation consists of two parts. The first is diacritical folding, which converts accented and other non-ASCII characters to their US-ASCII counterparts. The second is removing all punctuation and replacing whitespace with dashes.

Once both steps are done, we get a string that fits nicely into a URL, conforms to the standards and is easy on the eye.

The second part is pretty simple: go through the string character by character, skip punctuation and replace whitespace with dashes, keeping track of the previous output so that multiple dashes never appear in a row. Nothing smart here, just "manual labor" (done by the computer, luckily).
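To make that loop concrete, here is a minimal sketch of the second part. It is not the code from the library; the class and method names (`SlugSketch`, `ToSlug`) are made up for illustration, and lowercasing the output is my own assumption rather than something the steps above require. It expects input that has already been folded to US-ASCII.

```csharp
using System;
using System.Text;

public static class SlugSketch
{
    // Hypothetical helper: keeps ASCII letters and digits, turns any run of
    // whitespace or dashes into a single dash, and skips everything else.
    public static string ToSlug(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingDash = false; // a separator was seen since the last kept character

        foreach (char c in input)
        {
            bool isAsciiLetterOrDigit =
                (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

            if (isAsciiLetterOrDigit)
            {
                // The dash is written lazily, so it never leads, trails or repeats.
                if (pendingDash && sb.Length > 0) sb.Append('-');
                pendingDash = false;
                sb.Append(char.ToLowerInvariant(c)); // lowercasing is an assumption
            }
            else if (char.IsWhiteSpace(c) || c == '-')
            {
                pendingDash = true;
            }
            // Punctuation and any other character is simply skipped.
        }

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToSlug("  Hello,   World! -- 2nd edition "));
        // prints: hello-world-2nd-edition
    }
}
```

Writing the dash only when the next kept character arrives is what avoids doubled, leading and trailing dashes without a cleanup pass at the end.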

As for the diacritical folding part, there is a discussion on an MSDN blog about the issue. Although the accepted answer suggests the string.Normalize() method that the .NET Framework provides, it doesn't really produce the expected result.

It keeps the diacritics instead of converting characters to their non-diacritic counterparts. Normalization only changes how an accented character is encoded (as one precomposed character, or as a base letter followed by combining marks); it does not remove the accent. We may argue that IIS serves URLs containing diacritics without a problem, but that's not what I'm looking for. I'd really like the result to be US-ASCII, to make sure it doesn't get messed up when someone collects it.
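For illustration, here is a small sketch of what string.Normalize() can and cannot do on its own. With NormalizationForm.FormD the accents are split off as separate combining marks, which then still have to be filtered out by hand. The class and method names (`NormalizeSketch`, `StripCombiningMarks`) are made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NormalizeSketch
{
    // Decomposes the string (Form D) and drops the combining marks.
    public static string StripCombiningMarks(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Combining accents are reported as NonSpacingMark after decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Letters with a canonical decomposition lose their accents.
        Console.WriteLine(StripCombiningMarks("Čokoládový krém")); // Cokoladovy krem

        // Letters such as Đ, Ł and Ø have no decomposition, so they stay non-ASCII.
        Console.WriteLine(StripCombiningMarks("Đorđe, Łukasz, Øre"));
    }
}
```

Even with the extra filtering step, letters like Đ, Ł or Ø pass through unchanged, so this approach alone does not guarantee US-ASCII output.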

Further down in that discussion, there's work by Peter Ritchie (MVP) that handles the folding manually and does the job right.
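To give an idea of what "manual" folding means, here is a tiny sketch based on a lookup table. This is not Peter Ritchie's code and not the code from the library; the names (`FoldingSketch`, `Fold`) and the handful of table entries are purely illustrative, and a real table would cover far more characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FoldingSketch
{
    // Tiny illustrative table; the entries are examples, not a complete mapping.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'Đ', "D" }, { 'đ', "d" },
        { 'Ł', "L" }, { 'ł', "l" },
        { 'Ø', "O" }, { 'ø', "o" },
        { 'ß', "ss" },
    };

    public static string Fold(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            string replacement;
            if (c < 128)
                sb.Append(c);                 // already US-ASCII
            else if (Map.TryGetValue(c, out replacement))
                sb.Append(replacement);       // known character: use its ASCII counterpart
            else
                sb.Append(c);                 // unknown character: left as-is in this sketch
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Fold("Đorđe Łukasz Straße")); // Dorde Lukasz Strasse
    }
}
```

An explicit table like this can also map one character to several, as with ß, which normalization alone cannot do.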

So, as the framework doesn't provide something like this, and I haven't seen this work published in a form that is convenient to reuse, I've decided to package it and make it available. I've taken Peter's work for diacritical folding, added a bit of my own code for the string transformation, and compiled it all into a small library that does the job.

The source is available on GitHub and the library is packaged and available as a NuGet package. I hope you'll find it helpful.
