Saturday, October 11, 2014

String Concatenation

One of the first things .NET developers learn is that it’s not a good idea to concatenate strings with the + operator (e.g. s = s + "text") in .NET languages. The reason is that .NET strings are immutable, meaning they cannot be changed once created. Because of that, each concatenation creates a new string object that replaces the existing one, leaving the old string behind for the garbage collector to clean up. For the occasional concatenation this isn’t a big problem, but if you are doing this in a loop you could end up creating a lot of unnecessary objects. The solution is to use the System.Text.StringBuilder class, which lets you append to a string without creating a new string object for every append.
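To see the immutability described above in action, here is a small C# sketch (the class and method names are just for illustration). Appending to a variable does not modify the string it pointed to. Instead the variable is rebound to a brand-new string, while any other reference still sees the original value.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Appends a non-empty suffix with + and reports whether the variable
    // now refers to a different object than the one it started with.
    // (With an empty suffix the runtime may hand back the original string.)
    public static bool AppendCreatesNewObject(string original, string suffix)
    {
        string s = original;
        s = s + suffix;                       // builds a new string object
        return !ReferenceEquals(s, original); // true: s points elsewhere now
    }

    public static void Main()
    {
        string original = "abc";
        string alias = original;   // a second reference to the same object
        alias += "def";            // alias is rebound to a new string

        Console.WriteLine(original);                          // abc
        Console.WriteLine(alias);                             // abcdef
        Console.WriteLine(ReferenceEquals(original, alias));  // False
    }
}
```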

In this article I want to take a closer look at string concatenation to understand what is actually happening behind the scenes. Let’s start with this block of code:

string s1 = "one";
string s2 = "two";
string s3 = s1 + s2;

If we compile this code and then decompile it (in this case I am using Telerik’s JustDecompile), here is the resulting code:

string s1 = "one";
string s2 = "two";
string s3 = string.Concat(s1, s2);

You will notice that the compiler replaced the string addition with a call to the string.Concat method. Let’s take a look at the code for Concat. I have stripped out some of the method (mostly its checks for null and empty arguments) to show just the important part:

// Inside string.Concat(string str0, string str1), after the null/empty checks
int length = str0.Length;
string str = string.FastAllocateString(length + str1.Length);
string.FillStringChecked(str, 0, str0);
string.FillStringChecked(str, length, str1);
return str;

This code uses FastAllocateString to create a new string object that is big enough to hold both of the strings being concatenated. It then uses FillStringChecked to copy the two strings into the correct positions in the new string, and finally returns it. This is where the new object gets created: s1 and s2 are left untouched, and s3 ends up referring to the freshly allocated string.
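FastAllocateString and FillStringChecked are internal to the framework, so you can’t call them yourself. The following is a rough sketch of the same three steps using only public APIs. ConcatSketch and Concat2 are hypothetical names, and the null handling is simplified compared to the real method.

```csharp
using System;

public static class ConcatSketch
{
    // Mirrors the steps of the two-argument string.Concat shown above,
    // using only public APIs. Simplification: null is treated as "".
    public static string Concat2(string str0, string str1)
    {
        str0 = str0 ?? string.Empty;
        str1 = str1 ?? string.Empty;

        int length = str0.Length;

        // Step 1: allocate room for both strings
        // (the framework uses the internal FastAllocateString here).
        char[] buffer = new char[length + str1.Length];

        // Step 2: copy each string into its slot
        // (the framework uses the internal FillStringChecked here).
        str0.CopyTo(0, buffer, 0, length);
        str1.CopyTo(0, buffer, length, str1.Length);

        // Step 3: produce the result. Unlike the real method,
        // new string(char[]) copies the buffer once more.
        return new string(buffer);
    }

    public static void Main()
    {
        Console.WriteLine(Concat2("one", "two")); // onetwo
    }
}
```

The extra copy in the last step is exactly what the internal FastAllocateString avoids: the framework writes straight into the new string’s memory instead of going through a temporary char array.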

There are overloads of String.Concat that take up to four string arguments. So if you were to write s5 = s1 + s2 + s3 + s4, the compiler would emit String.Concat(s1, s2, s3, s4). What happens with more than four strings? For example, this code:

string s1 = "one";
string s2 = "two";
string s3 = "three";
string s4 = "four";
string s5 = "five";
string s6 = s1 + s2 + s3 + s4 + s5;

compiles to:

string s1 = "one";
string s2 = "two";
string s3 = "three";
string s4 = "four";
string s5 = "five";
string[] strArrays = new string[5];
strArrays[0] = s1;
strArrays[1] = s2;
strArrays[2] = s3;
strArrays[3] = s4;
strArrays[4] = s5;
string s6 = string.Concat(strArrays);

This overload of Concat scans through the array adding up the total length of all the strings, creates a new string of that length, and then copies each string into its correct position. This does create one extra object, the array, but it doesn’t create a new string for each individual concatenation.
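The same two-pass idea can be sketched with public APIs. ConcatAll is a hypothetical helper, not the framework’s implementation. Like the real method it treats null elements as empty strings, but it leaves out the overflow checking a production version would need.

```csharp
using System;

public static class ConcatArraySketch
{
    // Pass 1 totals the lengths, then one allocation, then pass 2 copies
    // each piece to its offset. Null elements count as empty strings.
    public static string ConcatAll(string[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        // Pass 1: add up the total length of all the strings.
        int total = 0;
        foreach (string v in values)
        {
            total += v?.Length ?? 0;
        }

        // A single buffer for the whole result.
        char[] buffer = new char[total];

        // Pass 2: copy each string into its position.
        int offset = 0;
        foreach (string v in values)
        {
            if (string.IsNullOrEmpty(v)) continue;
            v.CopyTo(0, buffer, offset, v.Length);
            offset += v.Length;
        }
        return new string(buffer);
    }

    public static void Main()
    {
        string[] parts = { "one", "two", "three", "four", "five" };
        Console.WriteLine(ConcatAll(parts));                         // onetwothreefourfive
        Console.WriteLine(ConcatAll(parts) == string.Concat(parts)); // True
    }
}
```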

One other thing to note about string concatenation: if you write something like this, perhaps for code clarity,

string s1 = "one" + "two" + "three" + "four";
Console.WriteLine(s1);

the resulting code will look like this:

string s1 = "onetwothreefour";
Console.WriteLine(s1);

The compiler is smart enough to recognize that you are adding together four string literals, so it combines them into a single literal at compile time.
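One way to confirm the folding yourself is to use the expression somewhere only a compile-time constant is allowed, such as a const field, and to compare object identity. String literals are interned, so the folded constant is the very same object as an identical literal, while the same text built at run time from a non-constant variable is a separate object. The class and member names below are just for illustration.

```csharp
using System;

public static class ConstantFoldingDemo
{
    // Only legal because this is a compile-time constant expression:
    // the compiler folds it into the single literal "onetwothreefour".
    public const string Folded = "one" + "two" + "three" + "four";

    // Literals are interned, so a folded constant is the same object
    // as an identical literal elsewhere in the program.
    public static bool IsSameObjectAsLiteral(string s)
    {
        return ReferenceEquals(s, "onetwothreefour");
    }

    public static void Main()
    {
        string head = "one";                  // not const, so not folded
        string built = head + "twothreefour"; // string.Concat at run time

        Console.WriteLine(IsSameObjectAsLiteral(Folded)); // True
        Console.WriteLine(IsSameObjectAsLiteral(built));  // False
        Console.WriteLine(built == Folded);               // True (same characters)
    }
}
```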

In conclusion, concatenation isn’t always bad in .NET programs. As long as all the concatenation happens in a single expression, the compiler turns it into one call to string.Concat (or folds it away entirely when only literals are involved), so you don’t end up with a lot of extra objects. But if you are repeatedly appending to the same string across multiple statements or within a loop, it’s a good idea to use a StringBuilder instead.
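To make the loop case concrete, here is a sketch comparing the two approaches (the class and method names are hypothetical). Both produce the same text, but the += version builds a new, progressively longer string on every iteration, while the StringBuilder version appends into its internal buffer and creates the final string once, when ToString() is called.

```csharp
using System;
using System.Text;

public static class LoopConcatDemo
{
    // Each iteration allocates a new string and copies everything built
    // so far, so the total copying grows roughly with the square of the
    // number of pieces.
    public static string JoinWithPlus(string[] items)
    {
        string result = "";
        foreach (string item in items)
        {
            result += item + ";";
        }
        return result;
    }

    // StringBuilder appends into a growable internal buffer and
    // produces the final string only once, in ToString().
    public static string JoinWithBuilder(string[] items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
        {
            sb.Append(item).Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] items = { "one", "two", "three" };
        Console.WriteLine(JoinWithPlus(items));    // one;two;three;
        Console.WriteLine(JoinWithBuilder(items)); // one;two;three;
    }
}
```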
