StringBuilder Performance vs. String Concatenation

  • Thread starter Thread starter Kevin C
  • Start date Start date
This was the same thing I had heard, and it was the basis for my statement
that I thought I was a perfect candidate for using StringBuilder in my code
in place of statements such as:

public static string GetID(...)
{
return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
chDtls.ID + chDtls.IDExtension;
}

Each item in this statement is a string. It's returned in repsonse to a
request for an ID. There are literally hundreds, if not thousands of
statements like this in our code. And they are called LOTS of times --
anytime the ID of an object is required. They should be a critical
performance bottleneck because of this. We considered going with numeric IDs
and several other things, but strings have advantages for us. So I switch to
using StringBuilder.

To my amazement, I found that replacing these string concats with
StringBuilder equivalents actually reduced the measured performance of our
app. Anyone see a flaw or a reason that StringBuilder would falter in a
situation like this?

Dave
 
According to the article in CodeProject, there is a perfectly reasonable
explanation why StringBuilder works the way it does. The "buffer" is
contains is actually a mutable string. Yep, strings are not actually
immutable in .NET.

Why does this matter?

Well, you are actually building a string. Not just building a buffer and
then copying to a string. So in the common case:

1) You create the buffer.
2) You append.
3) You append.
4) You append.
....
297345) You append.
297346) You get the string you just created.

For large strings, this is a TREMENDOUS performance win for the common case,
especially if you make the buffer big enough to hold the entire string.
This eliminates not only a mult-megabyte copy operation, but also prevents
you from having two copies of the huge string in memory at the same time.
It also gets you out of some messy GC.

Of course, if you don't use StringBuilder exactly this way, there can be
some performance side effects.
 
{
return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
chDtls.ID + chDtls.IDExtension;
}

Each item in this statement is a string.

Yeah, but that's not a lot of concatenations. That's one big
concatenation.

This is a lot of concatenations:

StringBuilder sb = contextZero.ID;
sb += ciDelimiter;
sb += dvZero.ID;
sb += ciDelimiter;
sb += contextOne.ID;
sb += ciDelimiter;
sb += dvOne.ID;
sb += ciDelimiter;
sb += contextTwo.ID;
sb += ciDelimiter;
sb += dvTwo.ID;
sb += ciDelimiter;
sb += chDtls.ID;
sb += chDtls.IDExtension;

-- Rick
 
Mountain Bikn' Guy said:
This was the same thing I had heard, and it was the basis for my statement
that I thought I was a perfect candidate for using StringBuilder in my code
in place of statements such as:

public static string GetID(...)
{
return contextZero.ID + ciDelimiter + dvZero.ID + ciDelimiter +
contextOne.ID + ciDelimiter + dvOne.ID + ciDelimiter +
contextTwo.ID + ciDelimiter + dvTwo.ID + ciDelimiter +
chDtls.ID + chDtls.IDExtension;
}

Ah no - a statement like that will already be concatenated pretty
efficiently.
Each item in this statement is a string. It's returned in repsonse to a
request for an ID. There are literally hundreds, if not thousands of
statements like this in our code. And they are called LOTS of times --
anytime the ID of an object is required. They should be a critical
performance bottleneck because of this. We considered going with numeric IDs
and several other things, but strings have advantages for us. So I switch to
using StringBuilder.

To my amazement, I found that replacing these string concats with
StringBuilder equivalents actually reduced the measured performance of our
app. Anyone see a flaw or a reason that StringBuilder would falter in a
situation like this?

Yes - the above is a single step concatenation; it doesn't produce a
lot of intermediate strings. The thing to avoid would be converting the
above into stuff like:

public static string GetID(...)
{
string ret = contextZero.ID;
ret += ciDelimiter;
ret += dvZero.ID;
// etc
}

or worse still, doing a lot of concatenations in a loop.
 
Jon Skeet said:
Ah no - a statement like that will already be concatenated pretty
efficiently.

Just to add my $.02: the above gets translated to a single
String.Concat(string[]) as oposed to a lot of String.Concat(string) calls.
That's why replacing it with StringBuilder wil actually make the code
slower.

Jerry
public static string GetID(...)
{
string ret = contextZero.ID;
ret += ciDelimiter;
ret += dvZero.ID;
// etc
}

or worse still, doing a lot of concatenations in a loop.

Loop is the key word here :) Especially one that runs couple hundred
times...

Jerry
 
There's been a lot of fruitful discussions on this subject, I'm not sure
anyone really hit the original question squarely and it deserves an answer.

This is actually a great thread because it shows why even the best minds (no
sarcasm implied here at all) are often fooled when it comes to perf.
Assumptions can easily be wrong, so it's vitally important to measure. I'm
rarely bold enough to predict even the most basic perf results because I
remember how many times I was burned. Certainly I would never make a
statement so bold as "Use Stringbuilders to concatenate" without having a
deeper understanding of the concatenation pattern. I tend to give advice
like "Many users find Stringbuilders useful for their concatenation pattens,
consider using them and measure to see if they help you". Wussy but safe :)

OK, now for some actual, no-kidding-around useful advice (I hope).

When stringbuilders are faster than regular strings it is because of string
allocations and copies that didn't have to happen. Imagine a string builder
that had a buffer that was always the perfect size for the string it held...
it'd be useless right because for sure the next string you appended wouldn't
fit and you'd have to make a new buffer (the new exact size). In fact,
such a stringbuilder wouldn't be very much different than just a string. So
having some slop on the end is important, it's because there's a little room
at the end that we can add things without having to copy.

Well, ok so how much slop? Suppose there was some slop but that the slop
wasn't usually enough to accomodate the appends that happen. Again we'd be
worse off than just strings (strings at least have no slop). There has to
be enough slop that its likely that many appends will fit within the slop,
otherwise there isn't much savings.

Let me break it into 4 cases:

Big string and small appends
=>It's substantially likely that appends will fit in the slop and so they're
fast, this is the best case(buffer size becomes double the string when it no
longer fits so on average the slop is half the current string length) (if
there are lots of small appends to a big string you win the most using
stringbuilder)

Big string and big appends:
=>While the string is comparable in size (or smaller) to the appends
stringbuilder won't save you much, if this continues to the point where the
appends are small compared to the accumlated string you're in the good case

Small string big appends:
=> bad case, string builder will just slow you down until enough slop has
built up to hold those appends, you move to "big string big appends" as you
append and finally to "big string small appends" if/when the buffer becomes
collossal

Small string, small appends:
=> could be ok if you had a good idea how big your string was going to get
and preallocated enough so that you have sufficient slop for the appends.
You might be able to do better if you just concated all the small appends
together in one operation.

One last tip: Sometimes you can get substantial savings by changing
currency in the right places in your algorithm

Pattern A:

StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
sb += GetMyObjectID(foo,bar); // GetMyObjectID makes a string

Pattern B:

StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
AppendMyObjectID(sb, foo,bar); // function puts the ID directly into
the buffer

PatternB has no temporary string for the return value, this *might* be
better depending on the nature of the ObjectID composition/calculation.
Something you'd want to measure.

Remember it's all about reducing memory traffic so the competitors are

-the memory in the stringbuilder, including the slop
-the temporary strings (if any) in your algorithm with and without
stringbuilders
-the final output string (note that getting the string out of a
stringbuilder doesn't cause a new alloc, the existing buffer is converted
into a string and then the stringbuilder is logically empty so you don't pay
this cost twice if you use stringbuilder. You do pay for the final output if
you don't use stringbuilder but then you didn't have to pay for the builder
up front)

It's very hard to say which is faster/smaller in general... it's all about
the usage pattern.
 
Rico,
Thanks for your informative post.
Dave

Rico Mariani said:
There's been a lot of fruitful discussions on this subject, I'm not sure
anyone really hit the original question squarely and it deserves an answer.

This is actually a great thread because it shows why even the best minds (no
sarcasm implied here at all) are often fooled when it comes to perf.
Assumptions can easily be wrong, so it's vitally important to measure. I'm
rarely bold enough to predict even the most basic perf results because I
remember how many times I was burned. Certainly I would never make a
statement so bold as "Use Stringbuilders to concatenate" without having a
deeper understanding of the concatenation pattern. I tend to give advice
like "Many users find Stringbuilders useful for their concatenation pattens,
consider using them and measure to see if they help you". Wussy but safe :)

OK, now for some actual, no-kidding-around useful advice (I hope).

When stringbuilders are faster than regular strings it is because of string
allocations and copies that didn't have to happen. Imagine a string builder
that had a buffer that was always the perfect size for the string it held...
it'd be useless right because for sure the next string you appended wouldn't
fit and you'd have to make a new buffer (the new exact size). In fact,
such a stringbuilder wouldn't be very much different than just a string. So
having some slop on the end is important, it's because there's a little room
at the end that we can add things without having to copy.

Well, ok so how much slop? Suppose there was some slop but that the slop
wasn't usually enough to accomodate the appends that happen. Again we'd be
worse off than just strings (strings at least have no slop). There has to
be enough slop that its likely that many appends will fit within the slop,
otherwise there isn't much savings.

Let me break it into 4 cases:

Big string and small appends
=>It's substantially likely that appends will fit in the slop and so they're
fast, this is the best case(buffer size becomes double the string when it no
longer fits so on average the slop is half the current string length) (if
there are lots of small appends to a big string you win the most using
stringbuilder)

Big string and big appends:
=>While the string is comparable in size (or smaller) to the appends
stringbuilder won't save you much, if this continues to the point where the
appends are small compared to the accumlated string you're in the good case

Small string big appends:
=> bad case, string builder will just slow you down until enough slop has
built up to hold those appends, you move to "big string big appends" as you
append and finally to "big string small appends" if/when the buffer becomes
collossal

Small string, small appends:
=> could be ok if you had a good idea how big your string was going to get
and preallocated enough so that you have sufficient slop for the appends.
You might be able to do better if you just concated all the small appends
together in one operation.

One last tip: Sometimes you can get substantial savings by changing
currency in the right places in your algorithm

Pattern A:

StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
sb += GetMyObjectID(foo,bar); // GetMyObjectID makes a string

Pattern B:

StringBuilder sb = new StringBuilder(SuitableSize);
// sb gets a bunch of stuff
AppendMyObjectID(sb, foo,bar); // function puts the ID directly into
the buffer

PatternB has no temporary string for the return value, this *might* be
better depending on the nature of the ObjectID composition/calculation.
Something you'd want to measure.

Remember it's all about reducing memory traffic so the competitors are

-the memory in the stringbuilder, including the slop
-the temporary strings (if any) in your algorithm with and without
stringbuilders
-the final output string (note that getting the string out of a
stringbuilder doesn't cause a new alloc, the existing buffer is converted
into a string and then the stringbuilder is logically empty so you don't pay
this cost twice if you use stringbuilder. You do pay for the final output if
you don't use stringbuilder but then you didn't have to pay for the builder
up front)

It's very hard to say which is faster/smaller in general... it's all about
the usage pattern.

--
This posting is provided "AS IS" with no warranties, and confers no rights.

Rico Mariani
CLR Performance Architect
 
In several of these cases (big loops, lots of concatinations), another
possible methodology would be to store the strings to concantinated in an
array (string array if count of elements known, arraylist if not) then use
string.join to perform the final concatination. This avoids a large
majority of the interim memory allocations, and you just store the
references until the final string data is needed.

Ray Beckett
 
Very nice suggestion!

Ray Beckett said:
In several of these cases (big loops, lots of concatinations), another
possible methodology would be to store the strings to concantinated in an
array (string array if count of elements known, arraylist if not) then use
string.join to perform the final concatination. This avoids a large
majority of the interim memory allocations, and you just store the
references until the final string data is needed.

Ray Beckett

safe it output
 
Does anyone know what the default allocation size is for StringBuilder if you do
not explicitly initialize the instance?
 
This doesn't actually compile does it? I tried += with the a StringBuilder
object and the compiler gave an error such as:

Cannot convert type 'string' to 'System.Text.StringBuilder'
 
Hi Kerry,

Paste this into a VB Form and put a breakpoint on the Append.
Add sb to a Watch window and see what happens to Capacity and
Length as you do each Append. Most illuminating.

Dim sb as New StringBuilder
Dim I As Integer
For I = 1 To 20
sb.Append ("123456789012345")
Next

And that's why, when you have a large StringBuilder, you should set
the capacity to, or just above, the most that you'll want.

Regards,
Fergus
 
Kerry Sanders said:
Does anyone know what the default allocation size is for StringBuilder if you do
not explicitly initialize the instance?

Well, just a very simple test (creating a new StringBuilder and showing
its Capacity) shows it as 16 if you don't specify any string, or the
lowest power of 2 which is greater than the length of the initial
string if you *do* specify a string.
 
Hi Kerry,

Using Reflector (look on Google for Lutz Reflector) to decompile the source
is a good way to get the answer to questions like this. An alternative is
to look at the Rotor source code.

There's a definite performance benefit in getting the size of the buffer
right before you start using it. StringBuilder, like most of things in .NET
that allocate dynamically, allocate by *2 each time. So, StringBuilder goes
16 bytes, 32 bytes, 64 bytes, and so on. This gets a bit scary when you
have 1MB, because as soon as you add that next byte it goes direct to 2MB.

In most cases, the allocation approach isn't a problem, but it's worth
remembering that it has this behaviour in case you start using it in
situations where optimization by guestimating the size of the buffer by the
time you're finished may help you. (Think about generating large strings,
such as XML documents.)
 
Using Reflector (look on Google for Lutz Reflector) to decompile the source
is a good way to get the answer to questions like this. An alternative is
to look at the Rotor source code.

I actually found, downloaded, and looked at Reflector last night after posting
that message. For some reason, a method in one of my classes is causing
Reflector to fail to open my assembly. I do not know why, however. The error I
received is as follows

Method GetColumns in type InventoryList from assembly Inventory,
Version=2.2.0.0, Culture=neutral, PublicKeyToken=null does not have an
implementation


Any thoughts on this? The assembly compiles with no problems. I do not
understand the meaning behind this particular error.
 
Kerry Sanders said:
I actually found, downloaded, and looked at Reflector last night after posting
that message. For some reason, a method in one of my classes is causing
Reflector to fail to open my assembly. I do not know why, however. The error I
received is as follows

Method GetColumns in type InventoryList from assembly Inventory,
Version=2.2.0.0, Culture=neutral, PublicKeyToken=null does not have an
implementation


Any thoughts on this? The assembly compiles with no problems. I do not
understand the meaning behind this particular error.

I think it's almost certainly a bug in Reflector - if you can, I think
it would be worth mailing the author, including your assembly and a
description of the problem.
 
I think it's almost certainly a bug in Reflector - if you can, I think
it would be worth mailing the author, including your assembly and a
description of the problem.


Thanks Jon. I will do that.
 
This doesn't actually compile does it? I tried += with the a StringBuilder
object and the compiler gave an error such as:
....

Sorry, got carried away. S/B sb.Append(ciDelimiter), etc.

-- Rick
 
Back
Top