StringBuilder Performance vs. String Concatenation

  • Thread starter: Kevin C

Kevin C

Quick Question:

StringBuilder is obviously more efficient at string concatenation
than the old '+=' method. However, for relatively large string
concatenations (i.e., 20-30k characters), what are the performance
differences (if any, with something as trivial as this) between initializing
a new instance of StringBuilder with a specified capacity and initializing
one without? (The final length is not fixed.)

i.e.,

Performance differences between:

StringBuilder sb1 = new StringBuilder(30000);

and

StringBuilder sb1 = new StringBuilder();

Does this make a huge difference compared to '+='?

Cheers,
-k
 
If you are doing large string concatenations, definitely use
StringBuilder. There's no magic number per se, but for trivial
concatenations it's not a big deal. The main thing to remember is that
strings are immutable, so 50 concatenations create 50 string objects.

If you set the initial capacity, you are better off once you go past the
default size, because the StringBuilder doesn't have to reallocate space.
That only comes into play if you exceed the default size, though, so the
difference between the two is hard to state in absolute terms; it depends
on the situation. If possible, try to initialize the capacity, but even if
you don't, you'll be much better off than with +=.
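
A rough way to compare the three approaches discussed here is to time them side by side. The sketch below is only an illustration: the class name `ConcatComparison`, the 10-character chunk and the 3,000 iterations are made-up values chosen to land at 30,000 characters, and the tick counts it prints will vary with the runtime version and the machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatComparison
{
    // Repeated '+=': every iteration allocates a new string
    // and copies the previous contents into it.
    public static string WithPlusEquals(string piece, int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += piece;
        }
        return result;
    }

    // StringBuilder created with the default capacity; it grows as needed.
    public static string WithDefaultBuilder(string piece, int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }

    // StringBuilder created with a caller-supplied initial capacity.
    public static string WithPresizedBuilder(string piece, int count, int capacity)
    {
        StringBuilder sb = new StringBuilder(capacity);
        for (int i = 0; i < count; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        const string piece = "0123456789"; // hypothetical 10-character chunk
        const int count = 3000;            // 3000 * 10 = 30,000 characters

        Stopwatch sw = Stopwatch.StartNew();
        string a = WithPlusEquals(piece, count);
        long plusTicks = sw.ElapsedTicks;

        sw.Restart();
        string b = WithDefaultBuilder(piece, count);
        long defaultTicks = sw.ElapsedTicks;

        sw.Restart();
        string c = WithPresizedBuilder(piece, count, 30000);
        long presizedTicks = sw.ElapsedTicks;

        Console.WriteLine("+=                       : {0} ticks", plusTicks);
        Console.WriteLine("new StringBuilder()      : {0} ticks", defaultTicks);
        Console.WriteLine("new StringBuilder(30000) : {0} ticks", presizedTicks);
        Console.WriteLine("Same result: {0}", a == b && b == c);
    }
}
```

A single run like this is noisy (JIT warm-up, garbage collection), so treat the numbers as a hint and repeat the measurement before drawing conclusions.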

HTH,

Bill
 
bill,

thanks for the quick reply -- that makes sense.

As a footnote though, which has the more negative (theoretical?) effect on
performance:

1) over-initializing an instance, i.e., setting capacity at 30,000 characters
when you only need 20,000
or
2) under-initializing an instance, i.e., setting capacity at 10,000 characters
and having the StringBuilder class dynamically allocate more room for the
additional 10,000 characters when you try to append 20,000.

-k
 
Kevin C said:
As a footnote though, which has the more negative (theoretical?) effect
on performance:

1) over-initializing an instance, i.e., setting capacity at 30,000
characters when you only need 20,000
or
2) under-initializing an instance, i.e., setting capacity at 10,000
characters and having the StringBuilder class dynamically allocate
more room for the additional 10,000 characters when you try to append
20,000.

I would say that number 2 has the more negative impact. When you append
characters that exceed the capacity of the StringBuilder, it must allocate
memory large enough to hold the new string, copy the existing characters to
it and then add the new characters. Whereas with number 1, the allocation is
already done and it just has to append the new data.
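
The allocate, copy, append sequence described above can be sketched with a small teaching class. `GrowableCharBuffer` is hypothetical and is not the real StringBuilder; the growth policy (double the buffer, or jump straight to the required size if doubling is not enough) is an assumption made for illustration. It counts how many reallocations and copied characters the two scenarios from the question cost.

```csharp
using System;

// Hypothetical teaching class, NOT the real StringBuilder.
public sealed class GrowableCharBuffer
{
    private char[] _buffer;
    private int _length;

    public int Reallocations { get; private set; }
    public long CharsCopied { get; private set; } // characters moved while growing

    public GrowableCharBuffer(int capacity)
    {
        _buffer = new char[Math.Max(capacity, 1)];
    }

    public int Capacity { get { return _buffer.Length; } }
    public int Length { get { return _length; } }

    public void Append(string value)
    {
        int required = _length + value.Length;
        if (required > _buffer.Length)
        {
            // Assumed policy: double, or grow to 'required' if that is larger.
            int newCapacity = Math.Max(_buffer.Length * 2, required);
            char[] bigger = new char[newCapacity];
            Array.Copy(_buffer, bigger, _length); // copy the existing characters
            CharsCopied += _length;
            Reallocations++;
            _buffer = bigger;
        }
        // Add the new characters after the existing ones.
        value.CopyTo(0, _buffer, _length, value.Length);
        _length = required;
    }

    public override string ToString()
    {
        return new string(_buffer, 0, _length);
    }
}

public static class GrowDemo
{
    public static void Main()
    {
        string chunk = new string('x', 1000);
        GrowableCharBuffer under = new GrowableCharBuffer(10000); // case 2
        GrowableCharBuffer over = new GrowableCharBuffer(30000);  // case 1

        for (int i = 0; i < 20; i++) // append 20,000 characters in total
        {
            under.Append(chunk);
            over.Append(chunk);
        }

        // Under-sized: 1 reallocation, 10000 characters copied, capacity 20000.
        Console.WriteLine("under: {0} reallocations, {1} copied, capacity {2}",
            under.Reallocations, under.CharsCopied, under.Capacity);
        // Over-sized: 0 reallocations, 0 characters copied, capacity 30000.
        Console.WriteLine("over:  {0} reallocations, {1} copied, capacity {2}",
            over.Reallocations, over.CharsCopied, over.Capacity);
    }
}
```

Under these assumptions the under-sized buffer pays for one extra allocation and one copy of 10,000 characters, while the over-sized one only carries 10,000 unused slots.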

Chris
 
Chris, StringBuilder does not keep the string data in a single contiguous
block of memory. What you described (allocating a new chunk of memory, copying
the old data into it and appending the new data) is exactly how += for strings
works. StringBuilder does not do that; it just allocates new memory to hold
the new data and keeps both the old and the new, but not together. Of course,
calling ToString is going to finally join all the small chunks into one
piece, so it only pays off if you do a lot of appending (I've seen posts
saying that roughly 5 appends is the magic number).

Jerry
 
Jerry III said:
Chris, StringBuilder does not keep the string data in a single contiguous
block of memory. What you described (allocating a new chunk of memory, copying
the old data into it and appending the new data) is exactly how += for strings
works. StringBuilder does not do that; it just allocates new memory to hold
the new data and keeps both the old and the new, but not together. Of course,
calling ToString is going to finally join all the small chunks into one
piece, so it only pays off if you do a lot of appending (I've seen posts
saying that roughly 5 appends is the magic number).

Do you have any evidence of this? This is certainly the first I've
heard of it. As far as I'm aware, StringBuilder has a buffer, and once
that is full, the buffer is copied and resized. That's the view that
the Rotor source suggests, too.
 
Are you sure about that? A quick look at the StringBuilder class with
Anakrino and ildasm seems to show a new allocation and copy of the old
buffer.
 
Hi Folks,

// Copyright (c) 2002 Microsoft Corporation. All rights reserved.
//
** Class: StringBuilder
**
** Purpose: A prototype implementation of the StringBuilder
** class.
**
** Date: December 8, 1997
** Last Updated: March 31, 1998
**
namespace System.Text {
using System.Text;
using System.Runtime.Serialization;
using System;
using System.Runtime.CompilerServices;

[Serializable()] public sealed class StringBuilder {

Full details at:
http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/fx/bcl/stringbuilder_8cs-source.html

Regards,
Fergus
 
Ok, I was wrong; I guess I should decompile before I post something. I'm
really disappointed: I thought the StringBuilder was a lot more efficient
than this. I can just allocate a large enough String and get a lot better
performance (memory is pretty cheap).

Jerry
 
Jerry III said:
Ok, I was wrong; I guess I should decompile before I post something. I'm
really disappointed: I thought the StringBuilder was a lot more efficient
than this. I can just allocate a large enough String and get a lot better
performance (memory is pretty cheap).

What do you mean by "just allocate a large enough String"?

Could you give an example of concatenating many strings together in a
loop and allowing parts of the concatenation to be removed or replaced
where your code gives better performance than StringBuilder?
 
Hi Jerry,

You're right - and shouldn't be disappointed - StringBuilder <is> a lot
more efficient (time-wise, at least). Because it doubles when it needs to
grow, the number of 'grows' is pretty minimal. Of course, if you have a 10MB
string to which you want to add a single character, it'll grab another 20MB to
do it!!
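
The arithmetic behind "the number of grows is pretty minimal" can be worked through with a short helper. This is a sketch that assumes pure doubling, as described above; `CountDoublings` is a made-up name, and the starting capacity of 16 used in the first call is the documented default for the .NET StringBuilder, though the actual growth policy can differ between runtime versions.

```csharp
using System;

public static class DoublingMath
{
    // How many times must a capacity double, starting at 'initial',
    // before it is at least 'target'? (Assumes growth by pure doubling.)
    public static int CountDoublings(long initial, long target)
    {
        if (initial <= 0)
        {
            throw new ArgumentOutOfRangeException("initial");
        }

        int doublings = 0;
        long capacity = initial;
        while (capacity < target)
        {
            capacity *= 2;
            doublings++;
        }
        return doublings;
    }

    public static void Main()
    {
        // Default-sized builder growing to hold 20,000 characters.
        Console.WriteLine(CountDoublings(16, 20000));    // 11
        // Under-sized: 10,000 requested, 20,000 needed.
        Console.WriteLine(CountDoublings(10000, 20000)); // 1
        // Over-sized: 30,000 requested, 20,000 needed.
        Console.WriteLine(CountDoublings(30000, 20000)); // 0

        // The "10MB plus one character" case: a single doubling to 20MB.
        long tenMb = 10L * 1024 * 1024;
        Console.WriteLine(CountDoublings(tenMb, tenMb + 1)); // 1
    }
}
```

Because the count grows with the logarithm of the final size, even a builder that starts at the default capacity needs only about a dozen grows to reach 20,000 characters.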

The trouble with allocating a huge string is that as soon as you do
anything with it, you'll get a new totally different string - with that
massive one left for the GC. Ouch! You can't insert <into> a string, but
that's exactly what the StringBuilder is designed for.
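
To make the immutability point concrete, here is a small sketch contrasting `String.Insert`, which returns a new string and leaves the original alone, with `StringBuilder.Insert`, which modifies the builder and returns the same instance. The sample text is arbitrary.

```csharp
using System;
using System.Text;

public static class InsertDemo
{
    public static void Main()
    {
        // String.Insert cannot change 'original'; it builds a new string.
        string original = "Hello World";
        string modified = original.Insert(5, ",");
        Console.WriteLine(original); // Hello World
        Console.WriteLine(modified); // Hello, World
        Console.WriteLine(object.ReferenceEquals(original, modified)); // False

        // StringBuilder.Insert changes the builder itself and returns it.
        StringBuilder sb = new StringBuilder("Hello World");
        StringBuilder same = sb.Insert(5, ",");
        Console.WriteLine(sb.ToString()); // Hello, World
        Console.WriteLine(object.ReferenceEquals(sb, same)); // True
    }
}
```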

Regards,
Fergus

ps. Jon is a need-for-truth man. Getting things wrong and having JS correct
you happens to me too, lol - and then I know more than I did. So, too, does
anyone else who had the same misconceptions. :-)
 
Fergus Cooney said:
ps. Jon is a need-for-truth man. Getting things wrong and having JS correct
you happens to me too, lol - and then I know more than I did. So, too, does
anyone else who had the same misconceptions. :-)

Fortunately it happens to me too. I'd far rather post my beliefs
and have them thoroughly disproved (as has happened several times) than
shut up and have people believe that I know what's going on when I
don't.

Basically, what I'm trying to say is that being wrong is something that
happens to absolutely everyone, and that I mean no disrespect when I
correct/question someone.
 
I was talking about concatenating; I have yet to come across code that uses
StringBuilder (or StringBuffer in Java) to replace or remove parts of the
string. In those cases using existing code (those two classes) will be
efficient, but if you only concatenate you are still far better off
guessing how long your string will be and preallocating the memory
yourself - i.e. telling StringBuilder how much it should allocate in the
constructor.

Personally I think it might be worth giving up some speed in ToString and
removing/replacing in order to make appending a lot faster. And if you're
going to ask - no, I don't have any code that proves any of this :(

Jerry
 
Jerry III said:
I was talking about concatenating; I have yet to come across code that uses
StringBuilder (or StringBuffer in Java) to replace or remove parts of the
string.

Ah - I have, although I agree that the most common case is just
appending.

Jerry III said:
In those cases using existing code (those two classes) will be
efficient, but if you only concatenate you are still far better off
guessing how long your string will be and preallocating the memory
yourself - i.e. telling StringBuilder how much it should allocate in the
constructor.

Personally I think it might be worth giving up some speed in ToString and
removing/replacing in order to make appending a lot faster. And if you're
going to ask - no, I don't have any code that proves any of this :(

If you're never going to remove/replace/insert, you probably could
indeed improve the performance. Having said that, a quick attempt to do
so in an obvious way failed. One of the advantages of StringBuilder is
that if you don't need to expand the string in the end, the result of
ToString is just *there* with no further effort.

I might investigate this further though - see if I can come up with
something which beats StringBuilder in the simple case.

I suspect that StringBuilder is rarely the bottleneck in apps, however
- whereas simple repeated string concatenation in a loop easily could
be.
 
I'll throw in my 2-cents worth. After reading about the efficiency of
StringBuilder I changed a lot of my code in order to eliminate the large
number of string concats. However, when I ran the profiler and other timing
tests I found that my code was actually slower with StringBuilder. I didn't
investigate further (because the simple solution was to go back to strings),
but my experience indicates that there are definitely situations where the
overhead of StringBuilder is greater than the efficiencies. BTW, I thought I
was a perfect candidate for using StringBuilder because my code does a LOT
of string concats. Take that for what it's worth.
Dave
 
Mountain Bikn' Guy said:
I'll throw in my 2-cents worth. After reading about the efficiency of
StringBuilder I changed a lot of my code in order to eliminate the large
number of string concats. However, when I ran the profiler and other timing
tests I found that my code was actually slower with StringBuilder. I didn't
investigate further (because the simple solution was to go back to strings),
but my experience indicates that there are definitely situations where the
overhead of StringBuilder is greater than the efficiencies. BTW, I thought I
was a perfect candidate for using StringBuilder because my code does a LOT
of string concats. Take that for what it's worth.

Unfortunately, without more code, it's not worth a lot :(

If you *are* doing a lot of string concatenations, and you don't need
the results as strings between operations, it really *should* have been
quicker using StringBuilder.

If you ever get a separable bit of code which is demonstrating that
behaviour, I'd be interested to see it.
 
I think the issue here is doing a lot of concatenations as opposed to doing
a lot of concatenations into one resulting string. Replacing a single string
add with a StringBuilder will definitely make your app slower, no matter
how many times you actually add strings in your app. The advantage of
StringBuilder shows up when you're adding a lot of strings into one result
that you use at the end.
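
The distinction drawn above can be sketched as two helpers. The names `FullName` and `JoinLines` and their inputs are hypothetical, chosen only to contrast a one-off concatenation with many appends into a single result.

```csharp
using System;
using System.Text;

public static class WhenToUseBuilder
{
    // A one-off concatenation. The C# compiler turns this single expression
    // into one String.Concat call, so a StringBuilder would only add overhead.
    public static string FullName(string first, string last)
    {
        return first + " " + last;
    }

    // Many pieces accumulated into ONE result that is used at the end:
    // this is the case where StringBuilder is meant to help.
    public static string JoinLines(string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line).Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(FullName("Jerry", "III")); // Jerry III

        string[] lines = { "first", "second", "third" };
        Console.Write(JoinLines(lines)); // three lines, each ending in '\n'
    }
}
```

An application can call `FullName` thousands of times and still gain nothing from a builder, because each call produces its own independent result.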

Jerry
 
Hi Jon,

I've seen your posts and know a bit about your mind. I've seen your site
and know a bit about your heart. 'Tis good for I respect you in both ways. ;-)

Regards,
Fergus
 
Jerry III said:
I think the issue here is doing a lot of concatenations as opposed to doing
a lot of concatenations into one resulting string. Replacing a single string
add with a StringBuilder will definitely make your app slower, no matter
how many times you actually add strings in your app. The advantage of
StringBuilder shows up when you're adding a lot of strings into one result
that you use at the end.

Yup, that's absolutely right. Note that I think the threshold for doing
this in Java may be much lower or even non-existent, as all string
concatenations in Java use StringBuffer anyway.
 