String.Join vs. StringBuilder, which is faster?

Bob

I have a function that takes in a list of IDs (hundreds) as an input parameter
and needs to pass the data to another step as a comma-delimited string. The
source can easily create this list of IDs as a comma-delimited string or a
string array. I don't want it to be a string because I want to overload
this function, and its sister already uses a string input parameter. Now
if I define the function to take in a string array, it solves my overload
issue, but then I have to convert the array inside the function to a
comma-delimited string using string.Join(). Alternatively, I can define the
input parameter as a StringBuilder (which just contains the comma-delimited
string), and then call sb.ToString() to get the string. Which would be the
better solution: a string array plus Join, or a StringBuilder plus ToString?

I know that not overloading at all would make the most sense from a
performance perspective, but code readability and maintenance become harder,
as the two functions really do very similar things.

Thanks
Bob
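A minimal sketch of the overload pair Bob describes, assuming the array version simply joins the IDs once and forwards to the existing string version. The names IdProcessor and ProcessIds are hypothetical placeholders, since the thread never shows the real function, and the string overload just returns its input as a stand-in for "pass the data to another step".

```csharp
using System;

// Hypothetical names: the real function is not shown in the thread.
public static class IdProcessor
{
    // Existing "sister" overload: the caller already has a comma-delimited string.
    public static string ProcessIds(string commaDelimitedIds)
    {
        // Stand-in for passing the data on to the next step.
        return commaDelimitedIds;
    }

    // New overload: string[] is a distinct parameter type, so there is no clash.
    // One String.Join call builds the comma-delimited string, then we forward it.
    public static string ProcessIds(string[] ids)
    {
        if (ids == null) throw new ArgumentNullException(nameof(ids));
        return ProcessIds(string.Join(",", ids));
    }
}
```

Because string[] and string are different parameter types, the compiler picks the right overload at each call site, and the array version pays for exactly one String.Join call.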
 
Since strings are immutable (a new memory address is allocated each time
you alter the string), the StringBuilder is the way to go.
 
What does the fact that strings are immutable have to do with Bob's
question? String.Join is a static method of that class and creates a new
string object when called. Personally, I would go with an array and Join; it
will be easier to understand (you're passing an array or a list of values)
and you won't have to loop to put all your values into a string builder.
That will never be faster than calling Join. Unless of course your input is
not an array of strings (you're saying a list of IDs, not sure about the
type there).

Jerry
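Jerry's caveat about the ID type matters if the IDs arrive as integers rather than strings. A minimal sketch, assuming int IDs (the thread doesn't say which type they are): convert them to a string[] once, then make the single String.Join call. IdFormatting and ToCsv are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class IdFormatting
{
    // Assumption: IDs are ints. Format each one with the invariant culture,
    // collect them into a string[], then join once.
    public static string ToCsv(IEnumerable<int> ids)
    {
        string[] parts = ids
            .Select(id => id.ToString(CultureInfo.InvariantCulture))
            .ToArray();
        return string.Join(",", parts);
    }
}
```

On .NET Framework 4.0 and later, the generic String.Join<T>(string, IEnumerable<T>) overload can take the ints directly; it calls each element's ToString(), which uses the current culture rather than the invariant culture used here.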
 
String.Join is implemented internally by the run-time. It is serviced by the
ConcatenateJoinHelperArray method which turns out to be extremely fast at
creating the final string as opposed to the slightly slower
StringBuilder.Append. If you have an array of strings, I would recommend using
that with String.Join over StringBuilder.Append since String.Join is going to be
quite a bit faster.

using System;
using System.Text;

public class JoinVsBuilder {
    private static string[] strings = new string[0];

    private static void Main(string[] args) {
        // Number of strings to join, e.g. JoinVsBuilder.exe 1000000
        int count = int.Parse(args[0]);

        // Fill the array with "0", "1", ..., (count - 1)
        strings = new string[count];
        for(int i = 0; i < count; i++) {
            strings[i] = i.ToString();
        }

        DateTime start, end;

        // Time a single String.Join call
        start = DateTime.Now;
        string newStr = string.Join("foo", strings);
        end = DateTime.Now;
        Console.WriteLine("String::Join timing is {0}", end-start);

        // Time the equivalent series of StringBuilder.Append calls
        StringBuilder sb = new StringBuilder();
        start = DateTime.Now;

        // Faster than worrying about when to append the connector
        sb.Append(strings[0]);
        for(int i = 1; i < strings.Length; i++) {
            sb.Append("foo");
            sb.Append(strings[i]);
        }
        string newStr2 = sb.ToString();
        end = DateTime.Now;
        Console.WriteLine("StringBuilder::Append timing is {0}", end-start);
    }
}

C:\Projects\CSharp\Samples\JoinVsBuilder>JoinVsBuilder.exe 1000000
String::Join timing is 00:00:00.4606624
StringBuilder::Append timing is 00:00:02.9141904
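Justin attributes Join's speed to the output being sized in one go. The sketch below shows that general two-pass idea in plain managed code: sum the lengths, allocate once, then copy. It illustrates the technique only and is not the runtime's actual String.Join or ConcatenateJoinHelperArray code; TwoPassJoin is a hypothetical name, and managed code like this pays one extra copy in the final new string(char[]) call.

```csharp
using System;

public static class TwoPassJoin
{
    // Illustrative only: NOT the runtime's String.Join implementation.
    // Like String.Join, a null separator or null element is treated as empty.
    public static string Join(string separator, string[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (separator == null) separator = string.Empty;
        if (values.Length == 0) return string.Empty;

        // Pass 1: compute the exact final length so the buffer is allocated once.
        int length = separator.Length * (values.Length - 1);
        foreach (string s in values)
        {
            if (s != null) length += s.Length;
        }

        // Pass 2: copy each piece, with separators between them, into the buffer.
        char[] buffer = new char[length];
        int pos = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, pos, separator.Length);
                pos += separator.Length;
            }
            string value = values[i];
            if (value != null)
            {
                value.CopyTo(0, buffer, pos, value.Length);
                pos += value.Length;
            }
        }

        // One more copy into the final immutable string.
        return new string(buffer);
    }
}
```

A StringBuilder created with its default capacity cannot do pass 1, so it has to grow its buffer repeatedly as it goes.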
 
Jerry III said:
What does the fact that strings are immutable have to do with Bob's
question? String.Join is a static method of that class and creates a new
string object when called. Personally, I would go with an array and Join; it
will be easier to understand (you're passing an array or a list of values)
and you won't have to loop to put all your values into a string builder.
That will never be faster than calling Join. Unless of course your input is
not an array of strings (you're saying a list of IDs, not sure about the
type there).


Because, as you said, "String.Join is a static method of that class and
creates a new string object when called." Each time it's called, it will
create a new string, and thus a new spot on the heap. A StringBuilder will
only need one memory address and, for this reason, usually performs better.
 
Scott M. said:
Because, as you said, "String.Join is a static method of that class and
creates a new string object when called." Each time it's called, it will
create a new string, and thus a new spot on the heap. A StringBuilder will
only need one memory address and, for this reason, usually performs better.

No, because here only one call to String.Join is needed. A
StringBuilder is better than manually creating lots of new strings, but
that doesn't happen in String.Join.
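To make the distinction concrete, here is a minimal sketch contrasting the case StringBuilder is meant to fix (growing a string with += in a loop) with the single Join call in Bob's scenario. The class and method names are hypothetical.

```csharp
using System;

public static class ConcatComparison
{
    // The pattern StringBuilder helps with: every += allocates a new, longer
    // string and copies everything built so far into it.
    public static string ConcatInLoop(string[] values, string separator)
    {
        string result = values.Length > 0 ? values[0] : string.Empty;
        for (int i = 1; i < values.Length; i++)
        {
            result += separator + values[i];
        }
        return result;
    }

    // Bob's case: one call that produces the final string directly.
    public static string JoinOnce(string[] values, string separator)
    {
        return string.Join(separator, values);
    }
}
```

Both return the same text; only the first creates a trail of discarded intermediate strings.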
 
Yes, this is a perfect example where StringBuilder at the very best provides
no advantage, and most likely is at a strong disadvantage. Just because
Microsoft and others have preached that it should always be used whenever
there's any kind of concatenation, don't believe it.
 
I recreated your test, tweaking it so that there is no connector (thus only
one Append in the loop). I consistently get Join being 2 to 2-1/2 times
faster than a series of SB Appends.
 
Daniel Billingsley said:
Yes, this is a perfect example where StringBuilder at the very best provides
no advantage, and most likely is at a strong disadvantage. Just because
Microsoft and others have preached that it should always be used whenever
there's any kind of concatenation, don't believe it.

I don't think they *have* actually preached that - people who've
understood *part* of why it's worth using StringBuilder but not all of
it have preached it. I don't think MS put particularly blanket
recommendations out. Let me know where they are and I'll complain about
them, if they do exist :)
 
These issues with StringBuilder performance have a strange ring of
familiarity, eh, Skeet?
 
Figured I'd post up my latest testing. The StringBuilder really isn't as bad
as the original test showed. What is missing is that behind the scenes
String::Join is computing an optimally sized memory array, while the
StringBuilder is constantly expanding its own. If you use capacity planning,
then the StringBuilder is only about 25% slower than String::Join. In
addition, there are some string interning problems that tend to affect
performance quite a bit that I've been able to compute out of my analysis.

Original performance testing post:
http://weblogs.asp.net/justin_rogers/archive/2004/03/04/84306.aspx

Revised performance testing post:
http://weblogs.asp.net/justin_rogers/archive/2004/03/05/84986.aspx

I definitely understand why Microsoft recommends the use of StringBuilder.
Most developers tend to subscribe to a form of lazy concatenation rather than
preparing their data for a more specialized method. Many algorithms also
favor lazy concatenation rather than preparing data up front (ASP .NET is
probably the number one subscriber, since its entire system is based on lazy
concatenation). If you do some capacity planning or limit the number of times
the builder will be resized, you can get much better performance from the
StringBuilder than I'm sure most people are getting, because they don't take
performance very seriously or don't know what it takes to squeeze performance
out of the StringBuilder.

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers
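A minimal sketch of the capacity planning Justin describes: sum the piece lengths first and pass the total to the StringBuilder(int capacity) constructor, so the builder never has to grow. It assumes neither the separator nor any element is null; PresizedBuilder is a hypothetical name, and no timing is claimed here.

```csharp
using System;
using System.Text;

public static class PresizedBuilder
{
    // Assumes separator and all elements are non-null.
    public static string Join(string separator, string[] values)
    {
        if (values.Length == 0) return string.Empty;

        // Capacity planning: the exact final length, computed up front.
        int capacity = separator.Length * (values.Length - 1);
        foreach (string s in values)
        {
            capacity += s.Length;
        }

        // Pre-sized builder: the Append calls below never trigger a resize.
        var sb = new StringBuilder(capacity);
        sb.Append(values[0]);
        for (int i = 1; i < values.Length; i++)
        {
            sb.Append(separator);
            sb.Append(values[i]);
        }
        return sb.ToString();
    }
}
```

This is the same first pass a two-pass join performs; the remaining gap to String.Join comes from the per-call overhead of Append and the final ToString copy.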

 
Justin Rogers said:
In addition there
are some string interning problems that tend to affect performance quite a bit
that I've been able to compute out of my analysis.

I don't understand this point. In the code that I've seen, there isn't
any interning going on apart from the interning of "foo", which only
happens once. Nothing should be interning each of the bits which end
up being joined.

Far more likely, IMO, is that you've got garbage collection occurring -
but that's an entirely different thing from string interning.
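Jon's point can be checked directly with string.IsInterned, which returns the pooled instance if an equal string is already in the intern pool and null otherwise. A minimal sketch; InternCheck is a hypothetical helper name.

```csharp
using System;

public static class InternCheck
{
    // True if an equal string is already in the runtime's intern pool.
    // string.IsInterned only looks; it never adds the string to the pool.
    public static bool IsInPool(string s)
    {
        return string.IsInterned(s) != null;
    }
}
```

A string produced at run time by String.Join or a StringBuilder comes back as not pooled unless its contents happen to match an already-interned string, while a literal such as "foo" is pooled.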
 
All strings are present in a global string table. This is how you ensure
immutability. To compact the string table and prevent instances where the
same string is present in memory more than once, they do a scan to make sure
the string doesn't already exist (using a string hash, I believe). If it
does, then your string reference simply points to the already allocated
string table slot. Since we are building the same exact string many times,
the second and any subsequent time we build the same string we incur a
performance hit as the string table is searched. Since the strings in the
example are very large, it takes a while to do the comparison and find the
string (aka computing the hash).

There is no garbage collection in the example if you run it using the
parameters I pointed out in the article. At least not with a sufficient
amount of memory. The string allocations themselves only take up, say:

3 * 1million + 7 * 1million, or approximately 10 million characters.

If the sample is changed to ensure that the strings are referenced throughout
the operation of the program and then used at the end of the program, it
makes no changes to the performance characteristics. No GC is happening.
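Justin's size estimate can be made exact for his parameters (count = 1,000,000, connector "foo"): the joined string holds the digits of 0 through 999,999 plus 999,999 three-character connectors. A minimal sketch that computes that length without building the string; JoinSize is a hypothetical name.

```csharp
using System;

public static class JoinSize
{
    // Number of decimal digits in a non-negative int (0 has one digit).
    private static int DigitCount(int n)
    {
        int digits = 1;
        while (n >= 10)
        {
            n /= 10;
            digits++;
        }
        return digits;
    }

    // Exact length of string.Join(separator, {"0", "1", ..., (count - 1)}):
    // all the digits plus (count - 1) separators.
    public static long JoinedLength(int count, int separatorLength)
    {
        if (count <= 0) return 0;
        long total = (long)separatorLength * (count - 1);
        for (int i = 0; i < count; i++)
        {
            total += DigitCount(i);
        }
        return total;
    }
}
```

For 1,000,000 items this gives 5,888,890 digit characters plus 2,999,997 connector characters, 8,888,887 in total. At two bytes per UTF-16 character that is roughly 17.8 MB for the final string alone, before counting the million small strings that were joined.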
 
Justin,
You just described interning; however, as Jon stated, interning is not
involved here!

Try the following:

char c1 = 'a', c2 = 'b';

string s1 = "ab", s2 = string.Concat(c1, c2);

System.Diagnostics.Debug.WriteLine(string.Equals(s1, s2), "s1 == s2");
System.Diagnostics.Debug.WriteLine(string.ReferenceEquals(s1, s2), "s1 is s2");

string.Equals is true as they contain the same characters, however
string.ReferenceEquals is false as they were not interned!

Hope this helps
Jay

 
Justin Rogers said:
All strings are present in a global string table. This is how you ensure
immutability.

No they're not. Only strings which are interned do this. It happens
automatically for string literals, and you can ask other strings to be
interned. It doesn't happen automatically for *all* strings.
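A minimal sketch of the distinction Jon draws, extending Jay's earlier example with an explicit string.Intern call: a string built at run time is a separate object from an equal literal until it is interned. InternDemo is a hypothetical name.

```csharp
using System;

public static class InternDemo
{
    // Returns: equal contents?, same object before Intern?, same object after Intern?
    public static (bool equalBefore, bool sameBefore, bool sameAfter) Compare()
    {
        string literal = "ab";                           // literals are interned automatically
        string built = new string(new[] { 'a', 'b' });   // built at run time: a new object

        bool equalBefore = string.Equals(literal, built);
        bool sameBefore = ReferenceEquals(literal, built);

        // string.Intern returns the pooled instance for "ab" (the literal's object).
        string pooled = string.Intern(built);
        bool sameAfter = ReferenceEquals(literal, pooled);

        return (equalBefore, sameBefore, sameAfter);
    }
}
```

Interning only happens when asked for (or for literals), which is why nothing in the Join benchmark pays for pool lookups.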

There is no garbage collection in the example if you run it using the parameters
I pointed out in the article.

Yes there is.
At least not with a sufficient amount of memory.
The string allocations themselves only take up say:

3 * 1million + 7 * 1million, or approximately 10 million characters.

And that's far more than the size of generation 0 in the heap - so
garbage collection *will* take place. You can see this with the
performance monitor if you want.
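As an in-code alternative to watching Performance Monitor, GC.CollectionCount (available from .NET 2.0 onward) reports how many collections of a given generation have run so far. A minimal sketch that brackets the benchmark's setup and Join with it; GcProbe is a hypothetical name, and the count will vary with runtime version, GC mode and machine, so no particular result is claimed.

```csharp
using System;

public static class GcProbe
{
    // Counts generation-0 collections that occur while building the
    // benchmark's input array and joining it.
    public static int Gen0CollectionsDuringJoin(int count)
    {
        int before = GC.CollectionCount(0);

        string[] strings = new string[count];
        for (int i = 0; i < count; i++)
        {
            strings[i] = i.ToString();
        }
        string joined = string.Join("foo", strings);

        int after = GC.CollectionCount(0);
        GC.KeepAlive(joined); // keep the result reachable until after the second count
        return after - before;
    }
}
```

A non-zero result for a count of 1,000,000 would confirm that collections are running during the test.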
If the sample is changed to ensure that the strings are referenced throughout
the operation of the program and then used at the end of the program, it makes no
changes to the performance characteristics. No GC is happening.

Well there certainly isn't any interning happening...
 
Yes Jon. My prior testing with concatenation comparisons showed a pretty
direct relationship to the overall performance and the amount of GC for each
option. That's kind of a "duh" thing in a way since we know the options
vary greatly in the numbers of objects created and discarded.
 
Maybe you're right, but I think I remember hearing it a lot at MSDN cafes
and other Microsoft-sponsored events in the early days. I guess it's the
"and others" part of my statement that still seems to ring true.

Call it one of my personal pet topics, as by-reference vs. by-value is for
you. ;)
 
Using StringBuilder is still a valid way to add strings; you should use it
whenever you use the + operator on strings more than a few times (like in a
loop).

This case doesn't do that, as it's a choice between looped StringBuilder
calls and a single String.Join call. I think we all agree on what's faster in
this specific case.

Jerry
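A minimal sketch of the kind of loop Jerry means, where pieces arrive one at a time and there is no ready-made array to hand to String.Join. The filter (keep only even IDs) and the names FilteredCsv and EvenIdsCsv are hypothetical.

```csharp
using System;
using System.Text;

public static class FilteredCsv
{
    // Builds a comma-delimited list of only the even IDs. Pieces are decided
    // one at a time, so a single StringBuilder collects them instead of
    // creating a new string with each +=.
    public static string EvenIdsCsv(int[] ids)
    {
        var sb = new StringBuilder();
        foreach (int id in ids)
        {
            if (id % 2 != 0) continue;       // hypothetical filter
            if (sb.Length > 0) sb.Append(','); // separator before every item but the first
            sb.Append(id);
        }
        return sb.ToString();
    }
}
```

The alternative would be to collect the surviving IDs into a list first and then call String.Join once; either avoids the repeated copying of +=.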

 
Jerry Pisk said:
Using StringBuilder is still a valid way to add strings; you should use it
whenever you use the + operator on strings more than a few times (like in a
loop).

When you're not using the intermediate values, yes. No-one's saying
that StringBuilder isn't useful - just that it's not as universally
good as some people believe...
This case doesn't do that, as it's a choice between looped StringBuilder
calls and a single String.Join call. I think we all agree on what's faster in
this specific case.

Yes, although as of his last post to the thread so far, Justin was
slightly confused as to exactly what was going on, IMO.
 