String and StringBuilder

  • Thread starter: Sreenivas

Sreenivas

I am just peeping through the BlogEngine.NET source code and got a doubt.
In the Util.cs file, there is a function called "RemoveIllegalCharacters",
which removes spaces, diacritics and illegal chars in the title of a
post, such as ! @ : etc., and replaces them with string.Empty.
The function is given below.

public static string RemoveIllegalCharacters(string text)
{
    if (string.IsNullOrEmpty(text))
        return text;

    text = text.Replace(":", string.Empty);
    text = text.Replace("/", string.Empty);
    text = text.Replace("?", string.Empty);
    return text;
}

My doubt is: why don't they use StringBuilder in place of the string object
and stop polluting the CIL with 'ldstr' instructions?
Is string a deliberate choice over StringBuilder here, or am I wrong?
Any insights?
TIA,
Sreenivas
 
If you use a StringBuilder, its capacity won't change when you remove
characters. That means that when you finally get the string from the
StringBuilder, it will have unused bytes at the end of it (unless the
string has shrunk to less than half the original size, in which case it
will be copied into a new string). Using strings, the final result will
always use exactly as little memory as required.

Using strings like this means that it creates intermediate strings, but
they are very short lived, so the garbage collector can efficiently
remove them. Short-lived objects are cheap; it's the long-lived objects
that cause more work for the garbage collector.

Also, if you benchmark this method against a StringBuilder version,
it's likely that this version performs slightly better. In any case, you
will definitely not get any considerable improvement using a StringBuilder.
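
For anyone who wants to run the comparison suggested above, here is a minimal sketch of the two variants side by side. The class name IllegalCharDemo and the method names WithString and WithStringBuilder are made up for illustration and are not part of BlogEngine.NET; only the three Replace calls come from the quoted method. The sketch only shows that both variants produce the same text, and leaves the timing to a proper benchmark on realistic input.

```csharp
using System;
using System.Text;

public static class IllegalCharDemo
{
    // Same logic as the quoted method: plain string.Replace calls.
    public static string WithString(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        text = text.Replace(":", string.Empty);
        text = text.Replace("/", string.Empty);
        text = text.Replace("?", string.Empty);
        return text;
    }

    // Hypothetical StringBuilder variant: the input is copied into one
    // mutable buffer and converted back to a string at the end.
    public static string WithStringBuilder(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        var sb = new StringBuilder(text);
        sb.Replace(":", string.Empty);
        sb.Replace("/", string.Empty);
        sb.Replace("?", string.Empty);
        return sb.ToString();
    }

    public static void Main()
    {
        const string title = "C#: strings/builders?";
        Console.WriteLine(WithString(title));
        Console.WriteLine(WithStringBuilder(title));
    }
}
```

Both calls should print the same line; which one is faster depends on the input and the runtime, so it is worth measuring rather than guessing.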
 
Thanks for your replies!
Points taken.
 
Just as you rightly pointed out, StringBuilder would have been a much better
choice due to the immutability of the string object.
 
It's true that you may not get any considerable improvement, but mind that
losing a few microseconds here and there impacts a larger application's
performance and perceived usability.
 
Then you should actually test which method gives better performance.
However, the performance difference will be quite small, so it's really
micro-optimising.

My experience from similar code is that the simple string operations are
likely to perform slightly better than a StringBuilder. You don't really
use the power of the StringBuilder here, so you don't get anything back
for the overhead of creating it.

For more demanding tasks, like concatenating a lot of strings, the
StringBuilder is of course far superior.
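
As a rough illustration of the "concatenating a lot of strings" case, the sketch below builds one long string in a loop with a single StringBuilder. The class ConcatDemo and the method JoinNumbers are hypothetical names chosen for this example only.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Builds "0,1,2,...,count-1" in one growing buffer instead of
    // allocating a new, longer string on every iteration.
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
                sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinNumbers(5)); // prints 0,1,2,3,4
    }
}
```

With `result += ...` in the same loop, every iteration would copy everything built so far into a new string, which is where the StringBuilder starts to pay for itself.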
 
In colloquial English, the use of the word "polluting" would normally
imply that there's something harmful about the practice. So, in your
view, what is harmful about the "ldstr" instruction?

Also in English (colloquial or not) the proper word is "question," not
"doubt." I have noticed that this error is mainly made by people from India.
 
When you pass a String into StringBuilder's constructor, it makes a copy
of the string into a Char array. When you call StringBuilder.ToString(),
it makes a copy of the Char array into a String object. The
String.Replace method makes a copy only if any characters were
replaced. Therefore, you would need to be doing at least four replaces
where we know that a character will be replaced before there is any
performance benefit in pushing it to a StringBuilder for the replace
calls.
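
The point that String.Replace copies only when something was replaced can be checked with a small sketch like the one below. The class ReplaceCopyDemo and the helper ReturnsSameInstance are made-up names for this example. It relies on the documented behaviour that Replace returns the current instance unchanged when the old value is not found.

```csharp
using System;

public static class ReplaceCopyDemo
{
    // True when Replace handed back the very same string object,
    // i.e. no new string was allocated for this call.
    public static bool ReturnsSameInstance(string text, string oldValue)
    {
        string result = text.Replace(oldValue, string.Empty);
        return object.ReferenceEquals(text, result);
    }

    public static void Main()
    {
        string title = "Hello world";

        // No ':' in the title, so there is nothing to replace.
        Console.WriteLine(ReturnsSameInstance(title, ":"));

        // The title contains 'o', so a new, shorter string is created.
        Console.WriteLine(ReturnsSameInstance(title, "o"));
    }
}
```

For a typical post title that contains none of the illegal characters, the three Replace calls in the original method would therefore allocate nothing at all.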
 
When you pass String into StringBuilder's constructor, it makes a copy
of the string into a Char array. When you call StringBuilder.ToString(),
it makes a copy of the Char array into a String object. [...]

The second part's not quite true. StringBuilder.ToString() returns a
String wrapper for its existing buffer; it should only perform a copy if
the StringBuilder instance is further mutated.

I agree with everything else you wrote, but IMHO the ultimate conclusion
is that the performance difference between the two approaches isn't so
cut-and-dried. I wouldn't make any specific claims of performance one way
or the other without measuring the actual performance under realistic
conditions; in fact, that's the main reason for not focusing on
performance at that level of detail until you know you have a performance
problem. It's too easy for intuition to be wrong.

Pete
 
Jon said:
What is a string wrapper? Is that a reference to the string that the StringBuilder has manipulated?

It's actually a regular String object.

The StringBuilder internally treats it as a mutable string buffer while
working on it, but when you get it from the ToString method it's just a
regular string.

The StringBuilder keeps track of the status of the string, so that if
you have used ToString to get the buffer as a string and keep making
changes to the StringBuilder, it can no longer change the current buffer
(as it's used as an immutable string outside the StringBuilder), so it
has to copy the data to a new buffer.
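
Whatever the StringBuilder does internally (the buffer handling is an implementation detail and has differed between .NET versions), the observable effect is that a string obtained from ToString never changes afterwards. A small sketch, with the hypothetical names SnapshotDemo and SnapshotThenAppend:

```csharp
using System;
using System.Text;

public static class SnapshotDemo
{
    // Returns the string taken before the append and the one taken after it.
    public static string[] SnapshotThenAppend()
    {
        var sb = new StringBuilder("first");
        string snapshot = sb.ToString();

        // Mutating the builder must not affect the string handed out above.
        sb.Append(" second");

        return new[] { snapshot, sb.ToString() };
    }

    public static void Main()
    {
        string[] results = SnapshotThenAppend();
        Console.WriteLine(results[0]); // prints "first"
        Console.WriteLine(results[1]); // prints "first second"
    }
}
```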
 
Göran Andersson said:
It's actually a regular String object. [...]

Thanks for the explanation. I have a string in some code that is used to
display messages in a text box, and I keep appending new messages to it.
It sounds like I should be using StringBuilder.

Jon
 