Was it a right decision to make strings immutable?

InDepth

Now that .NET is at its fourth release (3.5 is coming soon), my very humble
question to the gurus is:

"What have we won with the decision to make string objects immutable? Or did
we win?"

OK. It's a broad, and maybe a very silly, question to ask, but still.
I mean, what good has it brought us? What advantages do immutable strings
have over mutable ones?

I see some negative things: special cases in the VM, more difficult code
writing (use StringBuilder for some things, string for others), GC bloat from
string objects waiting to be collected (it still takes some time to get rid
of 3000 string objects, even when the VM/GC has specialized handling for
them), the need for the unsafe keyword for very fast string handling (in some
cases it is the only option), and duplicated string objects even when the
resulting string is only read. Reallocations go up, and so does heap
fragmentation (even with HFH). OK, mutable strings also reallocate, but maybe
a little less (statistically).

Am I just not thinking hard enough to see the obvious (?) benefits that the
designers saw (and they understand a whole lot more than I do)?

I've read a few research papers and documents about immutable strings
from universities and R&D labs, but I'm still not convinced. A copy-on-change
string would be the best, but is hardly doable.
But immutable?

Please, can anyone shed some light here?

Thanks

MFX
 
First of all, if you are counting 1.1 and 3.5 as releases, wouldn't 3.5 be
the fifth release (1.0, 1.1, 2.0, 3.0, 3.5)?

I'd say performance of memory management is the reason for immutable
strings. If they weren't immutable, every time a string changed, memory would
have to be rearranged to allocate the new size of the string and addresses
updated. Very ugly. With immutable strings, if a string is changed, the
string is just placed in a new location in memory and old location is
released. No other memory need be affected. This is much cleaner.
 
InDepth said:
Now that .NET is at its fourth release (3.5 is coming soon), my very humble
question to the gurus is:

"What have we won with the decision to make string objects immutable? Or did
we win?"

We have won:
1) We don't need to worry about whether something else is going to
modify the string. This is the biggest win by far.
2) Thread safety is very straightforward for immutable types.
3) Because the size of the string is never going to change, it's just a
single object, rather than containing a reference to another object
(with more overhead).
 
InDepth said:
Now that .NET is at its fourth release (3.5 is coming soon), my very
humble question to the gurus is:

"What have we won with the decision to make string objects immutable? Or
did we win?"

I think it's been a clear win in every respect.

I've worked in environments where I had to grab locks around every single
string access - read or write. Not having to do that in .Net has made code
much simpler.

I wish they had gone one step further, and made it easy for developers to
create their own immutable classes.
 
$s[0]= 'W'
$s[1]= 'I'
$s[2]='N'

--
William Stacey [C# MVP]
PowerLocker, PowerPad
www.powerlocker.com



 
Hi Jim,

Thanks for your comments. See mine inline.

Jim Anderson said:
First of all, if you are counting 1.1 and 3.5 as releases, wouldn't 3.5 be
the fifth release (1.0, 1.1, 2.0, 3.0, 3.5)?

Well, I have a tendency to consider .NET 1.0 not a release but more of a
beta. :-)

Jim Anderson said:
I'd say performance of memory management is the reason for immutable
strings. If they weren't immutable, every time a string changed, memory would
have to be rearranged to allocate the new size of the string and addresses
updated. Very ugly.

Not every time; on the contrary. If I have a mutable string "HELLO WORLD!!"
and cut off the trailing !! (string.left()), the string behaves like a
StringBuilder: no allocs/reallocs at all.
With an immutable string, I would at one point have both HELLO WORLD!! and
HELLO WORLD in memory (for a very short time, granted, but still).

Jim Anderson said:
With immutable strings, if a string is changed, the
string is just placed in a new location in memory and old location is
released. No other memory need be affected. This is much cleaner.

Well, that happens with mutable strings too, but only if the new length >
current length.
It will happen every time with immutable strings. I'd say performance
would be better with mutable strings, since at least some reallocs would not
happen at all.
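
The HELLO WORLD!! comparison can be sketched in a few lines. This is only an illustration: .NET has no mutable string type, so StringBuilder stands in for the hypothetical mutable string, and the class name TruncateDemo is made up for the example.

```csharp
using System;
using System.Text;

class TruncateDemo
{
    static void Main()
    {
        // Immutable string: Substring returns a second string object;
        // the original is left untouched.
        string s = "HELLO WORLD!!";
        string shorter = s.Substring(0, s.Length - 2);
        Console.WriteLine(s);        // HELLO WORLD!!
        Console.WriteLine(shorter);  // HELLO WORLD

        // Mutable stand-in: StringBuilder is truncated in place by
        // lowering its Length; no second buffer is needed for this step.
        StringBuilder sb = new StringBuilder("HELLO WORLD!!");
        sb.Length -= 2;
        Console.WriteLine(sb.ToString());  // HELLO WORLD
    }
}
```

Note that the StringBuilder version still allocates a string when ToString is called, so the saving only applies while the text stays inside the builder.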
 
The main point re performance is that if the string is mutable, then
any secure code must either lock the string in some way, or take a
local copy, before working on it (even if just reading). Otherwise
there are some very fun things that threaded code can do to stomp all
over each other. Most of the time, code is just handing an existing
string around - so immutable code needs to do nothing to be very
performant; but if it could change while you were looking at it? This
means you simply can't trust the string as a primitive-like object,
and need to treat it as you might a collection. Massive overhead.

Marc
 
Hi Jon!

Thanks for comments.. please see inline.

Jon Skeet said:
We have won:
1) We don't need to worry about whether something else is going to
modify the string. This is the biggest win by far.

Unless you have some unsafe functions you don't know you are using (maybe in
some component somewhere), like:

static unsafe void ModifyString(string YourImmutableString)
{
    // Pin the string and write through a raw pointer to its first character.
    fixed (char* ptr = YourImmutableString)
    {
        *ptr = 'A'; // WasYourImmutableString
    }
}

OK, this is not pure .NET style, but even the BCL relies heavily on unsafe
string handling.

I haven't seen mutability be a big problem in any language: C/C++ (no real
"strings" here, but add the STL), Smalltalk, Delphi, Ada and others. How did
it become a "problem" for .NET?

Jon Skeet said:
2) Thread safety is very straightforward for immutable types.

That is an advantage. A good one. But does it outweigh the disadvantages?
I'm talking only about immutable strings, not immutable types in general.

Jon Skeet said:
3) Because the size of the string is never going to change, it's just a
single object, rather than containing a reference to another object
(with more overhead).

True, but you get more overhead from the management/book-keeping for the
dead strings.
You now have lots of string instances, and the GC/VM must still
manage (mark, free, collect, ...) them.
I think it will add overhead. In the real world, strings are (almost)
never immutable, IMHO.
 
Hi William,


William Stacey said:
$s[0]= 'W'
$s[1]= 'I'
$s[2]='N'
fixed (char* p = s)
{
    *(p + 1) = 'H';
    *(p + 2) = 'Y';
}
? :-)
 
Hello Chris,

Chris Mullins said:
I think it's been a clear win in every respect.

Sorry, not to me... at least not yet. I'm hoping to understand the benefits.

Chris Mullins said:
I've worked in environments where I had to grab locks around every single
string access - read or write. Not having to do that in .Net has made code
much simpler.

To access any shared resource you need locks, unless it is const or read-only.
If a class has a string that can be read and written from many threads, access
must be synchronized, immutable or not.
Sure, life without pointers makes things easier, but the principle is still
there?

Chris Mullins said:
I wish they had gone one step further, and made it easy for developers to
create their own immutable classes.

That would be a big win.

 
Marc Gravell said:
The main point re performance is that if the string is mutable, then any
secure code must either lock the string in some way, or take a local copy,
before working on it (even if just reading). Otherwise there are some very
fun things that threaded code can do to stomp all over each other. Most of
the time, code is just handing an existing string around - so immutable
code needs to do nothing to be very performant; but if it could change
while you were looking at it? This means you simply can't trust the string
as a primitive-like object, and need to treat it as you might a
collection. Massive overhead.

Marc

Hmm... so an immutable string helps here because when a thread reads a string
from a shared instance (via a property, for example), it effectively has a
"local copy" of it: if another thread then sets the same property, the first
thread's string does not change, since the shared instance now refers to
another string. Did I get it right?
The first thread thus continues dealing with "old" data and the new thread
with "new" data. No hassle in that... in a way.
 
That is pretty much what I am saying, yes.

An equally (perhaps more) important example is property behavior; if I can
change a string internally, how can a consuming class either validate
the new value, or react ("observer" notifications, side-effects, etc.)
to changes?

Marc
 
InDepth said:
Unless you have some unsafe functions you don't know you are using (maybe in
some component somewhere), like:

static unsafe void ModifyString(string YourImmutableString)
{
    fixed (char* ptr = YourImmutableString)
    {
        *ptr = 'A'; // WasYourImmutableString
    }
}

OK, this is not pure .NET style, but even the BCL relies heavily on unsafe
string handling.

I trust the BCL not to abuse strings. For other code, I can easily make
sure they don't use any unsafe code.

InDepth said:
I haven't seen mutability be a big problem in any language: C/C++ (no real
"strings" here, but add the STL), Smalltalk, Delphi, Ada and others. How did
it become a "problem" for .NET?

STL strings end up being copied all over the place, or you pass
pointers to avoid the copying, and end up needing to know where it's
safe to pass pointers and where it's not. Ugly.

InDepth said:
That is an advantage. A good one. But does it outweigh the disadvantages?
I'm talking only about immutable strings, not immutable types in general.

Yes, I believe it does - massively. Strings are used a huge amount - do
you really want to obtain a lock (either in your code or in the string
type itself) every time you want to use one?

InDepth said:
True, but you get more overhead from the management/book-keeping for the
dead strings.

Not a lot, unless you're doing large amounts of manipulation on strings
without using a StringBuilder.

InDepth said:
You now have lots of string instances, and the GC/VM must still
manage (mark, free, collect, ...) them.
I think it will add overhead. In the real world, strings are (almost)
never immutable, IMHO.

Well, the last statement doesn't really make sense - did you mean that
in real programs you almost always change the value of a string
variable to a string which is a changed version of another string?

If so, I disagree - a large proportion of the strings I use never have
any operations like substring performed on them. I often create one
string from a lot of others, but in that case I wouldn't want to change
the original one anyway.
 
InDepth said:
To access any shared resource you need locks, unless it is const or read-only.
If a class has a string that can be read and written from many threads, access
must be synchronized, immutable or not.

No - consider this situation: a string is passed into a thread at the
start of its life (e.g. as a ParameterizedThreadStart parameter). With
immutable strings, I don't need to check anything - I know that it will
always have the same contents, for the whole of the thread's life.

For mutable strings, I'd need to obtain a lock every time I read from
it, just in case another thread was trying to change it. I also
wouldn't be able to trust any validation I'd performed, as the string
could become "invalid" (by my validation criteria) at any point.
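
The validation point can be shown without any threads at all. The following is a minimal sketch, assuming StringBuilder as a stand-in for a mutable string type; MutableHolder and ImmutableHolder are hypothetical classes invented for the illustration.

```csharp
using System;
using System.Text;

// Hypothetical holder that validates a mutable buffer once, then keeps it.
class MutableHolder
{
    private readonly StringBuilder name;

    public MutableHolder(StringBuilder name)
    {
        if (name.Length == 0) throw new ArgumentException("empty name");
        this.name = name;   // the caller still holds a reference it can edit
    }

    public int NameLength { get { return name.Length; } }
}

// Hypothetical holder that validates an immutable string once, then keeps it.
class ImmutableHolder
{
    private readonly string name;

    public ImmutableHolder(string name)
    {
        if (name.Length == 0) throw new ArgumentException("empty name");
        this.name = name;   // the contents can never change after this point
    }

    public int NameLength { get { return name.Length; } }
}

class Program
{
    static void Main()
    {
        StringBuilder sb = new StringBuilder("Jon");
        MutableHolder m = new MutableHolder(sb);
        sb.Length = 0;                      // edited after validation
        Console.WriteLine(m.NameLength);    // 0: the validated state is gone

        string s = "Jon";
        ImmutableHolder i = new ImmutableHolder(s);
        s = "";                             // rebinds the variable only
        Console.WriteLine(i.NameLength);    // 3
    }
}
```

With the mutable buffer, the holder would have to copy the text (or lock around every read) to keep its guarantee; with string it does neither.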
 
Jon Skeet said:
I trust the BCL not to abuse strings. For other code, I can easily make
sure they don't use any unsafe code.

How do you ensure that your code does not call anything unsafe?
For example, if you use a 3rd party library that might or might not use
unsafe code, the only way I know is to use CASPOL, but it's not *that* easy.

Jon Skeet said:
STL strings end up being copied all over the place, or you pass
pointers to avoid the copying, and end up needing to know where it's
safe to pass pointers and where it's not. Ugly.

Ugly, to some extent, but efficient.

Jon Skeet said:
Yes, I believe it does - massively. Strings are used a huge amount - do
you really want to obtain a lock (either in your code or in the string
type itself) every time you want to use one?

There are many ways to avoid that extreme level of locking - I don't think
that the Windows kernel, a heavyweight multithreaded beast, uses string-level
locking :-)

Jon Skeet said:
Not a lot, unless you're doing large amounts of manipulation on strings
without using a StringBuilder.

I've profiled a few simple and complex ASP.NET apps; even one simple page can
create hundreds of string objects.
Multiply that by 50 users and more complex pages and all the rest. It's a
lot in my opinion.

Jon Skeet said:
Well, the last statement doesn't really make sense - did you mean that
in real programs you almost always change the value of a string
variable to a string which is a changed version of another string?

If so, I disagree - a large proportion of the strings I use never have
any operations like substring performed on them. I often create one
string from a lot of others, but in that case I wouldn't want to change
the original one anyway.

We are doing different kinds of apps :-D... seriously, even a simple parser or
serializer needs substrings and other string processing functions.

But I'm getting your point (finally, you might say). But still, maybe it
would have been better to make string mutable by default and create a
StringReadOnly when you need it.
But that's just an opinion.

Thanks Jon!
 
InDepth said:
How do you ensure that your code does not call anything unsafe?
For example, if you use a 3rd party library that might or might not use
unsafe code, the only way I know is to use CASPOL, but it's not *that* easy.

It's reasonably easy - easy enough, if you have a suspicion.

Frankly, if a third party component starts mutating strings, it's
likely to get a bad reputation in a real hurry.

InDepth said:
Ugly, to some extent, but efficient.

The copying all over the place isn't efficient in terms of performance,
and having to know where it's safe to pass pointers makes life hard for
the developer. Doesn't sound like a good idea to me.

InDepth said:
There are many ways to avoid that extreme level of locking - I don't think
that the Windows kernel, a heavyweight multithreaded beast, uses string-level
locking :-)

If there are many ways to avoid it, how would you do so? In particular,
how would you do so in a way which wouldn't make this loop potentially
break:

foreach (char c in myString)
{
    // Use c
}

or

for (int i = 0; i < s.Length; i++)
{
    // Use s[i]
}

Both of these break with mutable strings, so you'd need to be able to
lock the contents of the string for the whole of enumeration. If your
loop potentially takes a long time, you could end up blocking another
thread just by doing something as simple as iterating over a string -
doesn't sound like a good thing to me.

InDepth said:
I've profiled a few simple and complex ASP.NET apps; even one simple page can
create hundreds of string objects.

I dare say - but have you counted how many of those would have been
saved with mutable strings?

InDepth said:
Multiply that by 50 users and more complex pages and all the rest. It's a
lot in my opinion.

Again, without showing how mutable strings would have reduced the
number of strings involved, there's no evidence that any overhead would
have been saved.

InDepth said:
We are doing different kinds of apps :-D... seriously, even a simple parser or
serializer needs substrings and other string processing functions.

Yes, but a lot of other apps *don't* need to take substrings.

InDepth said:
But I'm getting your point (finally, you might say). But still, maybe it
would have been better to make string mutable by default and create a
StringReadOnly when you need it.
But that's just an opinion.

So you'd rather make it *hard* to use safely by default? Again, sounds
like a bad idea. If you need a mutable version of a string,
StringBuilder will help you while you're changing it. If you need to
pass it to a method requiring a string, of course, you'll need to call
ToString - but that's fair enough, as the method may well be assuming
that nothing else is going to change it.
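
The StringBuilder-then-ToString hand-off described above might look like the sketch below. It is only an illustration; the names SnapshotDemo and snapshot are made up for the example.

```csharp
using System;
using System.Text;

class SnapshotDemo
{
    static void Main()
    {
        // Build the text in a mutable buffer.
        StringBuilder sb = new StringBuilder("abc");

        // ToString hands out an immutable string with the current contents.
        string snapshot = sb.ToString();

        // Later edits to the builder do not reach the string already handed out.
        sb.Append("def");
        sb[0] = 'X';

        // Iterating the snapshot needs no lock: nothing can change it.
        foreach (char c in snapshot)
        {
            Console.Write(c);              // writes a, b, c in turn
        }
        Console.WriteLine();

        Console.WriteLine(sb.ToString());  // Xbcdef
    }
}
```

Any method that received snapshot can loop over it or validate it once and rely on the result, which is the assumption the last paragraph refers to.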
 
Not having to worry about whether a string is getting changed by another
thread or callback is more than sufficient for me to agree with making
strings immutable. As for the issue of lots of strings being created and
discarded, this is why the .NET GC is a multi-generational GC. The vast
majority of strings will never make it out of Gen(0).

Mike Ober.
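
A rough way to observe the generational behaviour mentioned above is to compare collection counts per generation around a loop that creates many throwaway strings. This is a sketch under the assumption that nothing else in the process allocates heavily; the class name Gen0Demo and the loop size are arbitrary, and the numbers vary by runtime, GC mode, and machine, so no particular output is implied.

```csharp
using System;

class Gen0Demo
{
    static void Main()
    {
        // Record how many collections each generation has had so far.
        int gen0Before = GC.CollectionCount(0);
        int gen2Before = GC.CollectionCount(2);

        int total = 0;
        for (int i = 0; i < 1000000; i++)
        {
            string temp = i.ToString();   // short-lived string, garbage after this iteration
            total += temp.Length;
        }

        Console.WriteLine("chars: " + total);
        Console.WriteLine("gen0 collections: " + (GC.CollectionCount(0) - gen0Before));
        Console.WriteLine("gen2 collections: " + (GC.CollectionCount(2) - gen2Before));
    }
}
```

If short-lived strings really die young, the gen0 figure should be much larger than the gen2 figure.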
 