String constructor returning interned string?

  • Thread starter: Jon Skeet

Jon Skeet

I've just noticed something rather odd and disturbing. The following
code displays "True":

using System;

class Test
{
    public static void Main(string[] args)
    {
        string x = new string ("".ToCharArray());
        string y = new string ("".ToCharArray());
        Console.WriteLine (object.ReferenceEquals (x, y));
    }
}

In other words, new string(...) is *not* returning a new string
reference.

This worries me - not so much for the specific example, but for the
precedent set. What other new ... expressions might return non-new
references? This could have significant implications in multithreading,
where you may rely on two references being different for
locking purposes.
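The locking concern can be made concrete. Below is a minimal C# sketch, assuming a runtime that behaves as described above (the class and member names such as `LockDemo` are hypothetical). Two "separate" lock strings built from empty arrays turn out to be one object, whereas dedicated `new object()` instances are always distinct.

```csharp
using System;
using System.Threading.Tasks;

class LockDemo
{
    // Hypothetical "separate" locks built from empty char arrays. If the
    // String(char[]) constructor hands back String.Empty, both fields refer
    // to the same object and therefore to the same monitor.
    static readonly string stringLockA = new string(new char[0]);
    static readonly string stringLockB = new string(new char[0]);

    // Dedicated lock objects: each "new object()" allocates a fresh instance.
    static readonly object objectLockA = new object();
    static readonly object objectLockB = new object();

    static int counterA;
    static int counterB;

    public static bool StringLocksShared()
    {
        return object.ReferenceEquals(stringLockA, stringLockB);
    }

    public static bool ObjectLocksShared()
    {
        return object.ReferenceEquals(objectLockA, objectLockB);
    }

    public static void Main()
    {
        Console.WriteLine("string locks shared: " + StringLocksShared());
        Console.WriteLine("object locks shared: " + ObjectLocksShared());

        // Each counter is guarded by its own private lock object.
        Parallel.For(0, 1000, i =>
        {
            lock (objectLockA) { counterA++; }
            lock (objectLockB) { counterB++; }
        });
        Console.WriteLine(counterA + " " + counterB);
    }
}
```

Locking on a private object that no other code can reach avoids depending on how any particular constructor treats its arguments.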
 
Jon Skeet said:
In other words, new string(...) is *not* returning a new string
reference.

This worries me - not so much for the specific example, but for the
precedent set. What other new ... expressions might return non-new
references? This could have significant implications in multithreading,
where you may rely on two references being different for
locking purposes.

This is odd...quite honestly. However, have you checked to make sure the JIT
isn't making some kind of very clever optimization here? Perhaps realizing
you're creating two strings from the same source (the char array of an already
interned string), and not creating the new object but instead setting both x
& y to the same reference?
Otherwise this is a quite disturbing find, indeed.
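One way to rule out a JIT optimization is to give each call its own freshly allocated array and to stop the JIT from inlining the method that calls the constructor. Here is a minimal C# sketch (the `JitCheck` class and its methods are hypothetical names). If the empty case still yields one shared reference while a non-empty case yields two, the sharing would come from the String(char[]) constructor rather than from the JIT.

```csharp
using System;
using System.Runtime.CompilerServices;

class JitCheck
{
    // NoInlining keeps the JIT from merging the two calls into one.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string Build(char[] chars)
    {
        return new string(chars);
    }

    // Copies text into a newly allocated array, so the two Build calls
    // never receive the same array instance.
    static char[] FreshArray(string text)
    {
        char[] chars = new char[text.Length];
        text.CopyTo(0, chars, 0, text.Length);
        return chars;
    }

    public static bool SameReference(string text)
    {
        string a = Build(FreshArray(text));
        string b = Build(FreshArray(text));
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine("empty: " + SameReference(""));
        Console.WriteLine("abc:   " + SameReference("abc"));
    }
}
```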

 
Daniel O'Connell said:
This is odd...quite honestly. However, have you checked to make sure the JIT
isn't making some kind of very clever optimization here? Perhaps realizing
you're creating two strings from the same source (the char array of an already
interned string), and not creating the new object but instead setting both x
& y to the same reference?

It only happens with the empty string, as far as I can see.
Otherwise this is a quite disturbing find, indeed.

I think it's disturbing either way - basically, if you rely on the new
operator always returning a previously unknown reference, you've got
problems.

However, I've looked at the String(char[]) docs, and the remarks say:

<quote>
If value is a null reference (Nothing in Visual Basic) or contains no
element, an Empty instance is initialized.
</quote>

I suspect if I hadn't known what that meant beforehand (due to seeing
this) I wouldn't have understood it.

I'd have thought this would actually take more work, and that there
wouldn't really be that much benefit in it. I just wonder where else
this might be lurking...
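The documented remark can be checked directly. Below is a minimal C# sketch (the class name `EmptyCtorCheck` is hypothetical). Both a null array and a zero-length array should come back as the String.Empty instance itself, not merely as an equal string.

```csharp
using System;

class EmptyCtorCheck
{
    public static bool NullGivesEmpty()
    {
        // The cast selects the String(char[]) overload; a bare null would be ambiguous.
        string s = new string((char[])null);
        return object.ReferenceEquals(s, String.Empty);
    }

    public static bool ZeroLengthGivesEmpty()
    {
        string s = new string(new char[0]);
        return object.ReferenceEquals(s, String.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(NullGivesEmpty());
        Console.WriteLine(ZeroLengthGivesEmpty());
    }
}
```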
 
There's more on .NET's string interning over on Chris Brumme's blog:

http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/7943b9be-cca9-41e1-8a83-3d7a0dbba270

He even has some example code snippets to illustrate how it can bite you
when calling into unmanaged code.

/kel
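For comparison with ordinary interning, here is a minimal C# sketch (the class and method names are hypothetical). A non-empty string built with new string(...) is a separate object from the equal literal, and String.Intern maps it back to the pooled literal instance.

```csharp
using System;

class InternDemo
{
    // A non-empty string built from a char array is a fresh, non-interned object.
    public static bool BuiltIsSeparateFromLiteral()
    {
        string literal = "hello";   // string literals are interned
        string built = new string("hello".ToCharArray());
        return !object.ReferenceEquals(literal, built);
    }

    // String.Intern returns the pooled instance, which is the literal itself.
    public static bool InternReturnsLiteral()
    {
        string literal = "hello";
        string built = new string("hello".ToCharArray());
        return object.ReferenceEquals(literal, String.Intern(built));
    }

    public static void Main()
    {
        Console.WriteLine(BuiltIsSeparateFromLiteral()); // True
        Console.WriteLine(InternReturnsLiteral());       // True
    }
}
```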

Jon said:
I'd have thought this would actually take more work, and that there
wouldn't really be that much benefit in it. I just wonder where else
this might be lurking...
 
Hi, Jon,
A null or empty input is the only case where this happens.
This is to avoid creating too many empty string instances in the CLR.
But I agree we should probably always create new strings in new String(...).

Gang Peng
[MS]
