Bug in Encoder.GetString()?

  • Thread starter: Chris Peacock

Chris Peacock

I've found similar postings on various newsgroups suggesting that
other people have encountered this problem, but none of the postings
have replies. I hope therefore that the following may be of help.

My application is an ASP.NET one which calls
System.Text.UTF8Encoding.GetString on a byte sequence, and then
subsequently (well, a little later) attempts to call a Web Service
method. The call to the Web Service method fails with the exception
"hexadecimal value 0x00, is an invalid character", stack trace:-

[XmlException: '', hexadecimal value 0x00, is an invalid character. Line 6, position 223.]
System.Xml.XmlScanner.ScanHexEntity()
System.Xml.XmlTextReader.ParseBeginTagExpandCharEntities() +1036
System.Xml.XmlTextReader.Read() +216
System.Xml.XmlReader.ReadElementString()
System.Web.Services.Protocols.SoapHttpClientProtocol.ReadSoapException(XmlReader reader)
System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
MyWeb.MyWebData.MyClass.MyFunction(MyDataClass& data, Byte[] byteSomeData) in c:\inetpub\wwwroot\MyWeb\Web References\MyWebData\Reference.vb:108
MyWeb.MyHttpHandler.ProcessRequest(HttpContext context) in c:\inetpub\wwwroot\MyWeb\MyHttpHandler.vb:57
System.Web.CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +87


I traced the problem to the call to GetString:-

strResult = textEncoding.GetString (byteText)

...and fixed the problem by replacing this with code as follows:-

decoder = textEncoding.GetDecoder()
' GetCharCount returns a character count, so keep it separate from the byte count
iCharCount = decoder.GetCharCount(byteText, 0, byteText.Length)
chars = New Char(iCharCount - 1) {}
decoder.GetChars(byteText, 0, byteText.Length, chars, 0)
strResult = New String(chars)

I am using .NET framework version 1.1.4322.573 and ASP.NET version
1.1.4322.573.

Hopefully this may be of help to someone.
 
Chris Peacock said:
> I've found similar postings on various newsgroups suggesting that
> other people have encountered this problem, but none of the postings
> have replies. I hope therefore that the following may be of help.

Could you give an example of what's in byteText? Also, if you could
give the code which gets the byte array to start with, that might
help...
 
Jon Skeet said:
> Could you give an example of what's in byteText? Also, if you could
> give the code which gets the byte array to start with, that might
> help...

I'm sorry, but I made a blunder with my posting (which I subsequently
deleted). The cause of the problem was in fact that 'byteText'
contained an empty string, and when Encoding.GetString was called on
it, it produced a string filled with characters with value zero,
rather than a string of zero length (could this be a bug?). This
string was subsequently passed to the WebService, and caused the
error.

The workaround I've used for this is to do a Trim(Chr(0)) on the value
returned by GetString.

The other 'fix' I posted was incorrect; I'd inadvertently changed the
value in 'byteText' at the same time as using this code, preventing
the error from occurring (duh!).
 
Chris Peacock said:
> I'm sorry, but I made a blunder with my posting (which I subsequently
> deleted). The cause of the problem was in fact that 'byteText'
> contained an empty string

Hang on - byteText is an array of bytes, not a string.

> and when Encoding.GetString was called on
> it, it produced a string filled with characters with value zero,
> rather than a string of zero length (could this be a bug?). This
> string was subsequently passed to the WebService, and caused the
> error.

I suspect byteText was *not* actually empty, but contained however many
bytes with value 0 as you then saw in the string.

How exactly were you creating byteText in the first place?

> The workaround I've used for this is to do a Trim(Chr(0)) on the value
> returned by GetString.

While that's a workaround, I would strongly suggest you try to make
sure you give Encoding.GetString the right data to start with instead.
 
> Hang on - byteText is an array of bytes, not a string.

Sorry, what I meant was that byteText represents a string, which
happened to be empty - so byteText did in fact contain an array of
zeros.

> I suspect byteText was *not* actually empty, but contained however many
> bytes with value 0 as you then saw in the string.

Yes, that's correct.

> How exactly were you creating byteText in the first place?

byteText is returned from CryptoStream.Read, i.e. it represents a
decrypted string.

> While that's a workaround, I would strongly suggest you try to make
> sure you give Encoding.GetString the right data to start with instead.

That's difficult; how do I check the length of a string represented
within a byte array? I know what you might suggest, which would be not
to encrypt empty strings at all (which happen to be passwords to be
passed as URL query strings) but instead to store them as just empty
byte arrays - but I feel that this is less secure, as it lets the
world know that the user does not have a password (this does happen,
by the way).

Surely Encoding.GetString should treat the first zero it encounters as
a terminating character? The Length property of the resulting string
even counts the zeros as valid characters! There must be a bug there.
 
Chris Peacock said:
> Sorry, what I meant was that byteText represents a string, which
> happened to be empty - so byteText did in fact contain an array of
> zeros.

That's *not* a representation of an empty string. That's a
representation of a string containing null characters. That's a
different matter entirely.

> Yes, that's correct.

In which case, Encoding.GetString is doing exactly the right thing.

> byteText is returned from CryptoStream.Read, i.e. it represents a
> decrypted string.

Again, how *exactly* did you read the data? Did you take note of the
return value of Read?

> That's difficult; how do I check the length of a string represented
> within a byte array?

You only generate data for valid characters.

> I know what you might suggest, which would be not
> to encrypt empty strings at all (which happen to be passwords to be
> passed as URL query strings) but instead to store them as just empty
> byte arrays - but I feel that this is less secure, as it lets the
> world know that the user does not have a password (this does happen,
> by the way).

Well, salting all passwords would help on that front.

> Surely Encoding.GetString should treat the first zero it encounters as
> a terminating character?

Why? A Unicode 0 is a perfectly valid character.

> The Length property of the resulting string
> even counts the zeros as valid characters! There must be a bug there.

No, the flaw is in your understanding, assuming that character 0 must
be a string terminator - it's not.
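
To illustrate the point, here is a minimal sketch, assuming a hypothetical four-byte buffer of zeros (the module and variable names are made up for the example and do not come from the original code). Each zero byte decodes to one U+0000 character, so the resulting string has a non-zero Length; only a zero-length byte range decodes to an empty string.

```vbnet
Imports System
Imports System.Text

Module NullCharDemo
    Sub Main()
        ' Hypothetical buffer: four bytes, all zero.
        ' (VB array bounds are inclusive, so an upper bound of 3 gives 4 elements.)
        Dim zeroBytes(3) As Byte

        ' Each 0x00 byte is the valid UTF-8 encoding of U+0000,
        ' so it decodes to one character rather than ending the string.
        Dim decoded As String = Encoding.UTF8.GetString(zeroBytes)
        Console.WriteLine(decoded.Length)      ' prints 4

        ' A genuinely empty string comes from a zero-length byte array.
        Dim emptyResult As String = Encoding.UTF8.GetString(New Byte() {})
        Console.WriteLine(emptyResult.Length)  ' prints 0

        ' Decoding only the bytes known to be valid avoids the padding.
        ' validCount is an assumption here: nothing was actually read into the buffer.
        Dim validCount As Integer = 0
        Console.WriteLine(Encoding.UTF8.GetString(zeroBytes, 0, validCount).Length)  ' prints 0
    End Sub
End Module
```

The three-argument GetString overload is what makes the last case work: the caller, not the decoder, says how many bytes are meaningful.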
 
..
..
..
> Again, how *exactly* did you read the data? Did you take note of the
> return value of Read?

No, I didn't think to check that one! That seems to work okay.
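
For anyone finding this thread later, a rough sketch of what "taking note of the return value of Read" can look like. This is not the code from the application above: ReadAllText is a hypothetical helper, and a MemoryStream stands in for the CryptoStream so that the example runs on its own. The idea is to keep only the bytes that Read reports as filled, and to decode once at the end.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadLoopDemo
    ' Hypothetical helper: reads a stream to its end, keeping only the
    ' bytes that Read reports as filled, then decodes them in one go.
    Function ReadAllText(ByVal source As Stream, ByVal textEncoding As Encoding) As String
        Dim chunk(1023) As Byte
        Dim collected As New MemoryStream()
        Dim bytesRead As Integer = source.Read(chunk, 0, chunk.Length)
        While bytesRead > 0
            ' Copy only the filled portion; the rest of chunk is zeros or stale data.
            collected.Write(chunk, 0, bytesRead)
            bytesRead = source.Read(chunk, 0, chunk.Length)
        End While
        Return textEncoding.GetString(collected.ToArray())
    End Function

    Sub Main()
        ' A MemoryStream stands in for the CryptoStream (an assumption for this demo).
        Dim emptySource As New MemoryStream(New Byte() {})
        Console.WriteLine(ReadAllText(emptySource, Encoding.UTF8).Length)  ' prints 0

        Dim textSource As New MemoryStream(Encoding.UTF8.GetBytes("secret"))
        Console.WriteLine(ReadAllText(textSource, Encoding.UTF8))          ' prints secret
    End Sub
End Module
```

Decoding after the loop, rather than chunk by chunk, also avoids splitting a multi-byte UTF-8 sequence across two reads.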
 