Char datatype does some hocus pocus

  • Thread starter: Arne Garvander

Arne Garvander

Dim B As Byte = 150
Dim C As Char
C = Convert.ToChar(B)
B = Asc(C)
And the value of B is 63
Why?
 
Arne Garvander said:
Dim B As Byte = 150
Dim C As Char
C = Convert.ToChar(B)
B = Asc(C)
And the value of B is 63
Why?

Convert.ToChar will use Unicode - so you end up with Unicode 150
(U+0096). That character (the control character "start of guarded
area") almost certainly isn't in your ANSI character encoding, so Asc
can't map it back to a byte.

Moral: avoid Asc and Chr, which implicitly use ANSI. Use the Encoding
class instead, where you explicitly specify the encoding.
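
As a rough illustration of the Encoding approach, here is a minimal sketch. It assumes the "ANSI" code page in question is Windows-1252 and that the code runs on .NET Framework; on .NET Core and later, Encoding.GetEncoding(1252) needs CodePagesEncodingProvider registered first. The module and function names (EncodingRoundTrip, ByteToChar, CharToByte) are made up for this example.

```vb
Imports System
Imports System.Text

Module EncodingRoundTrip
    ' Decode one byte into a Char using an explicitly named encoding.
    Function ByteToChar(value As Byte, enc As Encoding) As Char
        Return enc.GetChars(New Byte() {value})(0)
    End Function

    ' Encode one Char back into a byte using the same encoding.
    Function CharToByte(c As Char, enc As Encoding) As Byte
        Return enc.GetBytes(New Char() {c})(0)
    End Function

    Sub Main()
        ' Assumption: Windows-1252 is the code page the data was written in.
        Dim ansi As Encoding = Encoding.GetEncoding(1252)

        Dim b As Byte = 150
        Dim c As Char = ByteToChar(b, ansi)

        ' Byte 150 (0x96) in Windows-1252 is the en dash, U+2013.
        Console.WriteLine("U+{0:X4}", AscW(c))
        ' Encoding with the same code page gives the original byte back.
        Console.WriteLine("{0}", CharToByte(c, ansi))
    End Sub
End Module
```

Because the encoding is passed in as a parameter, switching to a different code page is a one-line change rather than a hunt for every Asc and Chr call.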
 
Arne Garvander said:
Dim B As Byte = 150
Dim C As Char
C = Convert.ToChar(B)
B = Asc(C)
And the value of B is 63
Why?

Because 150 isn't a valid ASCII value (ASCII is 0 to 127). 63 is the
ASCII code for a question mark ('?'), and that's what Asc is returning
when you pass it a character that doesn't have a valid ASCII code.

Pete
 
Peter Duniho said:
Because 150 isn't a valid ASCII value (ASCII is 0 to 127). 63 is the
ASCII code for a question mark ('?'), and that's what Asc is returning
when you pass it a character that doesn't have a valid ASCII code.

To be strict about it, Asc is misnamed - it returns 63 when you pass it
a character which doesn't have a valid *ANSI* code for the default ANSI
code page on your system. Unfortunately there's a bad history of
assuming that ASCII==ANSI (and that ANSI is a specific encoding, rather
than a whole collection of them).

<shudders>
 
To be strict about it, Asc is misnamed - it returns 63 when you pass it
a character which doesn't have a valid *ANSI* code for the default ANSI
code page on your system. Unfortunately there's a bad history of
assuming that ASCII==ANSI (and that ANSI is a specific encoding, rather
than a whole collection of them).

Ah, I see. I changed the code to use Chr(B) instead of Convert.ToChar()
and it does actually convert to the appropriate Unicode character
('\u2013' in this case). I made the error of assuming that the call to
Convert.ToChar() was doing something that the OP expected, but now I see
that it expects Unicode values and so passing what's an ANSI value does
actually allow Asc() to return the right thing.

I had assumed from the name that Asc() would return only ASCII values, but
looking at the doc page I see that it does return characters from the ANSI
range, just as you said.

I guess the moral is either to follow your original advice (use an
explicit Encoding for conversion), or in VB if you are just using the
default ANSI encoding then just use the appropriate function, Chr(),
instead of passing an ANSI value to a function that expects Unicode
(Convert.ToChar()).
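
To make the difference concrete, the sketch below feeds the same byte to both functions and prints what each one produced. The module name ChrVersusToChar is invented for the example, and the Chr and Asc results depend on the system's default ANSI code page, so the values mentioned in the comments for those calls assume a Windows-1252 machine like the one discussed in this thread.

```vb
Imports System
Imports Microsoft.VisualBasic

Module ChrVersusToChar
    Sub Main()
        Dim b As Byte = 150

        ' Convert.ToChar treats the number as a Unicode value: U+0096.
        Dim viaConvert As Char = Convert.ToChar(b)
        ' Chr treats the number as a byte in the default ANSI code page.
        ' On a Windows-1252 system this should be U+2013 (assumption).
        Dim viaChr As Char = Chr(b)

        Console.WriteLine("Convert.ToChar: U+{0:X4}", AscW(viaConvert))
        Console.WriteLine("Chr:            U+{0:X4}", AscW(viaChr))

        ' Asc maps back through the same ANSI code page. The original
        ' post reports 63 ("?") for the Convert.ToChar result.
        Console.WriteLine("Asc(Convert.ToChar(b)) = {0}", Asc(viaConvert))
        Console.WriteLine("Asc(Chr(b))            = {0}", Asc(viaChr))
    End Sub
End Module
```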

Of course, yet another fix is to just stop using ANSI and switch over to
Unicode. :) Then if you want to specify that character by value, use
&H2013 (that's hexadecimal; 8211 in decimal) instead of 150 and
everything works fine.

Pete
 
Peter Duniho said:
[...] I made the error of assuming that the call to Convert.ToChar()
was doing something that the OP expected, but now I see that it expects
Unicode values and so passing what's an ANSI value does actually allow
Asc() to return the right thing.

I wrote that, and I'm not even sure what it means. Obviously, passing an
ANSI value to a method that expects a Unicode value isn't going to do the
right thing. Not then, not later.

If you pass the correct Unicode value to Convert.ToChar(), you can later
get the expected ANSI value from Asc(). The Unicode will be converted to
ANSI for you. But only if you start with the correct, corresponding
Unicode character in the first place.
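
In other words, something along these lines should behave as described. This is only a sketch: the module name UnicodeFirst is made up, and the value Asc returns depends on the default ANSI code page (150 would be expected on a Windows-1252 system, which is an assumption here).

```vb
Imports System
Imports Microsoft.VisualBasic

Module UnicodeFirst
    Sub Main()
        ' U+2013 (en dash) is 8211 in decimal; &H2013 is the VB hex literal.
        Dim dash As Char = Convert.ToChar(&H2013)

        ' ChrW is the VB function that takes a Unicode value directly,
        ' so it yields the same character.
        Console.WriteLine(dash = ChrW(&H2013))

        ' Asc converts the character to the default ANSI code page.
        ' Assumption: on Windows-1252 this is 150.
        Console.WriteLine("Asc = {0}", Asc(dash))
    End Sub
End Module
```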

But this is a VB-specific thing. I agree that if you're writing .NET
code, and doing character encoding conversions, that using the actual
Encoding class is the appropriate solution. It makes it much more clear
what's going on, and isn't tied to a specific character encoding (the code
can be easily changed or generalized to use a different encoding than
ANSI).

Of course, it's not clear from the OP's post that he wanted or expected
_any_ conversion. In which case, it's really just a matter of avoiding
the .NET methods in the first place, and sticking to the VB functions that
deal only in ANSI.

Anyway, sorry for writing that confusing sentence. I wish I knew what
thought I was trying to express at the time. :)

Pete
 
If I try to put non-ANSI data into a string then I am screwed.
Before I got my current contract, someone started to put EBCDIC data into
strings.
Now it will take too much effort to clean it up.
--
Arne Garvander
Certified Geek
Professional Data Dude


 