Hoop
Hi,
I have a spreadsheet that contains multiple languages. Per project
requirements I have saved this spreadsheet in a tab-delimited format
using Excel 2007. I have been able to get pretty much all of the
characters correct when reading the file. I will also have to parse
Japanese from this, which I have not been able to do. For some reason I
cannot read the 3-byte characters from the file correctly. Here is
an example showing 2 characters for simplicity.
ë ’
I get the correct value for ë, but not for ’. The value of ’ should be
0x2019; instead I am getting something that just displays as a box in the
debugger, some value around 65k.
Note that in the code snippet below, if I pass the unicodeString to the
foreach (Byte b in encodedBytes)
loop, it works perfectly and outputs the correct values for the characters.
I have a tab-delimited file that contains the same characters. When I open
that file and read from it, the ë is correct, but the ’ is not.
Likewise for the Japanese characters I am trying to read. It almost seems
like the issue is the file open() or the ReadLine().
Any help and code examples would be appreciated.
Thanks
Jeff
using System;
using System.IO;
using System.Text;

// fileName holds the path to the tab-delimited file (set elsewhere).
String unicodeString = " ’ ë ’ ";
// Create a UTF-8 encoding.
UTF8Encoding utf8 = new UTF8Encoding();
// Determine whether fileName is an existing file.
if (File.Exists(fileName))
{
    // Open the file to read from. File.OpenText decodes the file as UTF-8.
    using (StreamReader sr = File.OpenText(fileName))
    {
        String line;
        while ((line = sr.ReadLine()) != null)
        {
            // Encode the test string; substituting line here encodes
            // the text that was read from the file instead.
            Byte[] encodedBytes = utf8.GetBytes(unicodeString);
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes)
            {
                // Skip tab characters (byte value 9).
                if (b != 9)
                {
                    Console.Write("[{0}]", b);
                }
            }
        }
    }
} // end if
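
One thing that may be worth checking is whether the file is really UTF-8, since File.OpenText always decodes as UTF-8. The following is only a sketch under that assumption, not a confirmed diagnosis: it reads the file with an explicitly chosen encoding and prints the code units that come back, so the result can be compared against the expected 00EB and 2019. The class name EncodingCheck and its two methods are hypothetical helpers that are not part of the snippet above, and which encoding Excel actually wrote is something to verify against the file itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the original snippet.
public static class EncodingCheck
{
    // Reads every line of the file, decoding its bytes with the given encoding.
    public static List<string> ReadLines(string fileName, Encoding encoding)
    {
        var lines = new List<string>();
        // The second argument tells StreamReader how to decode the bytes.
        // A byte order mark at the start of the file, if present, is still honored.
        using (var sr = new StreamReader(fileName, encoding))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Formats each UTF-16 code unit of the text as four hex digits, skipping tabs.
    public static string CodeUnits(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c != '\t')
            {
                sb.AppendFormat("[{0:X4}]", (int)c);
            }
        }
        return sb.ToString();
    }
}
```

A possible way to use it, assuming fileName is the same path as above: call EncodingCheck.ReadLines(fileName, Encoding.Unicode) if the file was saved as UTF-16, or pass Encoding.GetEncoding(1252) if it was saved in the Windows-1252 code page, then print EncodingCheck.CodeUnits(line) for each line. If the output shows FFFD (65533 decimal), that is the Unicode replacement character, which a decoder emits for bytes that are not valid in the encoding it was told to use; that would be consistent with a value "around 65k" showing up as a box.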