Problem reading Unicode from a tab-delimited file

Hoop · Jan 18, 2009

Hi,
I have a spreadsheet that contains multiple languages. Per project
requirements I have saved this spreadsheet in a tab-delimited format
using Excel 2007. I have been able to get pretty much all the
characters correct when reading the file. I will also have to parse
Japanese from this, which I have not been able to do. For some reason
I cannot read the 3-byte characters from the file correctly. I have
an example here showing 2 characters for simplicity.

ë ’

I get the correct value for ë, but not for ’. The value of ’ should be
0x2019; instead I am getting something that just displays as a box in
the debugger, some value around 65k.
Note that in the code snippet below, if I pass unicodeString to the
foreach (Byte b in encodedBytes)
loop, it works perfectly and outputs the correct values for the
characters. I have a tab-delimited file that contains the same
characters. When I open that file and read from it, the ë is correct,
but the ’ is not. Likewise for the Japanese characters I am trying to
read. It almost seems like the issue is the file open() or the
readline().
Any help and code examples would be appreciated.
Thanks
Jeff

using System;
using System.IO;
using System.Text;

// Assumes: string fileName holds the path to the tab-delimited file.
String unicodeString = " ’ ë ’ ";
// Create a UTF-8 encoding.
UTF8Encoding utf8 = new UTF8Encoding();
// Determine whether fileName is a file.
if (File.Exists(fileName))
{
    // Open the file to read from.
    using (StreamReader sr = File.OpenText(fileName))
    {
        // Each line read is discarded; the literal above is encoded instead.
        while (sr.ReadLine() != null)
        {
            // Encode the string.
            Byte[] encodedBytes = utf8.GetBytes(unicodeString);
            Console.WriteLine();
            Console.WriteLine("Encoded bytes:");
            foreach (Byte b in encodedBytes)
            {
                // Skip tab characters (byte value 9).
                if (b != 9)
                {
                    Console.Write("[{0}]", b);
                }
            }
        }
    }
} // end if

Martin Honnen · Jan 18, 2009

Hoop said:
I have a spreadsheet that contains multiple languages. Per project
requirements I have saved this spreadsheet in a tab-delimited format
using Excel 2007. I have been able to get pretty much all the
characters correct when reading the file. I will also have to parse
Japanese from this, which I have not been able to do.

Which encoding do you use to save the file with Excel?
If that is UTF-16, then File.OpenText does not work, as it uses
UTF-8. So you will need to use e.g.
using (StreamReader sr = new StreamReader("file.csv", Encoding.Unicode))
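To illustrate why the decoder has to match the way the file was saved, here is a small sketch that is not from the thread. It decodes hand-written bytes for ë, tab and ’ in two ways. The byte values are assumptions: the first array is what a UTF-16 little-endian file would contain, the second is what a single-byte Windows-1252 ("ANSI") export would contain. The class and method names (DecodeDemo, CodeUnits, DecodeUtf16, DecodeUtf8) are made up for the example.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    public static string CodeUnits(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Decode the bytes as UTF-16 little-endian (Encoding.Unicode).
    public static string DecodeUtf16(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    // Decode the bytes as UTF-8, which is what File.OpenText assumes.
    public static string DecodeUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        // Assumed file content "ë<tab>’" encoded as UTF-16 little-endian.
        byte[] utf16Bytes = { 0xEB, 0x00, 0x09, 0x00, 0x19, 0x20 };
        // Assumed file content "ë<tab>’" encoded as Windows-1252.
        byte[] ansiBytes = { 0xEB, 0x09, 0x92 };

        Console.WriteLine(CodeUnits(DecodeUtf16(utf16Bytes)));
        Console.WriteLine(CodeUnits(DecodeUtf8(ansiBytes)));
    }
}
```

The first line should show U+00EB U+0009 U+2019. The second should show U+FFFD in place of both characters, because neither byte forms a valid UTF-8 sequence. U+FFFD is 65533 in decimal, so it may well be the "value around 65k" that shows up as a box in the debugger.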

Hoop · Jan 18, 2009

Hi Martin,
I am not sure what encoding the file is saved in. In Excel I chose
Text (Tab delimited).
I will try your suggestion, though.
Thanks
Jeff

Hoop · Jan 18, 2009

Hi Martin,
I added Encoding.Unicode and it did not make any difference; it is
still not reading in ’ ë ’ correctly.
As a test, I made a file with Notepad and saved it with
Encoding: Unicode. I then opened that file with the code below and it
worked.
So I am thinking that either Excel 2007 does not save a tab-delimited
file as Unicode, or I am saving it wrong.
Jeff

using System;
using System.IO;
using System.Text;

public Language(string fileName)
{
    // Create a UTF-8 encoding (used only to dump the bytes of each line).
    UTF8Encoding utf8 = new UTF8Encoding();
    // Determine whether fileName is a file.
    if (File.Exists(fileName))
    {
        // Open the file to read from, decoding it as UTF-16.
        using (StreamReader stream = new StreamReader(fileName, Encoding.Unicode))
        {
            String unicodeString;
            // Read each line exactly once until the end of the file.
            while ((unicodeString = stream.ReadLine()) != null)
            {
                // Encode the string.
                Byte[] encodedBytes = utf8.GetBytes(unicodeString);
                Console.WriteLine();
                Console.WriteLine("Encoded bytes:");
                foreach (Byte b in encodedBytes)
                {
                    // Skip tab characters (byte value 9).
                    if (b != 9)
                    {
                        Console.Write("[{0}]", b);
                    }
                }
            }
        }
    } // end if file exists
} // end constructor
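One way to tell whether a file really was saved as Unicode is to look at its first bytes. A file saved as "Unicode" from Notepad normally starts with the byte order mark FF FE, while a plain tab-delimited export would normally have none. The sketch below is an illustration only. BomSniffer, DescribeBom and DescribeFile are hypothetical names, and it checks only the three most common marks.

```csharp
using System;
using System.IO;

static class BomSniffer
{
    // Hypothetical helper: names the encoding suggested by a leading
    // byte order mark. Only UTF-8 and UTF-16 marks are checked, so a
    // UTF-32 little-endian file would be reported as UTF-16 here.
    public static string DescribeBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8 with BOM";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 little-endian (Encoding.Unicode)";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 big-endian (Encoding.BigEndianUnicode)";
        return "no BOM (possibly an ANSI code page or BOM-less UTF-8)";
    }

    // Reads up to three bytes from the start of the file and describes them.
    public static string DescribeFile(string path)
    {
        byte[] head = new byte[3];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);
        return DescribeBom(head);
    }

    static void Main(string[] args)
    {
        if (args.Length == 1)
            Console.WriteLine(DescribeFile(args[0]));
        else
            Console.WriteLine("usage: BomSniffer <file>");
    }
}
```

Running it against both the Notepad file and the Excel export should show whether the two were actually written in the same encoding. A missing mark does not prove the encoding, so treat "no BOM" as a hint rather than an answer.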

Hoop · Jan 19, 2009

Hi Martin,
Got it figured out. In Excel you can save the file as Unicode, and
that format is also tab-delimited. So adding Encoding.Unicode and
changing the way the file is saved seems to have solved it. Tomorrow I
will see how the Japanese looks.
Thanks
Jeff
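For reference, a minimal sketch of the approach that ended up working: read a UTF-16 tab-delimited file with Encoding.Unicode and split each line on tabs. It is not code from the thread. TabFileDemo and ReadRows are hypothetical names, the sample row with three Japanese characters is made up, and the temporary file only stands in for a real export from Excel.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TabFileDemo
{
    // Reads a UTF-16 tab-delimited file into rows of fields.
    // Assumption: no field contains a quoted tab or line break.
    public static List<string[]> ReadRows(string path)
    {
        var rows = new List<string[]>();
        using (var reader = new StreamReader(path, Encoding.Unicode))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split('\t'));
            }
        }
        return rows;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Sample row: ë, ’ and three Japanese characters, written as UTF-16.
            File.WriteAllText(path, "\u00EB\t\u2019\t\u65E5\u672C\u8A9E\r\n", Encoding.Unicode);

            foreach (string[] fields in ReadRows(path))
            {
                foreach (string field in fields)
                {
                    // Print code points rather than glyphs, since the console
                    // font may not be able to display them.
                    foreach (char c in field)
                    {
                        Console.Write("U+{0:X4} ", (int)c);
                    }
                    Console.WriteLine();
                }
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Splitting on '\t' is the simplest thing that could work. If a cell can itself contain tabs or line breaks, the export will need a real delimited-text parser instead.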
