Byte Array to String

  • Thread starter: AG

AG

I have a file that contains ASCII and Extended ASCII characters.
I need to get the file contents into a string, but the Extended ASCII
characters (dec 128 and 129) are being changed to dec 63.

I have tried several methods, but here is the one I thought would have
worked.

Dim strReturn As String
Dim arBytes() As Byte
arBytes = System.IO.File.ReadAllBytes(<myfile>)
strReturn = System.Text.Encoding.UTF8.GetString(arBytes)

When I examine strReturn, I find that the chars that should be chr(128) and
chr(129) are all chr(63).

The only thing I could get to work is

Dim strReturn As String = String.Empty
Dim arBytes() As Byte
Dim sB As New StringBuilder
Dim byT As Byte

arBytes = System.IO.File.ReadAllBytes(strPathFile)
For Each byT In arBytes
sB.Append(Chr(byT))
Next
strReturn = sB.ToString

Can anyone offer an explanation, and/or a better method?
 
I use this to read file contents:

' Requires "Imports System.IO" at the top of the file.
Public Function GetFileContents(ByVal FullPath As String, _
        Optional ByRef ErrInfo As String = "") As String
    Try
        ' With no encoding argument, StreamReader decodes the file as UTF-8.
        ' Using closes the reader even if ReadToEnd throws.
        Using objReader As New StreamReader(FullPath)
            Return objReader.ReadToEnd()
        End Using
    Catch Ex As Exception
        ErrInfo = Ex.Message
        Return Nothing
    End Try
End Function
 
Look at the Encoding object, as it is the quickest. The other method is some
form of loop, as suggested by Nick.

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
Hi AG,

If the file contains characters beyond the ASCII char code range (and
those chars are stored correctly), that means the file's content is not
stored in ASCII encoding (a single-byte charset).

Generally speaking, if you're reading a text file (which means its content
is character text rather than unreadable binary content), you should use
text reading mode to read it (rather than reading it as bytes and converting
them yourself).

To read a file in text mode, you need to know the encoding/charset of the
text file's content. This info is needed when you try reading the file in
text mode. For example, you can use the "StreamReader" class in .NET to
read a file in text mode as below:

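=================
// Requires: using System.IO; using System.Text;
// The using block closes the reader even if ReadToEnd throws.
using (StreamReader sr = new StreamReader("inputfile.txt", Encoding.UTF8))
{
    string content = sr.ReadToEnd();
}
=================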

Or you can let the StreamReader determine the encoding automatically
(through the file's BOM). But a BOM (byte order mark) is not always
present in a text file; when it is missing, this constructor falls back to
UTF-8:

=================
// true = detect the encoding from the byte order mark, if there is one.
using (StreamReader sr1 = new StreamReader("inputfile.txt", true))
{
    string content1 = sr1.ReadToEnd();
}
=================

For your case, I think the file's encoding is likely not UTF-8, and if you
use UTF-8 to decode the bytes, you'll probably get the wrong characters.
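
To illustrate the point above, here is a minimal VB.NET sketch of what happens when bytes 128 and 129 are decoded as UTF-8. The sample byte array and the module name are made up for illustration. Neither byte is valid on its own in UTF-8, so the decoder substitutes a replacement character for each. One possible explanation for the reported value 63 (an assumption, not confirmed in the thread) is that the string was then inspected with a function that converts back to the ANSI code page, which cannot represent the replacement character and yields "?" instead.

```vbnet
Imports System
Imports System.Text

Module Utf8Demo
    Sub Main()
        ' Hypothetical sample: "A", separator byte 128, "B", separator byte 129.
        Dim bytes() As Byte = {65, 128, 66, 129}

        ' Bytes 128 and 129 are continuation bytes in UTF-8 and are invalid
        ' when they appear without a lead byte.
        Dim decoded As String = Encoding.UTF8.GetString(bytes)

        ' Print the code point of every decoded character. The two invalid
        ' bytes are expected to show up as the replacement character U+FFFD.
        For Each c As Char In decoded
            Console.WriteLine("U+{0:X4}", Convert.ToInt32(c))
        Next
    End Sub
End Module
```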

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
 
Thanks Nick.
That is one of the methods that I tried which does not produce the desired
results.

--

AG
Email: discussATadhdataDOTcom
 
Thanks Gregory.
I have looked at it and tried several different methods. I guess I don't
understand it well enough.
 
Thanks for the reply Steven.

I ended up reading as byte and converting myself because text reading mode
(streamreader) produced the wrong characters for the extended ASCII
characters.

Perhaps a bit more of an explanation.

The file is created by an Access application using VBA, as a method of
exporting some database data.
Since the data may contain all the usual record and field separators like
crlf, commas, tabs, quotes, etc., the extended ASCII chars are used as
record and field separators.

It is created using the Open for append method and data added via the Print
method, as follows. This method can not be changed, as it is in use in too
many locations.

Dim strRecord As String
strRecord = "field1data" & Chr(128) & "field2data" & Chr(128) & "field3data" _
    & Chr(129)
Open <thefile> For Append As #1
Print #1, strRecord
Close #1

As you can see, there is no BOM.

The file is easily opened and read in VBA using Open For Binary:

Dim strFileData As String
Open <thefile> For Binary As #1
strFileData = Space(FileLen(<thefile>))
Get #1, , strFileData
Close #1

This all works fine in VBA. Now, I would like to read the file using .NET
framework.
While my method of using Chr() on each byte works, it would seem that there
should be a similar simple method in .NET to get the file contents without
looping through each byte.
According to the help file, Chr uses the Encoding class to return the
appropriate character, so isn't there a method in the Encoding class that
would perform the operation on the entire stream?
 
You probably need to figure out what the code page of the ASCII file is.

It is probably 437 or 850 (if you are American or Western European), but
others are also possible.
Most likely it depends on where you are located.

GetEncoding("cp437") or something like that should give you the encoding.
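
One way to narrow down the code page is to decode the two separator bytes with each candidate and compare the results with what the writing application intended. The sketch below does that in VB.NET; the candidate list (437, 850, 1252) and the module name are assumptions chosen for illustration. In code pages 437 and 850, bytes 128 and 129 are accented letters, while in code page 1252 byte 128 is the euro sign.

```vbnet
Imports System
Imports System.Text

Module CodePageProbe
    Sub Main()
        ' On .NET Core / .NET 5+ these code pages are only available after
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        ' On .NET Framework they are built in.
        Dim sample() As Byte = {128, 129}

        For Each cp As Integer In New Integer() {437, 850, 1252}
            Dim s As String = Encoding.GetEncoding(cp).GetString(sample)

            ' Show which Unicode code points each byte maps to in this code page.
            Console.WriteLine("{0}: U+{1:X4} U+{2:X4}", cp, _
                Convert.ToInt32(s(0)), Convert.ToInt32(s(1)))
        Next
    End Sub
End Module
```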

Henk
 
Thanks for your reply,

Yes, for a text file, if we don't use the correct encoding/charset, the
retrieved text will not match the original characters.

For your scenario, I think VBA may use the default system locale to
encode the characters. You can also try
"Encoding.Default" as the parameter in the StreamReader's constructor.
"Encoding.Default" means the current system ANSI code page. If this still
does not work, I think VBA is producing the file in something like a binary
format (it doesn't use a consistent encoding for the entire file), and thus
using binary read mode and decoding the bytes individually would be
reasonable.

Anyway, if you have any further questions on this, you are welcome to post
here.

Sincerely,

Steven Cheng

 
Thanks Steven.
Encoding.Default, which is 1252, does work, as I reported in my response to
Henk's post.
I knew there had to be a simple solution.
Thanks to all responders.
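
For reference, the original byte-array approach needs only a different encoding to work. The following is a minimal sketch, not the poster's final code; the function and module names are hypothetical. It names code page 1252 explicitly instead of using Encoding.Default, on the assumption that the file was written on a Western European or US Windows system. Encoding.Default is the system ANSI code page on .NET Framework, but it is UTF-8 on .NET Core and .NET 5+, so an explicit code page is the more portable choice.

```vbnet
Imports System.IO
Imports System.Text

Module BytesToString
    ' Reads the whole file and decodes every byte with Windows code page 1252.
    Function ReadAsAnsi(ByVal path As String) As String
        Dim arBytes() As Byte = File.ReadAllBytes(path)

        ' Assumption: the file was written with code page 1252.
        ' On .NET Core / .NET 5+, call
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        ' once before requesting code page 1252.
        Dim ansi As Encoding = Encoding.GetEncoding(1252)

        Return ansi.GetString(arBytes)
    End Function
End Module
```

This replaces the per-byte Chr() loop with a single GetString call over the whole array.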
 
1252 is the default Windows code page for Western locales, most often called ANSI.
For American and Western European Windows installations you can probably also use Encoding.Default.

Detecting the code page a text is in is always a problem. Once loaded into memory, a .NET string is always Unicode (UTF-16).
But on disk or in streams the text is always encoded in some charset.

Under DOS this used to be code page 437 (US) or 850 (Western Europe), but many others are also possible.
Quite frankly, a pain in the neck, because you would never know which code page a text was in unless you knew where it came from.

Nowadays, the most frequently used encodings are ANSI, UTF-8 and UTF-16.

It is quite common for a file encoded in UTF-8 or UTF-16 to start with a few marker bytes: 3 bytes (EF BB BF) for UTF-8, 2 bytes (FF FE or FE FF) for UTF-16.
These bytes are called the byte order mark, or BOM for short.

.NET is capable of detecting this BOM, but when there is no BOM you need to tell .NET which encoding the file is in.
In your situation it appears to be ANSI = CP1252 = Encoding.Default.

I would go for a StreamReader:

Public Sub New ( _
    stream As Stream, _
    encoding As Encoding, _
    detectEncodingFromByteOrderMarks As Boolean _
)

where detectEncodingFromByteOrderMarks is set to True; for files without a BOM you would specify Encoding.Default (= CP1252 here) as the fallback encoding.
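
A call to that constructor might look like the sketch below. The function name, module name, and the choice to pass the fallback encoding as a parameter are illustrative assumptions, not part of the original post.

```vbnet
Imports System.IO
Imports System.Text

Module ReadWithFallback
    ' Reads a text file, honouring a BOM if one is present and otherwise
    ' decoding with the supplied fallback encoding.
    Function ReadText(ByVal path As String, ByVal fallback As Encoding) As String
        Using stream As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' True = look for a byte order mark first; if none is found,
            ' the reader uses the fallback encoding.
            Using reader As New StreamReader(stream, fallback, True)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

For the file discussed in this thread, the call would be something like ReadText(strPathFile, Encoding.Default) on .NET Framework, or with Encoding.GetEncoding(1252) to pin the code page explicitly.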
 
Thanks Henk,

You are correct, I am now using a stream and Encoding.Default does work. I had not previously tried Default because the documentation on encoding, while not stating it outright, led me to believe that UTF-8 was the default. I agree, a pain...

--

AG
 
Yes, for your machine, since it is configured for a Western European
region, the default encoding is usually Windows-1252. However,
Encoding.Default will also work on systems set to other regions. For
example, on a machine configured with an East Asian locale,
Encoding.Default will return the encoding/code page set on that machine
for non-Unicode conversion.

Anyway, glad that it is working for you now :)

Sincerely,

Steven Cheng

 