Binary Read Method?

  • Thread starter: ShaneO
Stephany said:
And the last 'record' in the file obeys the fixed-length rule, i.e., the
last byte in the file is &H0?

Yes, and the file sizes are evenly divisible by the number of Records.

And the next question is, what does the varying amount of whitespace consist
of? Is it a sequence of spaces (&H20)? If so, does a sequence of 2 spaces
occur anywhere else other than within the 'whitespace' areas?

Yes, whitespaces are &H20. No, sequential &H20s only occur in the Text
areas as whitespace, but I can't rule out &H20 appearing in consecutive
bytes within a numeric field.

If it's of any consequence, "Deleted" records are filled with &HFF, but
still delimited with &H0.
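
A "Deleted" record as described above could be detected on the raw bytes, before any text decoding. The following is only a sketch: the module name, the IsDeleted helper and the sample arrays are hypothetical, and it assumes that "filled with &HFF" means every byte other than the &H0 delimiters is &HFF.

```vbnet
Imports System

Module DeletedRecordSketch
    ' Returns True when every byte other than the &H0 delimiters is &HFF.
    ' Assumption: a "Deleted" record has all of its field bytes overwritten.
    Function IsDeleted(record As Byte()) As Boolean
        Dim sawFiller As Boolean = False
        For Each b As Byte In record
            If b = 0 Then Continue For          ' skip the NUL delimiters
            If b <> &HFF Then Return False      ' any other byte means live data
            sawFiller = True
        Next
        ' A record made only of delimiters is not treated as deleted.
        Return sawFiller
    End Function

    Sub Main()
        ' Hypothetical 7-byte records, for illustration only.
        Dim live As Byte() = {65, 65, 32, 0, 66, 32, 0}
        Dim deleted As Byte() = {&HFF, &HFF, &HFF, 0, &HFF, &HFF, 0}
        Console.WriteLine(IsDeleted(live))
        Console.WriteLine(IsDeleted(deleted))
    End Sub
End Module
```

Testing the bytes directly means the result does not depend on how a particular Encoding happens to translate &HFF.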

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
 
Stephany said:
Dim _bytes As Byte() = File.ReadAllBytes(_filename)

You will also need a 'pointer' that always indicates the next byte to be
dealt with:

Dim _pointer As Integer = 0

Your processing loop now becomes:

While _pointer < _bytes.Length
End While

I've just tested this method on the "smaller" files (up to 20 MB) and
it's working even better than the StreamReader method!

It now takes only a third of the time that method took. Keep this up
and the application will be so fast it will extract the data even before
it's been launched!!!!!!

ShaneO
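
The While loop quoted above has an empty body. One way it might be filled in is sketched below, assuming each field ends at the next &H0 and is padded with &H20. The module name, the ReadFields helper and the sample data are hypothetical; the real code would pass in the array returned by File.ReadAllBytes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module PointerLoopSketch
    ' Splits a byte buffer into NUL-delimited fields using a moving pointer.
    Function ReadFields(bytes As Byte()) As List(Of String)
        Dim fields As New List(Of String)
        Dim pointer As Integer = 0
        While pointer < bytes.Length
            ' Find the next &H0 at or after the pointer.
            Dim nul As Integer = Array.IndexOf(Of Byte)(bytes, CByte(0), pointer)
            ' No delimiter left: treat the rest of the buffer as the last field.
            If nul < 0 Then nul = bytes.Length
            ' Decode the field and trim its &H20 padding.
            fields.Add(Encoding.ASCII.GetString(bytes, pointer, nul - pointer).Trim())
            ' Step past the delimiter so the pointer indicates the next unread byte.
            pointer = nul + 1
        End While
        Return fields
    End Function

    Sub Main()
        ' Hypothetical sample standing in for File.ReadAllBytes(_filename).
        Dim sample As Byte() = Encoding.ASCII.GetBytes(
            "AAAA    " & Char.MinValue & "Smith and Co  " & Char.MinValue)
        For Each field As String In ReadFields(sample)
            Console.WriteLine("[" & field & "]")
        Next
    End Sub
End Module
```

Because the whole file is already in memory, each field is decoded straight from the array and no intermediate per-record copy is made.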

 
Whoa up!!!!!!!

We're talking about your textual file here, not your binary file.

How can &H20s appear in consecutive bytes within a numeric field? Seeing as
all your fields are strings, that's a contradiction in terms.

I assume that your "Deleted" records comment also applies to your binary
file and not your textual file.

By your definition thus far, a 'record' in your file looks like:

AAAAAAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
.... {0}

where {0} represents the NUL delimiter.

A record may also look like:

AAAA {0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
.... {0}

but will never look like:

AA AAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
.... {0}

The way I would be inclined to approach this is:

' Requires: Imports System.IO and Imports System.Text
Dim _sb As New StringBuilder

Dim _br As New BinaryReader(File.Open(infileName, FileMode.Open))

While _br.PeekChar() <> -1
    ' Read one fixed-length record, drop the trailing NUL and split on NUL
    Dim _fields As String() =
        Encoding.ASCII.GetString(_br.ReadBytes(_recsize)).TrimEnd(Char.MinValue).Split(Char.MinValue)
    For _i As Integer = 0 To _fields.Length - 1
        _fields(_i) = _fields(_i).Trim
        ' Do whatever else needs doing with the 'field'
    Next
    _sb.AppendLine(String.Join(",", _fields))
End While

_br.Close()

File.WriteAllText(outfilename, _sb.ToString)

_br.PeekChar() will return -1 when there are no more characters in the
stream.
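
One caveat with PeekChar is that it has to decode a character using the reader's encoding (UTF-8 by default), and the .NET documentation lists an ArgumentException for bytes that cannot be decoded. Since these files can contain &HFF filler, a possible alternative is to compare the stream position with its length. This is a sketch only: the module name, the 4-byte record size and the in-memory data are hypothetical stand-ins for the real file.

```vbnet
Imports System
Imports System.IO

Module StreamPositionSketch
    Sub Main()
        ' Hypothetical record size and in-memory data, for illustration only.
        Dim recSize As Integer = 4
        Dim data As Byte() = {65, 66, 67, 0, &HFF, &HFF, &HFF, 0}

        Using br As New BinaryReader(New MemoryStream(data))
            ' Compare positions instead of peeking, so no byte is decoded
            ' just to detect the end of the stream.
            While br.BaseStream.Position < br.BaseStream.Length
                Dim record As Byte() = br.ReadBytes(recSize)
                ' Print the raw bytes of each record as hexadecimal.
                Console.WriteLine(BitConverter.ToString(record))
            End While
        End Using
    End Sub
End Module
```

The Using block also closes the reader if an exception is thrown part-way through the file.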

Encoding.ASCII.GetString(_br.ReadBytes(_recsize)) reads the specified number
of bytes from the stream (advancing the stream pointer) and converts
them to a string.

TrimEnd(Char.MinValue) strips the final &H0 from the string.

Split(Char.MinValue) returns an array of strings from the string using &H0
as the split delimiter.

The inner loop is used to trim any whitespace from all the strings in the
array and makes each 'field' available for further processing. If the value
of the 'field' needs to be modified, it can be done here.

String.Join(",", _fields) creates a comma-delimited string comprising all
the array elements.

The AppendLine method of the StringBuilder appends the supplied string and
automatically adds a NewLine.

Now all your variables (sA, sB, etc.) are not required, nor do you have to
worry about where a given 'field' starts or how long it is. Each 'field' is
determined by its (0-based) position in the array, i.e., sA equates to
position 0, sB equates to position 1, etc.
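
The steps above can be traced on a single record. The sketch below uses a hypothetical 24-byte record (an 8-byte code field and a 14-byte name field, each followed by &H0); the module name, field widths and contents are assumptions made only for illustration.

```vbnet
Imports System
Imports System.Text

Module RecordSplitSketch
    Sub Main()
        Dim nul As Char = Char.MinValue
        ' Hypothetical record: padded code, NUL, padded name, NUL.
        Dim raw As Byte() = Encoding.ASCII.GetBytes(
            "AAAA    " & nul & "Smith and Co  " & nul)

        ' Decode, strip the trailing NUL, then split on the remaining NUL.
        Dim fields As String() =
            Encoding.ASCII.GetString(raw).TrimEnd(nul).Split(nul)

        ' Trim the &H20 padding from each field.
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Trim()
        Next

        ' Prints: AAAA,Smith and Co
        Console.WriteLine(String.Join(",", fields))
    End Sub
End Module
```

Note that a field which itself contains a comma would need quoting before it is joined, otherwise the output line gains an extra column.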
 
Stephany said:
Whoa up!!!!!!!

We're talking about your textual file here, not your binary file.

Sorry, I forgot!

I assume that your "Deleted" records comment also applies to your binary
file and not your textual file.

No. Both the Binary & Textual files have &HFF as identifiers of a
Deleted Record. (I had to modify the .ASCII to be .UTF8 because of this.)

By your definition thus far, a 'record' in your file looks like:

AAAAAAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
... {0}

where {0} represents the NUL delimiter.

A record may also look like:

AAAA {0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
... {0}

but will never look like:

AA AAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
... {0}

All the above is perfectly correct! But it can look like:

AAAA {0}BBBBB BBBBBBBBBBBBBB
{0}... {0}

Which is why I read to the End of the Field, not just to the first &H20.
(The above example demonstrates that I'm reading Customer/Supplier Codes
followed by Customer/Supplier Names.)

The way I would be inclined to approach this is:

OK, it's going to take a bit of time to test what you've provided, but I
will certainly see how it goes. In another post, I've already mentioned
the amazing results of simply using the ReadAllBytes approach, at least
for the smaller Text files, so the options here may be somewhat moot,
but it will no doubt stand me in good stead for the Binary files!

ShaneO

 