BinaryReader.ReadBytes issue

Guest

Hi,

I am trying to optimize the reading of a huge binary file into a byte[]...

I am doing the following:



byte[] ba = new byte[br.BaseStream.Length];

ba = br.ReadBytes((int)br.BaseStream.Length);



The problem is that BinaryReader.ReadBytes(...) only takes an int, whereas
BinaryReader.BaseStream.Length is a long. Why isn't there a ReadBytes that
takes a long?

Chances are I won't reach this limit, but the problem will be there
nonetheless.
 
I am trying to optimize the reading of a huge binary file into a byte[]...

I am doing the following..

byte[] ba = new byte[br.BaseStream.Length];

ba = br.ReadBytes((int)br.BaseStream.Length);

Why are you allocating an array and then immediately turning it into
garbage?
The problem is., BinaryReader.ReadBytes(...) only takes an int wherase
BinaryReader.BaseStream.Length is a long. Why isnt there a ReadBytes that
takes a long?

Chances are I wont reach this problem but the problem will be there none the
less.

It's certainly not ideal, but I would expect that if you actually had a
file larger than 2 GB, you wouldn't want to be reading it all in a
single call anyway.
 
Hi,
I am trying to optimize the reading of a huge binary file into a byte[]...

Reading huge files into a byte[] is not an optimisation.

byte[] ba = new byte[br.BaseStream.Length];

You don't need to initialize "ba" when you are using
br.ReadBytes(...)
ba = br.ReadBytes((int)br.BaseStream.Length);

It is bad practice to read a stream of unknown size in a
single call.
The problem is., BinaryReader.ReadBytes(...) only takes an int wherase
BinaryReader.BaseStream.Length is a long. Why isnt there a ReadBytes that
takes a long?

Because there is mostly no need to fill memory with huge files
(files greater than 2 GB).
Think over your design and your needs. If you really want to
read huge files, then you can easily run out of memory!

Marcin
 
Jon Skeet said:
I am trying to optimize the reading of a huge binary file into a byte[]...

I am doing the following..

byte[] ba = new byte[br.BaseStream.Length];

ba = br.ReadBytes((int)br.BaseStream.Length);

Why are you allocating an array and then immediately turning it into
garbage?

Why? Does ReadBytes reallocate it, or copy into it?
It's certainly not ideal, but I would expect that if you actually had a
file larger than 2Gb, you wouldn't want to be reading it all in in a
single call anyway.

That's the extreme; I won't be anywhere near that range.
 
I now do the following... to just be safe on the casting limit.

byte[] ba = new byte[br.BaseStream.Length]; // are you saying this should be null before and let .ReadBytes allocate it (if it does that?)

if (br.BaseStream.Length <= int.MaxValue)
{
    // we are within the casting limits so we can use the optimized method of reading
    ba = br.ReadBytes((int)br.BaseStream.Length);
}
else
{
    // we are outside the limits (rare) so we can use the normal way of reading (slower)
    ArrayList b = new ArrayList();
    byte readByte = 0x00;
    while (br.BaseStream.Position < br.BaseStream.Length)
    {
        readByte = br.ReadByte();
        b.Add(readByte);
    }

    ba = (byte[])b.ToArray(typeof(byte));
}




 
I need it as a byte[] internally, where it's being read.


Marcin Grzębski said:
Hi,
I am trying to optimize the reading of a huge binary file into a
byte[]...

Writing huge files into byte[] is not any optimisation.

byte[] ba = new byte[br.BaseStream.Length];

You don't need the initialization of "ba" while you are using
br.ReadBytes(...)
ba = br.ReadBytes((int)br.BaseStream.Length);

This is a bad practise to read unknown stream (unknown size) in
single line.
The problem is., BinaryReader.ReadBytes(...) only takes an int wherase
BinaryReader.BaseStream.Length is a long. Why isnt there a ReadBytes that
takes a long?

Because mostly there's no need to fill memory with huge files.
(files greater than 2 GB)
Think over your design and your needs. If you really want to
read huge files than you can easily get out of memmory!

Marcin


Strange that you say it's not an optimization; it sure runs faster. I guess
you know better than the runtime.
 
.. said:
I need it as a byte[] internally, its being read.

I see.
But can't you keep it as a collection of byte[] buffers?
e.g. as an *ArrayList* of byte[] elements with length = 4096

Then you can access those buffers as *ArrayList* items.
Of course, the buffer length can be set to another value.

Marcin
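
A minimal sketch of the buffer-collection idea above, under some assumptions: the class and method names are hypothetical, and it uses a generic List<byte[]> in place of the ArrayList mentioned in the post. The chunk size is a parameter, and the last chunk (or any short read) may be smaller than the requested size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedReader
{
    // Hypothetical helper: reads a stream into a list of byte[] chunks.
    // Each chunk holds exactly the bytes returned by one Read call.
    public static List<byte[]> ReadChunks(Stream input, int chunkSize)
    {
        List<byte[]> chunks = new List<byte[]>();
        byte[] buffer = new byte[chunkSize];
        int read;
        // Stream.Read returns 0 only when the end of the stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] chunk = new byte[read];
            Array.Copy(buffer, chunk, read);
            chunks.Add(chunk);
        }
        return chunks;
    }
}
```

The trade-off raised later in the thread still applies: code that needs one contiguous byte[] cannot consume this structure directly.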
 
Marcin Grzębski said:
Hi,
I am trying to optimize the reading of a huge binary file into a
byte[]...

Writing huge files into byte[] is not any optimisation.

byte[] ba = new byte[br.BaseStream.Length];

You don't need the initialization of "ba" while you are using
br.ReadBytes(...)

Fine, it's byte[] ba = null then.
This is a bad practise to read unknown stream (unknown size) in
single line.



Hello, EARTH. BaseStream.Length IS THE SIZE and is therefore KNOWN. What do you
propose then, genius boy? I need the entire file in memory in a byte[], so how
else would you do it, brainiac Mr. Mensa?


Because mostly there's no need to fill memory with huge files.
(files greater than 2 GB)
Think over your design and your needs. If you really want to
read huge files than you can easily get out of memmory!

All I need to do is get the file into memory for another part. That other
part I don't give a rat's arse about; it's not my problem.

I am talking about files averaging 300 KB.
 
byte[] ba = new byte[br.BaseStream.Length];

ba = br.ReadBytes((int)br.BaseStream.Length);

Why are you allocating an array and then immediately turning it into
garbage?

Why does ReadBytes reallocate it or copy into it?

Well look at the call - you're not telling it where to read to, it's
returning a reference to a new array.
Thats the extreme which I wont be anywhere near that range.

In which case, it's fine :)
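
To make the point about the return value concrete, here is a minimal sketch. The class and method names are hypothetical, and it assumes a seekable stream whose remaining length fits in an int. ReadBytes allocates and returns its own array, so nothing needs to be allocated beforehand.

```csharp
using System;
using System.IO;

static class ReadBytesDemo
{
    // Hypothetical helper: reads everything remaining in the reader's stream.
    // Assumes the stream is seekable and the remaining length fits in an int.
    public static byte[] ReadRemaining(BinaryReader br)
    {
        long remaining = br.BaseStream.Length - br.BaseStream.Position;
        // No "new byte[...]" beforehand: ReadBytes returns a freshly
        // allocated array, so any earlier array would just become garbage.
        // The checked cast throws OverflowException if remaining > int.MaxValue.
        byte[] ba = br.ReadBytes(checked((int)remaining));
        return ba;
    }
}
```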
 
I now do the following... to just be safe on the casting limit.

byte[] ba = new byte[br.BaseStream.Length]; // are you saying this
should be null before and let .ReadBytes allocate it (if it does that?)

I'm saying you don't need to assign a value to it at all, as you assign
the value when you've done the read.
if (br.BaseStream.Length <= int.MaxValue)
{
// we are within the casting limits so we can use the optimized
method of reading
ba = br.ReadBytes((int)br.BaseStream.Length);
}
else
{
// we are outside the limits (rare) so we can use the normal way
of reading (slower)
ArrayList b = new ArrayList();
byte readByte = 0x00;
while(br.BaseStream.Position < br.BaseStream.Length)
{
readByte = br.ReadByte();
b.Add(readByte);
}

ba = (byte[])b.ToArray(typeof(byte));
}

No, that second bit isn't a good idea. If you've got a file of over
2 GB, you most certainly *don't* want to create an ArrayList where each
element is a byte read from the file. Each byte would be boxed into its
own object, so it would take at least 12 times the file size, and you'd
end up with a memory usage of *at least* 24 GB. Not pretty.

Do you really want to create an array that is the size of the whole
file, if it's more than 2 GB? I would expect any sane use of such a file
to be either something which can discard the bytes as it reads and
processes them, or something which seeks around within the file. A
safer bet is probably to throw an exception.
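
One way the "throw an exception" option could look, as a rough sketch only. The class name, method name, exception type and message are all assumptions, and the reader is assumed to be positioned at the start of a seekable stream.

```csharp
using System;
using System.IO;

static class WholeFileReader
{
    // Hypothetical helper: reads the whole stream into one byte[],
    // or fails cleanly if the stream is too long for a single ReadBytes call.
    public static byte[] ReadAllOrThrow(BinaryReader br)
    {
        long length = br.BaseStream.Length;
        if (length > int.MaxValue)
        {
            // Fail loudly instead of falling back to a byte-by-byte ArrayList.
            throw new IOException(
                "Stream is too large to read into a single byte[]: " + length + " bytes.");
        }
        // The cast is safe here because of the check above.
        return br.ReadBytes((int)length);
    }
}
```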
 
It has to be a contiguous block. So unless you can put up a better solution
without yacking your gob off, you can go crap it right up.
 
Marcin Grzebski said:
I need it as a byte[] internally, its being read.

I see.
But can't you keep it as a collection of byte[] buffers?
e.g. as an *ArrayList* of byte[] elements with length = 4096

Then you can access those buffers as *ArrayList* items.
Of course buffer length can be set to other value.

That would quite possibly make the client code much harder to write.
It's not unreasonable to read the whole of a file as a byte array.
However, I wouldn't do it in the way suggested. I wouldn't use a
BinaryReader at all, in fact. I'd open up a normal stream, and read
blocks into a MemoryStream, then turn the MemoryStream into a byte
array. That way there isn't a problem if the file changes size between
you asking for the size and you reading a file - you just keep reading
blocks until you've finished.
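
A minimal sketch of the approach described above, assuming nothing beyond the standard Stream and MemoryStream classes. The class and method names are hypothetical and the 8192-byte block size is an arbitrary choice.

```csharp
using System;
using System.IO;

static class StreamUtil
{
    // Hypothetical helper: copies a stream block by block into a
    // MemoryStream and returns the accumulated bytes as one array.
    public static byte[] ReadFully(Stream input)
    {
        byte[] buffer = new byte[8192]; // arbitrary block size
        using (MemoryStream ms = new MemoryStream())
        {
            int read;
            // Read may return fewer bytes than requested; 0 means end of stream.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, read);
            }
            // ToArray copies the written bytes into a new, exactly sized array.
            return ms.ToArray();
        }
    }
}
```

It could be called with a stream from File.OpenRead on some path. The length is never queried up front, so a file that changes size between opening and reading is not a problem.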
 
Yeah, well, it's a pretty common mistake, I'm sure. *looks around and shuffles
it under the rug*


 
int.MaxValue isn't 2 GB. It's there just in case, that's all, and
99.9999999999999999999% of the time it won't be hit.


 
This is going into a serializer, so it needs to be a byte[], and the files
average 300 to 400 KB in size. It's actually packing the data for exporting
and importing across systems.



 
int.MaxValue isnt 2GB

Yes it is. To be precise, it's 2,147,483,647, as per the documentation.
its there just incase thats all and
99.9999999999999999999% wont be hit.

So why make it cause grief when you do hit it, instead of cleanly
throwing an exception to say that you can't really cope adequately?
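
For illustration, a small sketch of what the cast does once a length exceeds int.MaxValue (2,147,483,647). The class and method names are hypothetical. In an unchecked context the conversion silently keeps the low 32 bits, while a checked conversion throws.

```csharp
using System;

static class CastDemo
{
    // Unchecked long-to-int conversion: keeps only the low 32 bits,
    // so large values silently become wrong (often negative) numbers.
    public static int Truncate(long value)
    {
        return unchecked((int)value);
    }

    // Checked conversion: throws OverflowException when the value
    // does not fit in an int.
    public static int ConvertOrThrow(long value)
    {
        return checked((int)value);
    }
}
```

With these definitions, CastDemo.Truncate(3000000000L) yields -1294967296, whereas CastDemo.ConvertOrThrow(3000000000L) throws an OverflowException.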
 
The file contents wont change at this point that I know.

In that case, you're fine.
A stream is a stream memory stream or binaryreader stream, it will take the
same footprint.

MemoryStream.Read still takes an int for the count of the blocks to read
again we have this casting possibility (very rare though).

You wouldn't be reading from the MemoryStream though - you'd be calling
ToArray on it to get the bytes back.
 
The file contents won't change at this point, as far as I know.

A stream is a stream, whether a memory stream or a BinaryReader stream; it
will take the same footprint.

MemoryStream.Read still takes an int for the count of bytes to read, so
again we have this casting possibility (very rare though).



 
Well, the debugger won't show int.MaxValue in the watch window; I didn't
think it was that high.

Yeah, I could throw an exception.
 
Is 2 GB the largest file size on NTFS? Just curious why the read methods are
limited to int.MaxValue and not long.
 