ForwardOnlyMemoryStream?

Robert

In my current project, using memory streams gives about a 3x improvement
in throughput versus a standard FileStream. I am reading fixed-length records
from files.

This parallelizes nicely, since I do large atomic disk reads of the entire
file into a memory stream, letting the disk do sequential reads for each file.

With 2 worker threads, this works great: triple the throughput.
But with 4 worker threads, Out of Memory exceptions start popping up.
Currently I just do a try/catch/Sleep/retry on these...

This lets some of the other threads finish their bigger files, allowing
memory to be released.

This all works to some degree, just with degraded performance (idled cores)
when a series of big files is encountered.

So, to the question:
Has anyone seen an implementation of a ForwardOnlyMemoryStream?
This would be the same as a memory stream, except that there would
be a Trim method. Everything before the current stream Position would
be released from memory. I would call this after every 100KB of records
are read, maybe.

The benefit would be that as each thread loads a big file, it will release
memory slowly and consistently, instead of one big release when the file
is done. This would allow better scaling without having to idle some cores
waiting for RAM to become available.
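
A rough sketch of how such a class might be used (ForwardOnlyMemoryStream and Trim are hypothetical; nothing like Trim exists on System.IO.MemoryStream, and ProcessRecord, recordLength, and path are placeholders):

    ' Hypothetical usage - neither the class nor Trim() exists in the framework.
    Dim ms As New ForwardOnlyMemoryStream(path) ' loads the whole file up front
    Dim record(recordLength - 1) As Byte
    Dim lastTrim As Long = 0
    While ms.Read(record, 0, record.Length) = record.Length
        ProcessRecord(record)
        If ms.Position - lastTrim > 100 * 1024 Then
            ms.Trim()           ' release everything before the current Position
            lastTrim = ms.Position
        End If
    End While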

Sadly, 64 bit is not common yet..
 
Robert said:
Has anyone seen an implementation of a ForwardOnlyMemoryStream?
This would be the same as a memory stream, except that there would be a Trim
method. Everything before the current stream Position would be released from memory.

Hello

What you are asking sounds great to me, but I have never encountered another
read-only, forward-only memory stream besides the network stream, which does
exactly that.
This parallelizes nicely, since I do large atomic disk reads of the entire
file into a memory stream, letting the disk do sequential reads for each
file.

AFAIK this automatically means that your app will bomb when it comes above the
2 GB object limit.
Sadly, 64 bit is not common yet..

Yes, and exactly now, when memory is so cheap at the moment. I even noticed
that a lot of manufacturers ship systems with 4 GB of memory (DELL, ACER)
while providing a 32-bit operating system.

regards

Michel
 
Robert said:
Has anyone seen an implementation of a ForwardOnlyMemoryStream?
This would be the same as a memory stream, except that there would be a Trim
method. Everything before the current stream Position would be released from
memory. I would call this after every 100KB of records are read, maybe.

The problem with that would be that the memory stream can't shrink the
array it's using for storage, so it would have to create a new array and
copy the remaining data to it. This could cause even more memory
problems than you have now...
The benefit would be that as each thread loads a big file, it will release
memory slowly and consistently, instead of one big release when the file
is done. This would allow better scaling without having to idle some cores
waiting for RAM to become available.

I would suggest that you make your own stream reader class that uses a
large buffer, that way you reduce the frequency of the disk access while
keeping the memory usage at a controllable level.
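
A minimal sketch of that suggestion, assuming fixed-length records (the class name and the 4 MB buffer size are illustrative choices, not anything from the framework):

    Imports System.IO

    ' Sketch: read fixed-length records through one large reusable buffer,
    ' refilling from disk only when the buffer runs dry.
    Public Class BufferedRecordReader
        Implements IDisposable

        Private ReadOnly _file As FileStream
        Private ReadOnly _buffer(4 * 1024 * 1024 - 1) As Byte ' 4 MB, reused for the whole file
        Private _filled As Integer = 0 ' bytes currently in the buffer
        Private _pos As Integer = 0    ' read cursor within the buffer

        Public Sub New(ByVal path As String)
            _file = New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read)
        End Sub

        ' Fills record() with the next record; returns False at end of file.
        Public Function ReadRecord(ByVal record() As Byte) As Boolean
            If _filled - _pos < record.Length Then
                ' Slide the unread tail to the front, then refill from disk.
                Array.Copy(_buffer, _pos, _buffer, 0, _filled - _pos)
                _filled -= _pos
                _pos = 0
                _filled += _file.Read(_buffer, _filled, _buffer.Length - _filled)
                If _filled - _pos < record.Length Then Return False
            End If
            Array.Copy(_buffer, _pos, record, 0, record.Length)
            _pos += record.Length
            Return True
        End Function

        Public Sub Dispose() Implements IDisposable.Dispose
            _file.Dispose()
        End Sub
    End Class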
 
Michel said:
AFAIK this automatically means that your app will bomb when it comes above the 2 GB object limit.

Worse than that.. Currently on my system (XP64, 2 GB RAM), whenever a .Net app
gets above 1.2 GB, bad things happen.. My files average 500 KB, and the largest
2 seen are about 6 MB! Not THAT big. But many things are done to the files,
burning RAM all the way..

Yes, and exactly now, when memory is so cheap at the moment. I even noticed that a lot of manufacturers
ship systems with 4 GB of memory (DELL, ACER) while providing a 32-bit operating system.

More RAM is great, but give me a way to use it effectively.
Either one would be sufficient..
 
The problem with that would be that the memory stream can't shrink the
array it's using for storage, so it would have to create a new array and copy the remaining data to it. This could
cause even more memory problems than you have now...

With C or C++, and maybe C#, all things with memory are possible.
Arrays do not exist in C. Just memory. You have a raw pointer to the
first element, then you just increment your loop pointer by the element
size until you reach the end of the underlying memory region.

So, to deallocate 100 KB:
1) Change the pointer for element 0 to element 0 + 100 KB.
2) Call Release on the region from element 0 to element 0 + 100 KB.
3) All done!

Not so hard..
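
For what it's worth, the closest real mechanism on Windows is the virtual memory API: commit a region up front, then decommit page-aligned ranges as they are consumed. A rough sketch, callable from VB.NET via Declare (the Demo flow and sizes are illustrative, and offsets passed to VirtualFree must be page-aligned):

    ' Win32 virtual memory API: commit a region, then hand physical pages
    ' back to the OS as the reader moves past them.
    Module PageDecommitSketch
        Private Declare Function VirtualAlloc Lib "kernel32" ( _
            ByVal lpAddress As IntPtr, ByVal dwSize As UIntPtr, _
            ByVal flAllocationType As UInteger, ByVal flProtect As UInteger) As IntPtr
        Private Declare Function VirtualFree Lib "kernel32" ( _
            ByVal lpAddress As IntPtr, ByVal dwSize As UIntPtr, _
            ByVal dwFreeType As UInteger) As Boolean

        Private Const MEM_COMMIT As UInteger = &H1000UI
        Private Const MEM_RESERVE As UInteger = &H2000UI
        Private Const MEM_DECOMMIT As UInteger = &H4000UI
        Private Const PAGE_READWRITE As UInteger = &H4UI

        Sub Demo()
            Dim size As Integer = 1024 * 1024 ' 1 MB region (illustrative)
            Dim region As IntPtr = VirtualAlloc(IntPtr.Zero, New UIntPtr(CUInt(size)), _
                                                MEM_RESERVE Or MEM_COMMIT, PAGE_READWRITE)
            ' ... copy file data into the region, consume the first 100 KB ...
            ' Decommit the consumed pages (100 KB = 25 x 4 KB pages). The
            ' addresses stay reserved, but the physical memory is released.
            VirtualFree(region, New UIntPtr(CUInt(100 * 1024)), MEM_DECOMMIT)
        End Sub
    End Module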
I would suggest that you make your own stream reader class that uses a large buffer, that way you reduce the frequency
of the disk access while keeping the memory usage at a controllable level.

Right. Then I get to do random IO instead of sequential.. Using buffered
FileStreams is 3x slower than the memory streams! Sounds like the solution
is worse than the problem.

Currently, IDE disks (the fastest kind for sequential reads) do sequential reads
at about 80 MB/sec. These disks do about 80 random reads a second.

So, if you read 1 byte per seek, you get 80 bytes a second. The optimal buffer
for random access would be sequential read speed / 80 = about 1 MB. A buffer
that size would break even on the disk end of things.

And do not forget, after the file is read, the results have to be written...
So there are more random writes in the mix. I have 2 physical drives.
Reading from one and writing to the other is maybe 100x faster in some
programs than sharing the same disk. Great for me, but not so good for
customers with only one drive...

Anyway, somebody else has to have seen a similar problem. I hoped
that someone else had already done a smarter memory stream.
Doing it in VB is not an option, because there is no unsafe, raw, direct
access to memory...

Memory stream does what I want. It just holds the memory too long.
I only read once. I do no seeking in the stream. So a smart implementation
would release memory as it is consumed.

Or, I could just spec an SSD for the app, and go back to filestreams.
Random Read speed = Sequential speed on those. Voila, problem goes
away! (along with a fair bit of cash...)

My current dev box has only 2 cores. With the new Intel chips you get 4 real cores
plus 4 more hyper-threaded ones. Looks like this would give minimal speedup for me:
CPUs either idle, or disk overloaded. Neither choice is very compelling.

Looks like scaling much on 32-bit is not an option for me. Either get a big RAID array
to handle the bigger disk queues, or run out of available memory (limited to about 1.2
GB for .Net apps).
 
Robert said:
With C or C++, and maybe C#, all things with memory are possible.
Arrays do not exist in C. Just memory. You have a raw pointer to the
first element, then you just increment your loop pointer by the element
size until you reach the end of the underlying memory region.

So, to deallocate 100 KB:
1) Change the pointer for element 0 to element 0 + 100 KB.
2) Call Release on the region from element 0 to element 0 + 100 KB.
3) All done!

Not so hard..

Then you are in the wrong newsgroup. This is VB.NET, and you can't
deallocate part of an object.
Right. Then I get to do random IO instead of sequential.. Using buffered
FileStreams is 3x slower than the memory streams! Sounds like the solution
is worse than the problem.

Why would you do random IO to read a file sequentially?
Anyway, somebody else has to have seen a similar problem.

It really doesn't sound like you are ready to take advice from someone
with a bit of experience. Why did you post the question, really?
I hoped that someone else had already done a smarter memory stream.
Doing it in VB is not an option, because there is no unsafe, raw, direct
access to memory...

Memory stream does what I want. It just holds the memory too long.
I only read once. I do no seeking in the stream. So a smart implementation
would release memory as it is consumed.

Your problem isn't that you can't release the memory fast enough, it's
that you are allocating too much of it in the first place.
Or, I could just spec an SSD for the app, and go back to filestreams.
Random Read speed = Sequential speed on those. Voila, problem goes
away! (along with a fair bit of cash...)

I thought that you were reading the file sequentially? How would faster
random access make sequential reading faster?
 
With C or C++, and maybe C#, all things with memory are possible.
Then you are in the wrong newsgroup. This is VB.NET, and you can't deallocate part of an object.


We can easily use C DLLs (Win32 comes to mind, including nearly all
of Windows itself), C# components, COM components, etc.
This group sees a lot of discussion about those components..
Many posts recommend 3rd-party components (implemented in whatever) to solve VB problems.
The solution to the forward memory stream does not need to be implemented in VB.
It just needs to be usable via VB, thus making the lives of VB programmers easier.

Why would you do random IO to read a file sequentially?


As I mentioned, one large atomic read per thread gives the best perf.
I.e.: load the entire file into a memory stream. Buffer = file size.

If the buffer is less than the size of the file, we must do at least 2 reads.
E.g. for a 1 MB file with a 999 KB buffer, the first read gets 999 KB, the
second gets 1 KB. With multiple threads processing the files, the first read
would stall. Thread #2 then seeks to a new file, randomly, and does a read.
Thread #1 then needs more data: random seek, back to its file.

Buffers smaller than the file size = at least 2x more random IO.
Buffer = file size = out of memory when a few larger files hit at the same time.

It really doesn't sound like you are ready to take advice from someone with a bit of experience. Why did you post the
question, really?

Refer to the line directly above: "Anyway, somebody else has to have seen a similar problem."
I was hoping somebody could post their solution/workaround...

Let's skip comparing Fortune 500 projects, years of experience, bill rates,
or university degrees. This will not make my project perform or scale better.

Back on topic:


Me: "Doctor it hurts when I hit my head with a hammer!"
Dr: "Do it twice as often. I have 'a bit of experience' and this will make it better.."
Me: "wtf?!? "

See the problem here?

Your problem isn't that you can't release the memory fast enough, it's that you are allocating too much of it in the
first place.


Back to the original question: We have a big buffer. It only needs to be
big for the first read. Then it could get a little smaller. As each thread
consumes records, each thread would release memory. Instead of
one release per file, we could have many. To reduce memory fragmentation,
maybe do releases when stream.Position is > 64 KB.

Your average file position would be the middle of the file.
One thread just starting, one nearly finished: 50%.

So, half the average buffer size is saved! A big win for memory with no
penalty of more random IO. Change one line of code where the
stream is declared, and BOOM, save on average half the buffer size.
With 8 cores, we save a full 4 buffers' worth. Memory usage is down.
Scalability is increased. A clear win.

Analogy: My head is the memory limit. Hammer is memory usage.
Hammer hits head = pain..
In effect, we make the hammer smaller as each bit of work is done.
We may still get hit, but the hit will be less frequent, and less painful.

I thought that you were reading the file sequentially? How would faster random access make sequential reading faster?

Because the implementation changed from memory stream to FileStream?!?
First >> line above...
It scales well on SSD. No memory pressure because of the small buffer, and no
random IO bottleneck.

Currently, running serially, I get an avg disk queue length of 0.3. Not a problem.
One core then runs at about 60% CPU usage. So I should be able to scale this 3x
by going with 3 parallel threads. Disk queue would be 0.9 (more than 1 on IDE drives
makes things SLOW). But then the 32-bit memory wall is hit.. More threads
will not help, more random IO will not help. The problem here is wasting memory.

We can not increase memory for 32-bit processes. We can not change the behavior of
the average disk. We can change how we use the memory. A forward-only memory stream
that releases memory after reads would seem to do this.

Anyway, this post is getting long. After rereading my reply and clarifying things more,
you gave me a couple of ideas. Check the next post for possible solutions.

Thanks for your input.
 
1) C#: directly deallocating chunks of buffer() As Byte.
- This would get messy..

2) Inherit Stream and override a bunch of stuff.
- Doable in VB.

The main change would be that instead of having buffer() As Byte,
we would have Buffers as a LinkedList of 64 KB BufferChunks of Byte.
We can not deallocate a chunk of a buffer, but we definitely
can deallocate a child object.

When reading, care would have to be taken in cases where you span
two chunks. Build an array of the desired size, copy part of chunk 1 and part
of chunk 2 into it.

Since I do not need async support (I want all serial reads), or Write
support, or Seek support, I can just throw a "Not Supported" exception in many of the
methods..

I only need New, Close, Dispose, Length, Position, Read, and ReadByte..
Not so bad.

During the read routines, chunks ending before the current stream position could
be removed. This would automatically shrink the buffer as we advance through the stream.

This class will be called ForwardMemoryStreamReader.

I will play with this for a few hours... Stay tuned.
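
A minimal sketch of that design, assuming 64 KB chunks and read-only, forward-only use (member names are illustrative; loading still happens in one sequential pass, just into separate chunk arrays instead of one big one):

    Imports System.IO

    ' Sketch: the file is held as a linked list of 64 KB chunks, and each
    ' fully-consumed chunk is unlinked so the GC can reclaim it while the
    ' rest of the file is still being processed.
    Public Class ForwardMemoryStreamReader
        Inherits Stream

        Private Const ChunkSize As Integer = 64 * 1024
        Private ReadOnly _chunks As New LinkedList(Of Byte())
        Private _length As Long          ' total bytes loaded from the file
        Private _position As Long        ' logical position in the stream
        Private _offsetInHead As Integer ' read cursor within the first chunk

        Public Sub New(ByVal path As String)
            ' One sequential pass over the file, chunk by chunk.
            Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
                _length = fs.Length
                Dim remaining As Long = _length
                While remaining > 0
                    Dim chunk(CInt(Math.Min(ChunkSize, remaining)) - 1) As Byte
                    Dim read As Integer = fs.Read(chunk, 0, chunk.Length)
                    If read = 0 Then Exit While
                    _chunks.AddLast(chunk)
                    remaining -= read
                End While
            End Using
        End Sub

        Public Overrides Function Read(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer) As Integer
            Dim copied As Integer = 0
            While copied < count AndAlso _chunks.Count > 0
                Dim head() As Byte = _chunks.First.Value
                Dim toCopy As Integer = Math.Min(head.Length - _offsetInHead, count - copied)
                Array.Copy(head, _offsetInHead, buffer, offset + copied, toCopy)
                copied += toCopy
                _offsetInHead += toCopy
                If _offsetInHead = head.Length Then
                    _chunks.RemoveFirst() ' consumed: release this chunk to the GC
                    _offsetInHead = 0
                End If
            End While
            _position += copied
            Return copied
        End Function

        Public Overrides ReadOnly Property CanRead() As Boolean
            Get
                Return True
            End Get
        End Property

        Public Overrides ReadOnly Property CanSeek() As Boolean
            Get
                Return False
            End Get
        End Property

        Public Overrides ReadOnly Property CanWrite() As Boolean
            Get
                Return False
            End Get
        End Property

        Public Overrides ReadOnly Property Length() As Long
            Get
                Return _length
            End Get
        End Property

        Public Overrides Property Position() As Long
            Get
                Return _position
            End Get
            Set(ByVal value As Long)
                Throw New NotSupportedException() ' forward-only
            End Set
        End Property

        ' Everything else is unsupported by design.
        Public Overrides Sub Flush()
        End Sub

        Public Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
            Throw New NotSupportedException()
        End Function

        Public Overrides Sub SetLength(ByVal value As Long)
            Throw New NotSupportedException()
        End Sub

        Public Overrides Sub Write(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer)
            Throw New NotSupportedException()
        End Sub
    End Class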
 
Robert said:
As I mentioned, one large atomic read per thread gives the best perf.
IE: load the entire file into a memory stream. Buffer = file size

Yes, but that will always lead to situations where you have to wait for
a large enough contiguous block of memory to be available.
If the buffer is less than the size of the file, we must do at least 2 reads.
E.g. for a 1 MB file with a 999 KB buffer, the first read gets 999 KB, the
second gets 1 KB. With multiple threads processing the files, the first read
would stall. Thread #2 then seeks to a new file, randomly, and does a read.
Thread #1 then needs more data: random seek, back to its file.

If that is what you mean by random seek, then you never have a
sequential read. A file is divided into clusters, and there is no
guarantee that they are contiguous on the disk.
Me: "Doctor it hurts when I hit my head with a hammer!"
Dr: "Do it twice as often. I have 'a bit of experience' and this will make it better.."
Me: "wtf?!? "

See the problem here?

No, I don't see how that is relevant. You don't have I/O problems, you
have memory problems.
Back to the original question: We have a big buffer. It only needs to be
big for the first read. Then it could get a little smaller. As each thread
consumes records, each thread would release memory. Instead of
one release per file, we could have many. To reduce memory fragmentation,
maybe do releases when stream.Position is > 64 KB.

Your average file position would be the middle of the file.
One thread just starting, one nearly finished: 50%.

The best that you could hope for is to reduce the memory usage by an
average of 50%, and as you are fragmenting the memory (by allocating
large and releasing small) it will most likely be considerably less.
Because the implementation changed from memory stream to FileStream?!?

So you think that your only two alternatives are to read the entire file
at once, or use an SSD disk?
 
As I mentioned, one large atomic read per thread gives the best perf.
Yes, but that will always lead to situations where you have to wait for a large enough contiguous block of memory to
be available.

This brings us full circle to the original post, which was trying to address
the memory issue! You missed this part?!?
If that is what you mean by random seek, then you never have a sequential read.

I certainly can, and do, frequently. To belabor the point:
Read 64 KB chunks via a FileStream with 2 threads: perf = 1x (drive light blinks a LOT)
Read file-size chunks with 2 threads: perf = 3x (drive light STAYS LIT)
With one thread, each read remains sequential. The disk head only needs to move
to the next cluster. Very fast. With 2 threads and contention for the disk, the
heads move a lot.
Sequential, as in "in sequence": clusters being read in order 1, 2, 3, etc.
Random, as in "no order": clusters being read out of sequence 1, 10, 2, 9, 3, 8, etc.
The head jumps around a lot.

Do you use different definitions of sequential or random?!?

A file is divided into clusters, and there is no guarantee that they are contiguous on the disk.

True, and I am not going to implement a disk defragger in the program to speed things up.
When writing these files, I again do large sequential writes, with a buffer set to the file size.
Unless the disk is nearly full, there will be contiguous space for the data. I did all of that
optimization weeks ago... Sequential wins again for throughput..
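
For reference, a sketch of that write pattern: sizing the FileStream's internal buffer to the whole payload so the write goes out in one sequential burst (outPath and data are placeholders):

    ' Sketch: one large sequential write, buffer sized to the whole payload.
    Using fs As New FileStream(outPath, FileMode.Create, FileAccess.Write, FileShare.None, data.Length)
        fs.Write(data, 0, data.Length)
    End Using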
No, I don't see how that is relevant. You don't have I/O problems, you have memory problems.

The reason we don't have IO problems is BECAUSE of the large memory streams.
The problems just magically go away due to the goodness of sequential IO. 3x perf boost.
That was all well and good. My run times dropped from 150 minutes to 50
minutes when using two threads. The disk queue lengths dropped from about 4
to about 0.3.

But, since I am working on this prog, I have to run this slow damn thing
5 times a day. At 50 minutes each..

This again brings us neatly back to the original problem.

<STRONG EMPHASIS>
Since my disk queues are low, and my CPU is only at 60%, I should be
able to use more threads. 3 at least, maybe 4.

But when I do this, I GET OUT OF MEMORY errors, and have to block
the threads while waiting on RAM.

The memory problem is 1/3 as evil as the random IO problem.
So, sequential must be at least half of the solution.

If only there were a way to release the memory soon after it is read,
before the end of the file, the memory problem would go away.
The best that you could hope for is to reduce the memory usage by an average of 50%, and as you are fragmenting the
memory (by allocating large and releasing small) it will most likely be considerably less.

This would be an implementation issue: how to deallocate, when, etc.
Nothing to do with the algorithm..
=======================================================
So in summary: it all adds up.
150 minutes with one thread: sequential IO even with small buffers, no disk contention.
50 minutes (3x perf boost) by using file-size memory stream buffers. BUT the CPU is still not fully busy.
25 minutes (2x perf boost) by using resizable buffers.

This may yield a 6x perf improvement if both ideas are used, unless YET
ANOTHER bottleneck appears.

Going from 50 minutes down to 25 minutes sounds freaking awesome.
This would save me 2 hours a day!

I have written a ForwardOnlyFileReader with a chunked buffer.
I will post it in 1 minute...
 
Why do I try to help people who don't want help?

Your helpful idea of using smaller buffers with threads yields a 3x decrease in performance.

I wanted help, not a 3x pile of performance hurt. Smaller buffers hurt, period.

I posted a nearly working implementation of a
 
I try to help you, but you don't even try to listen. You have already
decided what you believe is the answer, and your reply to anything that
I say is only that it doesn't fit your already chosen answer.

You obviously don't want help, only code. If you are not capable of
writing code yourself, hire someone who can.
 