Disk file write performance

  • Thread starter: Lee Gillie

Lee Gillie

Seeking some hints from a real framework asynchronous I/O guru.

I am trying to find the absolute fastest way to write binary data to
disk files in VB.NET.

I am working on an FTP server. In my development testing I have found
that I can receive about 60 Mbits/sec over the socket data connection
using asynchronous reads, with the file writing disabled. When I
incorporate asynchronous disk writing with FileStream, it bogs down to
about 4.5 Mbits/sec. I start with only three data buffers, but I
allocate extra ones as needed to keep feeding the socket reads as fast
as the system will take them. A buffer is obviously not reused until its
asynchronous disk write completes, and I find my buffer pool grows to
about 250 (i.e. up to 250 pending asynchronous disk writes at any one
moment). I/O buffers are 1 KB each. File transfers are about 50 MB.

Thanks for any pointers

Best regards - Lee Gillie
 
OK, the single BIGGEST thing I have found was to open the output disk
file with SHARE=NONE (FileShare.None). This boosted general performance
by a factor of about 5. Additionally, I found that fiddling with the
buffer size passed when opening the file affected performance a lot. I
found a 64K buffer was about optimal. Using smaller or larger buffers
could affect performance by as much as 10-fold. It seems to be worth
testing. Unfortunately, I'll bet the performance profile may vary from
hardware to hardware!?!?
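
One way to run that kind of test is sketched below. It assumes a VB 2005
or later compiler (for Using and Stopwatch). TimeWrite, the temp file
name, and the list of buffer sizes are made up for illustration, and it
uses plain synchronous Write calls rather than the BeginWrite path from
the FTP server. The operating system's file cache will influence the
numbers, so treat them as relative rather than absolute.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO

Module BufferSizeTest
    ' Hypothetical helper: writes totalBytes to fileName in chunkSize pieces
    ' and returns the elapsed time in milliseconds.
    Function TimeWrite(ByVal fileName As String, ByVal bufferSize As Integer, _
                       ByVal chunkSize As Integer, ByVal totalBytes As Long) As Long
        Dim chunk(chunkSize - 1) As Byte
        Dim watch As Stopwatch = Stopwatch.StartNew()
        ' FileShare.None is the SHARE=NONE setting; bufferSize is the value
        ' being tuned; False asks for synchronous I/O.
        Using fs As New FileStream(fileName, FileMode.Create, FileAccess.Write, _
                                   FileShare.None, bufferSize, False)
            Dim written As Long = 0
            While written < totalBytes
                Dim count As Integer = CInt(Math.Min(CLng(chunkSize), totalBytes - written))
                fs.Write(chunk, 0, count)
                written += count
            End While
        End Using
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function

    Sub Main()
        Dim fileName As String = Path.Combine(Path.GetTempPath(), "bufsize-test.bin")
        ' 1 KB chunks and a 50 MB total mirror the figures in the first post.
        For Each size As Integer In New Integer() {4096, 16384, 65536, 262144}
            Dim ms As Long = TimeWrite(fileName, size, 1024, 50L * 1024 * 1024)
            Console.WriteLine("{0,7} byte buffer: {1} ms", size, ms)
        Next
        File.Delete(fileName)
    End Sub
End Module
```

Running it on the actual target machine and disk is the point, since the
best size on one box may not be the best on another.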

Curiously, I could not find any difference in either reliability or
speed when specifying the ASYNCH param as TRUE or FALSE. Maybe that is
because I am protecting the FileStream BeginWrite calls with VB.NET
SYNCLOCK? The subject of the need for thread synchronization of streams
is all a bit voodoo to me. I have not seen it discussed in depth anywhere.

I am now getting about 22 Mbits/sec total throughput from the socket to
the disk file. I think I can live with that.

Also, I found that after tuning for the best throughput, the number of
buffers needed dropped from about 250 to 80. The number of buffers I
allocate is driven directly by the backlog of pending asynchronous
writes to the disk file.
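
A minimal sketch of the grow-on-demand pooling described above, not the
server's actual code. BufferPool, Rent, GiveBack, and TotalAllocated are
hypothetical names. In the real server, GiveBack would be called from
the BeginWrite completion callback once EndWrite has returned; here Main
only simulates five pending writes to show the pool growing and then
levelling off.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical pool: hands out fixed-size buffers and allocates a new one
' only when every existing buffer is still tied up in a pending write.
Class BufferPool
    Private ReadOnly freeBuffers As New Stack(Of Byte())()
    Private ReadOnly gate As New Object()
    Private ReadOnly size As Integer
    Private allocated As Integer = 0

    Public Sub New(ByVal bufferSize As Integer, ByVal initialCount As Integer)
        size = bufferSize
        For i As Integer = 1 To initialCount
            freeBuffers.Push(New Byte(size - 1) {})
            allocated += 1
        Next
    End Sub

    ' Total buffers ever created, i.e. the peak backlog seen so far.
    Public ReadOnly Property TotalAllocated() As Integer
        Get
            SyncLock gate
                Return allocated
            End SyncLock
        End Get
    End Property

    Public Function Rent() As Byte()
        SyncLock gate
            If freeBuffers.Count > 0 Then Return freeBuffers.Pop()
            allocated += 1
            Return New Byte(size - 1) {}
        End SyncLock
    End Function

    Public Sub GiveBack(ByVal data As Byte())
        SyncLock gate
            freeBuffers.Push(data)
        End SyncLock
    End Sub
End Class

Module PoolDemo
    Sub Main()
        Dim pool As New BufferPool(1024, 3)
        Dim inUse As New List(Of Byte())()
        ' Simulate five writes still pending: the pool has to grow past three.
        For i As Integer = 1 To 5
            inUse.Add(pool.Rent())
        Next
        Console.WriteLine(pool.TotalAllocated)   ' prints 5
        ' Simulate those writes completing, then another burst of the same size.
        For Each data As Byte() In inUse
            pool.GiveBack(data)
        Next
        For i As Integer = 1 To 5
            pool.Rent()
        Next
        Console.WriteLine(pool.TotalAllocated)   ' still prints 5
    End Sub
End Module
```

The allocation count only rises when the disk falls further behind the
socket than it ever has before, which is why faster writes shrink it.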

- Lee
 
Lee Gillie wrote:

I've not done the analysis with file writes, but I have for reads.
Having said that, the notes below would (in general) apply to writes as
well as to reads.

Win32 disk reads and writes always use async access. If you don't use an
OVERLAPPED structure for ReadFile/WriteFile then the function will block
until the operation completes. If you do use OVERLAPPED then the function
will return as soon as the operation is initiated, and you get the
notification through your OVERLAPPED structure or through the thread
bound to the controlling I/O completion port. In effect, the underlying
file driver code is the same for both sync and async.

The FileStream async methods call ReadFile/WriteFile and bind the file
handle to a threadpool thread so that when the action is complete the
async callback method is called. As you mentioned the FileStream caches
data as well, so this gives you a couple of situations for a read
operation:

1) the data requested is in the cache buffer
2) there is not enough data in the cache buffer to satisfy the request.

In the first case, the data will be read from the cache during BeginRead
which will also call EndRead and then call the callback method
asynchronously before returning. In effect, the call becomes synchronous.
In the second case the output buffer is filled with the cache, and then
the remainder is filled with an async ReadFile, but not necessarily for
a multiple of the block size. Indeed, it just reads the number of bytes
requested minus those that were provided by the cache.

It seems to me that if you read *exactly* the cache size each time then
you'll get better performance (because you'll get a short circuit to the
native call to ReadFile and no excess will be cached). I suspect this is
why you found that a buffer of 64K works and that other buffer sizes
affect performance by so much.
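
To make the suggestion concrete, here is a rough sketch that requests
exactly one cache-sized block per read. CacheSize and CountBytes are
made-up names, the 64K figure is simply the value from Lee's post, and
the sketch uses the synchronous Read call to stay short. Whether it
actually bypasses the FileStream cache on a given framework version is
an assumption that would need measuring.

```vbnet
Imports System
Imports System.IO

Module AlignedRead
    ' Assumed value: the same size is used for the FileStream buffer
    ' and for every read request.
    Const CacheSize As Integer = 65536

    ' Hypothetical helper: reads the whole file in CacheSize requests
    ' and returns the number of bytes seen.
    Function CountBytes(ByVal fileName As String) As Long
        Dim chunk(CacheSize - 1) As Byte
        Dim total As Long = 0
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read, _
                                   FileShare.Read, CacheSize, False)
            Dim got As Integer = fs.Read(chunk, 0, chunk.Length)
            While got > 0
                total += got
                got = fs.Read(chunk, 0, chunk.Length)
            End While
        End Using
        Return total
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 1 Then Console.WriteLine(CountBytes(args(0)))
    End Sub
End Module
```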

> Curiously, I could not find any difference in either reliability or
> speed when specifying the ASYNCH param as TRUE or FALSE. Maybe that is
> because I am protecting the FileStream BeginWrite calls with VB.NET
> SYNCLOCK? The subject of the need for thread synchronization of
> streams is all a bit voodoo to me. I have not seen it discussed in
> depth anywhere.

I don't think this has any effect. Anyway, file handles can only be
bound to a single thread (remember, this happens for the native read to
the file through BeginRead), so you can only make one call to BeginRead
at a time for a particular FileStream object.

Richard
 