Asynchronously writing to file

  • Thread starter: nickdu

nickdu

I'm curious how asynchronously writing to a file behaves, e.g. using
FileStream.BeginWrite()/EndWrite(). Is a limit of one outstanding async IO
imposed?

I did a bit of testing comparing synchronous writes vs. asynchronous
writes. I wasn't really looking for performance numbers but more at
behavior. I expected the overall time to write a certain amount of
data to a file to be the same, though I can see that doing this
asynchronously might be able to take advantage of combining multiple write
calls into a single write at some lower level.

What I was hoping to see was that the average duration of the
FileStream.BeginWrite() call would be significantly less than the average
duration of the FileStream.Write() call. This was not the case. The average
durations were roughly the same. Why? I can understand them being the same
if we're limited to one outstanding async IO. But I'm wondering if I might
be making some other mistake. Also, the max duration of the asynchronous
method was much bigger than that of the synchronous method, 3.6 seconds vs.
0.6 seconds. Both applications sit in a loop which executes 256K times,
writing 4K buffers totaling 1GB.

Code snippets for both methods follow.

sync
====

Stream stm = File.Open(args[0], FileMode.Create,
FileAccess.ReadWrite, FileShare.None);
using(stm)
{
 
> What I was hoping to see was that the average duration of the
> FileStream.BeginWrite() call would be significantly less than the
> average duration of the FileStream.Write() call. This was not the case.
> The average durations were roughly the same. Why?

Probably because neither method call actually waits for the i/o operation
to complete. Even when you use the synchronous API, there's a fair amount
of buffering going on. So even the synchronous method can return quickly
if all it had to do was buffer some data and/or if the main overhead for
the method call is some user/kernel transition or something like that.

The fact is, your stated average duration for the method call of 6/10ths
of a second is HUGE for any method call, so whatever is taking so long is
clearly not whatever actual work is going on. Something else is causing
the basic method cost to be inflated by some large amount, and that's
hiding whatever real difference between the theoretical operations might
exist.

> I can understand them being the same if we're limited to one
> outstanding async IO. But I'm wondering if I might be making some other
> mistake. Also, the max duration of the asynchronous method was much
> bigger than that of the synchronous method, 3.6 seconds vs. 0.6 seconds.
> Both applications sit in a loop which executes 256K times, writing 4K
> buffers totaling 1GB.

It's hard to say for sure what might be going on without a
concise-but-complete code example that reliably demonstrates the issue,
_and_ a detailed description of the system configuration, _and_ access to
an identically configured system. There's so much that can affect
performance of specific i/o methods.

That said, I wouldn't be surprised at all if the "max duration" difference
is due to garbage collection. In a real-world scenario, you would be
using a new buffer for each call to BeginWrite() (you didn't post a
concise-but-complete code example, so there's no way to know for sure how
your code uses it), but even if you're using the same buffer over and
over, each call to BeginWrite() still allocates other objects, and if
eventually the garbage collector has to get involved, that could create a
dramatically slower outlying data point for your performance measurements.
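
One way to check the garbage-collection theory without a profiler is to
record, alongside the per-call timings, how many collections happened during
the run. The following is only a minimal sketch, not the original poster's
code: the class name, the iteration count and the temp-file fallback are
assumptions made for illustration, and it completes each write with
EndWrite() before starting the next one, which the original program may not
have done.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class WriteTimingSketch
{
    static void Main(string[] args)
    {
        // Assumption: fall back to a temp file when no path is given.
        bool useTemp = args.Length == 0;
        string path = useTemp ? Path.GetTempFileName() : args[0];

        const int iterations = 1024;     // far fewer than the 256K in the thread
        byte[] buffer = new byte[4096];  // one 4K buffer, reused for every write

        long totalTicks = 0, maxTicks = 0;
        int gen0Before = GC.CollectionCount(0);

        using (FileStream stm = File.Open(path, FileMode.Create,
            FileAccess.ReadWrite, FileShare.None))
        {
            for (int i = 0; i < iterations; i++)
            {
                // GetTimestamp() avoids allocating a Stopwatch per iteration,
                // so the measurement itself adds no garbage.
                long start = Stopwatch.GetTimestamp();
                IAsyncResult ar = stm.BeginWrite(buffer, 0, buffer.Length, null, null);
                long elapsed = Stopwatch.GetTimestamp() - start;

                stm.EndWrite(ar); // finish this write before issuing the next

                totalTicks += elapsed;
                if (elapsed > maxTicks) maxTicks = elapsed;
            }
        }

        double usPerTick = 1000000.0 / Stopwatch.Frequency;
        Console.WriteLine("avg BeginWrite: {0:F1} us", totalTicks * usPerTick / iterations);
        Console.WriteLine("max BeginWrite: {0:F1} us", maxTicks * usPerTick);
        Console.WriteLine("gen-0 collections during run: {0}",
            GC.CollectionCount(0) - gen0Before);

        if (useTemp) File.Delete(path);
    }
}
```

If the slowest calls show up only in runs where the collection count goes
up, that would be consistent with the GC explanation, though it would not
prove it.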

Beyond that, just as you surmised, I too would not expect asynchronous i/o
to provide a significant performance advantage, or even any at all, in the
general case. If .NET/Windows can take advantage of asynchronous
i/o to optimize disk access, or if you do it yourself when accessing
multiple volumes simultaneously, there is the _potential_ for a
performance improvement. But there are so many other variables that async
i/o generally may not provide any benefit, or could even perform worse
(depending on what happens to the pattern of disk accesses).
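
One variable that is easy to check is how the stream was opened. As far as I
know, a FileStream is only opened for overlapped (truly asynchronous) i/o
when that is requested explicitly, for example with FileOptions.Asynchronous,
and File.Open() does not request it. The sketch below shows one way to ask
for it and to inspect the IsAsync property; the class name, scratch file and
4096-byte buffer size are assumptions for illustration, not anything taken
from the original program.

```csharp
using System;
using System.IO;

class AsyncOpenSketch
{
    static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical scratch file
        byte[] buffer = new byte[4096];

        // FileOptions.Asynchronous asks for the file to be opened for
        // asynchronous i/o rather than the default synchronous mode.
        using (FileStream stm = new FileStream(path, FileMode.Create,
            FileAccess.ReadWrite, FileShare.None, 4096, FileOptions.Asynchronous))
        {
            // IsAsync reports which mode the stream actually ended up in.
            Console.WriteLine("IsAsync: {0}", stm.IsAsync);

            IAsyncResult ar = stm.BeginWrite(buffer, 0, buffer.Length, null, null);
            stm.EndWrite(ar); // blocks until the write has completed
        }

        File.Delete(path);
    }
}
```

Even with a stream opened this way, small writes such as 4K may not get any
faster, so it is something to measure rather than assume.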

Pete
 
The average wasn't 6/10ths of a second. 6/10ths of a second was the max
duration of a synchronous write, while the max duration of an async write
was 3.6 seconds.

The averages for both were in the 200-300 microsecond range.
--
Thanks,
Nick

 
> The average wasn't 6/10ths of a second. 6/10ths of a second was the max
> duration of a synchronous write, while the max duration of an async
> write was 3.6 seconds.
>
> The averages for both were in the 200-300 microsecond range.

Ah, okay, I see. I misread. In that case, I think it's MUCH more likely
that you are simply seeing garbage collection overhead. The async API is
going to be a lot heavier on GC than the synchronous API.

If you're really curious, you can run your program with a profiler and see
exactly where it's spending its time.
 