Very slow append rewrites file intermittently, why?

  • Thread starter Thread starter chrisp
  • Start date Start date
C

chrisp

Hi, I have an application that appends to 40mb files, intermittently
this append operation takes a long time (8-30 seconds) to append 300k
of data.

When watching the application with task manager I notice that the
application writes 40mb of data when this occurs. (the size of the
file I'm appending to)

To ensure I wasn't writing 40mb of data I wrote a test program.
(attached below) which opens a file in binary append mode, writes some
data, and closes it.

Now here's where it gets weird.

If I run the test program, it takes less than a second, then I reboot
my laptop and try again, the first time I run it on the test file it
takes 8 seconds, and in task manager, it shows it wrote 40mb, the
second time, it takes 0 seconds and writes 300k

If I run the same test program, on another copy of the same file, it
works fine. Even if I reboot first.

I have defragemented my drive and uninstalled norton anti virus.

Any one got any idea what's going on ? Why is the operating system
writing 40mb of data some times for some files?

Code and log attached.

Running on windows xp professional 2002, sp 2, on an HP laptop.
-------------
Directory of C:\

05/10/2006 09:48 a.m. 40,655,238 Done.data
05/10/2006 09:47 a.m. 40,055,238 done.data3

C:\src\testadd\Debug>testadd \done.data3

Done it 0 seconds


C:\src\testadd\Debug>testadd \done.data

Done it 8 seconds


C:\src\testadd\Debug>testadd \done.data

Done it 0 seconds


-----------------------test program
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[])
{
FILE *f;
char bf[30003];
int start = time(NULL);
printf("Press return to start:"); gets(bf);
start = time(NULL);
f = fopen(argv[1],"ab");
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fclose(f);
printf("Done it %d\n",time(NULL)-start);
printf("Press return to finish:");
gets(bf);
return 0;

}
 
chrisp@netwin... said:
Hi, I have an application that appends to 40mb files, intermittently
this append operation takes a long time (8-30 seconds) to append 300k
of data.

When watching the application with task manager I notice that the
application writes 40mb of data when this occurs. (the size of the
file I'm appending to)

To ensure I wasn't writing 40mb of data I wrote a test program.
(attached below) which opens a file in binary append mode, writes some
data, and closes it.

I tried the test app and always saw 20 I/O writes and 300,000 bytes.
The 20 I/O writes is a small puzzle. In testing there were always
two writes logged per call to fwrite(), even when writing 10 megabytes
at a time plus with large buffer sizes the bytes-written counter

My first thought when reading your post was that you were not opening
the file in binary mode and something in the C-library was rewriting
things.
However, you say you are using binary append mode and the sample
code does this.

While your test app looks fine try coding it using CreateFile(),
SetFilePointer(),
and WriteFile() to see if the problem is with the C library or the
operating system.

Marc
 
I tried the test app and always saw 20 I/O writes and 300,000 bytes.
The 20 I/O writes is a small puzzle. In testing there were always
While your test app looks fine try coding it using CreateFile(),
SetFilePointer(),
and WriteFile() to see if the problem is with the C library or the
operating system.

Ok, tried that, results were the same, interestingly the delay was
during the write not the seek.

Also, here are some stranger facts. The delay seems tied to the test
file, so if I copy the test file, and run the same test, I only get the
10 seconds delay (and extra IO's) on the original file, not the copy.
e.g.

c:> copy \done.data \done.data_copy
REBOOT.

C:\src\testadd\Debug>testadd \done.data_copy
Press return to start:
seek to end
writing
Close
Done it 0 seconds
Press return to finish:

C:\src\testadd\Debug>testadd \done.data
Press return to start:
seek to end
writing
Close
Done it 9 seconds
Press return to finish:

C:\src\testadd\Debug>

My first thought is that the file was 'fragmented' but running a
defragment didn't help, and I don't see why a fragmented file should
cause this strange 'copy before append' action once every reboot. And
even if it was fragmented and it was copying it first to defragment it,
then why would it happen again after the next reboot...

My other theory is that the disk is faulty, but I don't see why the
operating system would
copy the file, and if it did why does the problem persist....

This must be a feature/bug in the windows file system, but it's got me
totally stumped.

ChrisP.

(new sample code below)

-----------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <windows.h>

int main(int argc, char *argv[])
{
char bf[30003];
int start = time(NULL);
HANDLE h;
int n;
printf("Press return to start:"); gets(bf);

start = time(NULL);

h = CreateFile(argv[1],GENERIC_WRITE,0,NULL,OPEN_EXISTING,0,NULL);
printf("seek to end\n");
SetFilePointer(h,0,NULL,FILE_END);
printf("writing\n");
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
WriteFile(h,bf,30000,&n,NULL);
printf("Close\n");
CloseHandle(h);

printf("Done it %d seconds\n",time(NULL)-start);
printf("Press return to finish:");
gets(bf);

}
 
chrisp@netwin said:
Ok, tried that, results were the same, interestingly the delay was
during the write not the seek.

I don't think SetFilePointer() does anything other than set an
in-memory
variable. The actual positioning gets done when you try to write.

BTW - what C compiler are you using? I'm using Microsoft C++ v. 6.0,
created a new console app called testadd, and copy/pasted your code
into it. There were compile errors because "n" was not an unsigned
long
and that main() did not return a value. It was easy enough to fix
those
so I can run your code but it also made me aware that you must not be
using Microsoft Visual C++.

Are you by chance using sparse files? For example, I modified your
test
app to support a second argument after the file name where I could
specify
the size of a test file. In your initial post I saw that your test
file
is c:\Done.data and that it's 40,655,238 bytes long. I ran

debug\testadd \Done.data 40655238

This created a file that's reported as 40,655,238 bytes though with
NTFS
no file data was written to disk. I then ran "debug\testadd
\Done.data"
which appended 300,000 bytes and from the amount of time it took I
could
tell that Windows was first writing out 40,655,238 bytes of zeroes and
then it appended by data. I created a 1 billion byte test file and
appending
300,000 more bytes took 46 seconds (21.7 megabytes/second). However,
task manager only shows 300,000 bytes written and never reported the
extra 1 billion bytes written on my behalf.

I also tried a 1 megabyte test file on a floppy and as FAT12 does not
support sparse files the first part to create the file took something
like 12 seconds and when I ran it again to append 300K bytes it needed
4 seconds.

Is there any particular reason on why you have ten separate writes of
30K bytes each? Do you still see the same issue if it's a single
WriteFile(h,bf,1,&n,NULL)? (appending one byte to your file).

Is your disk formatted with FAT32 or NTFS?

This must be a feature/bug in the windows file system, but it's got me
totally stumped.

I agree, it's mystifying. The last time I got mystified it did turn out
to
be a windows bug - see http://marc.kupper.googlepages.com/scandf

Do you have the option of testing on other computers and/or testing
over a LAN to your computer? I know you are (or were) running Norton
but it's possible your system is infected by a virus that's rewriting
your
file to conceal something. Are you willing to install something like
SysInternal's diskmon
http://www.sysinternals.com/Utilities/Diskmon.html
or filemon http://www.sysinternals.com/Utilities/Filemon.html ?

Below is the code I added to create a sparse test file. I probably
should
do a version that sequentially writes out ulFileSize bytes rather than
creating a sparse file but in my previous testing with manually created
test files my testapp never wrote more than whatever was coded.

Marc

unsigned long ulFileSize;
DWORD dwFilePos;

if (argv[2] && ((ulFileSize = atol (argv[2])) != 0))
{
h = CreateFile(argv[1],GENERIC_WRITE,0,NULL,OPEN_ALWAYS,0,NULL);
if (h != INVALID_HANDLE_VALUE)
{
dwFilePos = SetFilePointer(h,ulFileSize,NULL,FILE_BEGIN);
if (dwFilePos != ulFileSize)
printf ("Error, SetFilePointer() returned %lu, expected %lu for
%s", dwFilePos, ulFileSize, argv[1]);
if (!SetEndOfFile (h))
printf ("Error %u from SetEndOfFile() for %s", GetLastError (),
argv[1]);
if (!CloseHandle (h))
printf ("Error %u from CloseHandle() for %s", GetLastError (),
argv[1]);
}
else
{
printf ("Error %u from CreateFile() for %s", GetLastError (),
argv[1]);
}
 
What exactly is in there in the 40MB file generated by your application/test
in the first shot. Has it written any actual non-zero data to this file or
has merely used SetFilePointer/SetEndOfFile call to make it a 40 MB file?

For optimization NTFS will not write out zeros in the extra space created in
the file as a result of SetFilePointer/SetEndOfFile calls. Your first call
to extend the file(writing at the end) will result in zeros written in that
area. The reason it doesnt happen the same on the copy of file is because
when "Copy" command copied the file it wrote down the zeros too (as read
calls for the original file would have resulted in zeros returned).

--
This posting is provided "AS IS" with no warranties, and confers no rights.
OR if you wish to include a script sample in your post please add "Use of
included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm"



Hi, I have an application that appends to 40mb files, intermittently
this append operation takes a long time (8-30 seconds) to append 300k
of data.

When watching the application with task manager I notice that the
application writes 40mb of data when this occurs. (the size of the
file I'm appending to)

To ensure I wasn't writing 40mb of data I wrote a test program.
(attached below) which opens a file in binary append mode, writes some
data, and closes it.

Now here's where it gets weird.

If I run the test program, it takes less than a second, then I reboot
my laptop and try again, the first time I run it on the test file it
takes 8 seconds, and in task manager, it shows it wrote 40mb, the
second time, it takes 0 seconds and writes 300k

If I run the same test program, on another copy of the same file, it
works fine. Even if I reboot first.

I have defragemented my drive and uninstalled norton anti virus.

Any one got any idea what's going on ? Why is the operating system
writing 40mb of data some times for some files?

Code and log attached.

Running on windows xp professional 2002, sp 2, on an HP laptop.
-------------
Directory of C:\

05/10/2006 09:48 a.m. 40,655,238 Done.data
05/10/2006 09:47 a.m. 40,055,238 done.data3

C:\src\testadd\Debug>testadd \done.data3

Done it 0 seconds


C:\src\testadd\Debug>testadd \done.data

Done it 8 seconds


C:\src\testadd\Debug>testadd \done.data

Done it 0 seconds


-----------------------test program
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[])
{
FILE *f;
char bf[30003];
int start = time(NULL);
printf("Press return to start:"); gets(bf);
start = time(NULL);
f = fopen(argv[1],"ab");
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fwrite(bf,30000,1,f);
fclose(f);
printf("Done it %d\n",time(NULL)-start);
printf("Press return to finish:");
gets(bf);
return 0;

}
 
Back
Top