I/O stream performance

  • Thread starter: Guoqi Zheng

Guoqi Zheng

Dear Sir,

I have many small XML files that contain all kinds of data. Very often, I need
to get the summary of each XML file and show part of it in a Repeater control.

For example, I have 10 XML files in one folder: 1.xml, 2.xml, 3.xml,
..., 10.xml.

Every XML file contains a summary tag and a detail list. For example:
******************************************
<xml>
<summary>
there are some text here
</summary>
<details>
<detail id="2">some contentssfasfsd. sfsafs.s fsdfsdf
</detail>
<detail id="5">dsfdsfsd </detail>
</details>
</xml>
*******************************************

I can read the XML into a DataSet and then get the summary part. However,
this causes a lot of I/O overhead. What should I do to read only the first
part (<summary>...</summary>) of those XML files?

I actually need to read all of those files (10 XML files in this case), get
the summary part of every file, add it to an ArrayList, and bind that to a
Repeater control.

Thanks in advance.

Guoqi Zheng
 
Guoqi,

If the files are very small, under 512 bytes, then what you are asking is
impossible: there is no I/O left to save. The operating system reads and
writes files in blocks of 512 bytes (and at the actual device driver/file
system level it is probably dealing with 4K chunks, essentially the
allocation size for the file system).

If the files are larger than 512 bytes you might be able to save some I/O
cycles, but it is unlikely. More or less: open the file, read 512 bytes, then
try to match on <summary>*</summary> (maybe using a regular expression).
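
A minimal sketch of that idea, in Python rather than .NET purely for illustration. The helper name read_summary_prefix, the 512-byte default, and the UTF-8 encoding are assumptions, not something from the original files. It only finds a summary that fits entirely inside the first block, and the operating system may still fetch a full allocation unit from disk behind the scenes.

```python
import re

# Non-greedy match so we stop at the first closing tag; DOTALL lets the
# summary span several lines.
SUMMARY_RE = re.compile(rb"<summary>(.*?)</summary>", re.DOTALL)


def read_summary_prefix(path, block_size=512):
    """Return the summary text if it lies entirely within the first block.

    Hypothetical helper: returns None when no complete
    <summary>...</summary> pair is found in the first block_size bytes.
    """
    # buffering=0 requests an unbuffered file object, so Python itself
    # asks the OS for at most block_size bytes in this single read.
    with open(path, "rb", buffering=0) as f:
        head = f.read(block_size)
    match = SUMMARY_RE.search(head)
    if match is None:
        return None
    # Assumption: the files are UTF-8 encoded.
    return match.group(1).decode("utf-8").strip()
```

If the summary can run past the first block, the function returns None and the caller would have to fall back to reading more of the file.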

But it looks like a database may be a more appropriate structure. Exactly
how many is "many small XML files"? If it is several hundred, that sounds
like a database is needed.

Or perhaps at least some sort of cache so the files are in memory (but if
you have lots and lots of these, this is not likely to be a good answer
either).
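
As a rough illustration of the cache idea, here is a small Python sketch. The class name SummaryCache and the loader callback are hypothetical. It keeps each extracted summary in memory and re-reads a file only when its modification time changes. It has no size limit, which is exactly why it stops being a good answer once there are very many files.

```python
import os


class SummaryCache:
    """Keeps extracted summaries in memory, keyed by file path."""

    def __init__(self, loader):
        self._loader = loader   # assumed callback: path -> summary text
        self._entries = {}      # path -> (mtime_ns, summary)

    def get(self, path):
        # A stat call is still needed to detect changes, but it avoids
        # reading and parsing the file contents again.
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        summary = self._loader(path)
        self._entries[path] = (mtime, summary)
        return summary
```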

Regards,

Rob
 
Rob,

Thanks a lot for your answer...

I have millions and millions of those small text files. The size is normally
about 4 KB to 10 KB; sometimes it can be up to 100 KB. My PC is always
formatted with the 512-byte allocation size.

Do you think I would gain some performance by using ReadByte, converting the
bytes to a string, and checking for the <summary> tag first?
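
To make the question concrete, here is a sketch of what I mean, written in Python only as an illustration (in .NET a forward-only reader such as XmlTextReader would play the same role). The function name read_summary_streaming and the 512-byte chunk size are assumptions. It feeds the file to an incremental parser chunk by chunk and stops as soon as the summary element is complete, so the rest of a large file is never requested. Whether this actually saves time would have to be measured.

```python
import xml.etree.ElementTree as ET


def read_summary_streaming(path, chunk_size=512):
    """Feed the file to an incremental parser and stop after </summary>.

    Hypothetical helper: returns the summary text, or None if the file
    ends without a <summary> element.
    """
    parser = ET.XMLPullParser(events=("end",))
    # Unbuffered, so each read asks the OS for at most chunk_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # end of file, no summary element seen
            parser.feed(chunk)
            # Check the elements that were closed by this chunk.
            for _event, elem in parser.read_events():
                if elem.tag == "summary":
                    return (elem.text or "").strip()
```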

Thanks a lot...
 