Stream.Read()/Write() vs. StreamReader/Writer using DeflateStream

  • Thread starter: nickdu

nickdu

I'm working on compressing some data. I was surprised to find that the data
generated when I use Stream.Read()/Write() to write to the DeflateStream
differs from the data generated when I use
StreamReader.ReadToEnd()/StreamWriter.Write(string) to write to the
DeflateStream.

The data I'm compressing is an XML file (ASCII or ANSI encoding). Here is
the logic I use to compress using Stream.Write():

if ((bool) compress == true)
{
    input = File.OpenRead(ifile);
    try
    {
        Stream ostream = File.Create(ofile);
        try
        {
            output = new DeflateStream(ostream, CompressionMode.Compress);
            try
            {
                byte[] buffer = new byte[4096];
                int bytes;

                while ((bytes = input.Read(buffer, 0, buffer.Length)) != 0)
                {
                    output.Write(buffer, 0, bytes);
                }
            }
            finally
            {
                input.Close();
                output.Close();
            }
        }
        catch(Exception)
        {
            ostream.Close();
            throw;
        }
    }
    catch(Exception)
    {
        input.Close();
        throw;
    }
}
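For comparison, here is a minimal sketch of the same byte-copy approach written with nested using statements, so every stream is closed even if an exception is thrown. The class and method names (ByteCopyCompressor, CompressFile, DecompressFile) are hypothetical, and the decompression helper is an addition for checking the round trip; it is not part of the original code.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: the raw byte-copy approach from the first snippet,
// rewritten with using blocks so every stream is closed even on error.
public static class ByteCopyCompressor
{
    public static void CompressFile(string inputPath, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (FileStream output = File.Create(outputPath))
        // leaveOpen defaults to false, so disposing the DeflateStream also
        // closes 'output'; the outer using is then a harmless no-op.
        using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            byte[] buffer = new byte[4096];
            int bytes;
            while ((bytes = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                deflate.Write(buffer, 0, bytes);
            }
        } // 'deflate' is disposed first here, which writes its final block

    }

    // Reads a file produced by CompressFile back into memory.
    public static byte[] DecompressFile(string path)
    {
        using (FileStream input = File.OpenRead(path))
        using (DeflateStream inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream result = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int bytes;
            while ((bytes = inflate.Read(buffer, 0, buffer.Length)) != 0)
            {
                result.Write(buffer, 0, bytes);
            }
            return result.ToArray();
        }
    }
}
```

Calling CompressFile(ifile, ofile) and then comparing DecompressFile(ofile) with File.ReadAllBytes(ifile) should show a byte-for-byte match.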

Using StreamReader/StreamWriter, I do:

if ((bool) comp == true)
{
    Stream output = File.Create(ofile);
    using(output)
    {
        Stream input = File.OpenRead(ifile);
        using(input)
        {
            StreamReader reader = new StreamReader(input, Encoding.Default);
            DeflateStream compress = new DeflateStream(output,
                CompressionMode.Compress, true);
            StreamWriter writer = new StreamWriter(compress, Encoding.Default);
            string s = reader.ReadToEnd();
            writer.Write(s);
            writer.Flush();
        }
    }
}

The sizes of the compressed output files are quite different: 109,063 bytes
(using Stream.Read()/Write()) vs. 110,062 bytes (using
StreamReader/StreamWriter). I'm not as concerned about this, but I would like
to know why there is such a difference. More importantly, when I decompress I
get two different results. Decompressing the Stream.Read()/Write() version
produces a file that matches the original file. Decompressing the
StreamReader/StreamWriter version produces a file that is one byte shorter
than the original (though for some reason fc.exe says the files are the
same); the missing byte is the last newline character.

--
Thanks,
Nick

(e-mail address removed)
remove "nospam" change community. to msn.com
 
Well, the encoding is one thing I thought might be causing a problem.
However, after loading the string using the StreamReader, I wrote it out to a
file using a StreamWriter and compared that file against the original. They
matched exactly, which leads me to believe that it's not an encoding
problem. Instead, I was wondering whether it's a buffering issue.
--
Thanks,
Nick

 
I do close the deflate stream. I'll have to try to come up with a more
concise repro of this.
--
Thanks,
Nick

 
I stand corrected. You're right: I seem to be missing a using (or finally)
on the compress stream in the StreamWriter/StreamReader case. I'm not sure
whether I had one there originally, as I've made so many modifications to
the code to dump the state along the way, hoping to determine where the
differences originate.
--
Thanks,
Nick

 
Hello Nick and Peter,

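Thank you for posting in the Microsoft newsgroups. I have also been
researching this. From my testing, I agree that the second issue results
from not closing the compress stream: StreamWriter.Flush() only pushes the
text into the DeflateStream, which, as far as I can tell, writes its
remaining compressed data only when it is closed, so the decompressed file
does not match the original one.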
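A minimal sketch of the StreamReader/StreamWriter version with that fix applied, assuming the file paths and the encoding are passed in (TextCompressor and CompressTextFile are hypothetical names). The key point is that the writer and the DeflateStream are disposed, not just flushed.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: the StreamReader/StreamWriter approach from the
// second snippet, with the writer (and therefore the DeflateStream) closed.
public static class TextCompressor
{
    public static void CompressTextFile(string inputPath, string outputPath, Encoding encoding)
    {
        string text;
        using (StreamReader reader = new StreamReader(inputPath, encoding))
        {
            text = reader.ReadToEnd();
        }

        using (FileStream output = File.Create(outputPath))
        // leaveOpen: true, as in the original, so 'output' is closed by its own using.
        using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
        using (StreamWriter writer = new StreamWriter(deflate, encoding))
        {
            writer.Write(text);
            // No explicit Flush needed: disposing the writer flushes its buffer
            // into 'deflate' and closes it, which writes the final compressed
            // block to 'output'. The later Dispose of 'deflate' is a no-op.
        }
    }
}
```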

As for the first issue, my tests show that it may be related to the buffer
length, as Nick guessed. In the first code snippet, changing the buffer
length changes the output of the compression loop. This makes sense because
the stream is compressed one segment (one Write call) at a time; if the
segment size changes, the size of the resulting compressed file changes too.

For a quick test, I used a 242,180-byte .xml file as input. A 4*1024-byte
buffer produced a 27,324-byte compressed file, a 6*1024-byte buffer produced
27,259 bytes, and a 4*1024*1024-byte buffer produced 27,253 bytes.
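A rough way to repeat this experiment is to feed the same bytes to DeflateStream in chunks of different sizes and compare the compressed lengths. This is only a sketch: ChunkSizeExperiment and CompressedLength are hypothetical names, and the exact sizes depend on the .NET version (later framework versions switched to a zlib-based deflate implementation), so it may not reproduce the numbers above.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical experiment: compress the same data while feeding the
// DeflateStream in chunks of a given size, and report the output length.
public static class ChunkSizeExperiment
{
    public static long CompressedLength(byte[] data, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen: true so 'output' can still be inspected after the
            // DeflateStream is disposed (disposal writes the final block).
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                for (int offset = 0; offset < data.Length; offset += chunkSize)
                {
                    int count = Math.Min(chunkSize, data.Length - offset);
                    deflate.Write(data, offset, count); // one "segment" per call
                }
            }
            return output.Length;
        }
    }
}
```

For example, CompressedLength(File.ReadAllBytes(ifile), 4 * 1024) and CompressedLength(File.ReadAllBytes(ifile), 6 * 1024) correspond to the 4*1024 and 6*1024 buffer cases.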

In fact, if we use Reflector to view the disassembled code of
StreamWriter.Write(String), we can see that it also writes to the base
stream in segments. The segment size depends on two internal fields,
charPos and charLen, which we cannot set directly (charLen reflects the
buffer size the StreamWriter was created with). I've posted the code below
for easier reading:

public override void Write(string value)
{
    if (value != null)
    {
        int length = value.Length;
        int sourceIndex = 0;
        while (length > 0)
        {
            if (this.charPos == this.charLen)
            {
                this.Flush(false, false);
            }
            int count = this.charLen - this.charPos;
            if (count > length)
            {
                count = length;
            }
            value.CopyTo(sourceIndex, this.charBuffer, this.charPos, count);
            this.charPos += count;
            sourceIndex += count;
            length -= count;
        }
        if (this.autoFlush)
        {
            this.Flush(true, false);
        }
    }
}
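To observe this segmentation directly, one can hand StreamWriter a stream that simply records the size of each Write call it receives. WriteRecordingStream and SegmentDemo are hypothetical helpers, not framework types; the buffer size is chosen explicitly through the StreamWriter(Stream, Encoding, int) constructor, and ASCII is assumed so that one character maps to one byte.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical write-only stream that records the size of every Write
// call it receives, so we can see how StreamWriter segments its output.
public sealed class WriteRecordingStream : Stream
{
    public List<int> WriteSizes { get; } = new List<int>();

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => WriteSizes.Add(count);
}

public static class SegmentDemo
{
    // Writes 'text' through a StreamWriter with the given buffer size and
    // returns the sizes of the writes that reached the underlying stream.
    public static List<int> RecordSegments(string text, int bufferSize)
    {
        WriteRecordingStream sink = new WriteRecordingStream();
        using (StreamWriter writer = new StreamWriter(sink, Encoding.ASCII, bufferSize))
        {
            writer.Write(text);
        } // disposing the writer flushes the last partial segment
        return sink.WriteSizes;
    }
}
```

With a 1000-character buffer, writing a 2500-character string should reach the underlying stream as several writes of at most 1000 bytes each, rather than as a single 2500-byte write.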
Both Stream.Read/Write and StreamWriter.Write should work; they simply feed
the DeflateStream in different segment sizes, so their compressed outputs
differ slightly. Please let me know if this addresses your concern. If you
have any further questions, please let me know and I will do my best to
help.


Best regards,
Colbert Zhou (colbertz @online.microsoft.com, remove 'online.')
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).
 
Hello Nick,

I am reviewing this post. Would you mind letting me know whether my
explanation addresses your questions and concerns? If you need any further
assistance from our side, please let me know and I will follow up with my
best effort.

Have a good day!

Best regards,
Colbert Zhou
Microsoft Online Community Support

This posting is provided "AS IS" with no warranties, and confers no rights.
 
Yes, your post answered my question. I changed the code to close the stream,
and the one-byte difference disappeared; the files are now exactly the same.
--
Thanks,
Nick

 