GZipStream

  • Thread starter: DR

DR

Why is it substantially slower to load 50GB of gzipped data (a 20GB gzipped
file) than to load 50GB of uncompressed data? I'm using
System.IO.Compression.GZipStream, and it's not maxing out the CPU while
loading the gzipped data! I open a stream on the 20GB gzipped file with the
default buffer size and pass it into the GZipStream constructor. Reading
through System.IO.Compression.GZipStream then takes an hour, while just
loading the 50GB uncompressed file takes a few minutes!
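If the default buffer size is the suspect, one cheap experiment is to give the underlying file a much larger buffer and to request the decompressed output in large chunks. A minimal sketch in Python, using the standard `gzip` module as a stand-in for GZipStream (this is not the .NET API); the function name `read_gzip_in_chunks` and the 1 MiB sizes are illustrative assumptions, not tuned values:

```python
import gzip


def read_gzip_in_chunks(path, file_buffer=1 << 20, chunk_size=1 << 20):
    """Stream-decompress a .gz file and return the total decompressed size.

    file_buffer: buffer size of the underlying file object, roughly the role
        of the FileStream buffer handed to GZipStream (assumed value).
    chunk_size: number of decompressed bytes requested per read call.
    """
    total = 0
    # Open the compressed file with an explicit, larger-than-default buffer.
    with open(path, "rb", buffering=file_buffer) as raw:
        # Wrap it in a decompressing reader, as GZipStream wraps a FileStream.
        with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            while True:
                chunk = gz.read(chunk_size)
                if not chunk:
                    break  # end of the decompressed data
                total += len(chunk)  # real code would process `chunk` here
    return total
```

In .NET the corresponding knobs would be the `bufferSize` argument of the `FileStream` constructor and the size of the `byte[]` passed to `GZipStream.Read`; whether enlarging them helps depends on where the bottleneck actually is.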
 
GZipStream is broken in many ways. To list a few:

https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93636
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=94784

My suggestion is to vote for those on Microsoft Connect in the hope that they
will get fixed eventually, and to use SharpZipLib in the meantime.
 
Because loading compressed data is more work than loading uncompressed
data: everything has to be decompressed after it is read. I don't know the
internals of the gzip algorithm, but I'm guessing the amount of I/O reading is
what keeps the CPU from being maxed out.
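One way to test that guess is to time the two halves of the work separately. The sketch below again uses Python's `zlib` as a stand-in for GZipStream, and `time_read_vs_inflate` is a hypothetical helper name: it reads the compressed file in chunks and inflates each chunk, accumulating the time spent in each phase. If inflating dominates, the job is CPU-bound even when overall CPU usage looks low, because standard DEFLATE decompressors such as zlib process a single stream sequentially on one thread, so they can saturate one core of a multi-core machine while total usage stays well below 100%.

```python
import time
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header/trailer


def time_read_vs_inflate(path, chunk_size=1 << 20):
    """Return (read_seconds, inflate_seconds, decompressed_bytes) for a .gz file."""
    read_s = inflate_s = 0.0
    total_out = 0
    decomp = zlib.decompressobj(GZIP_WBITS)
    with open(path, "rb") as f:
        while True:
            # Phase 1: pull compressed bytes from disk (or the OS cache).
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            read_s += time.perf_counter() - t0
            if not chunk:
                break
            # Phase 2: inflate them; output is counted and then discarded.
            t0 = time.perf_counter()
            while chunk:
                total_out += len(decomp.decompress(chunk))
                if decomp.eof:
                    # A .gz file may contain several concatenated members;
                    # feed whatever follows this member to a fresh decompressor.
                    chunk = decomp.unused_data
                    decomp = zlib.decompressobj(GZIP_WBITS)
                else:
                    chunk = b""
            inflate_s += time.perf_counter() - t0
    return read_s, inflate_s, total_out
```

For example, `read_s, inflate_s, n = time_read_vs_inflate("data.gz")` (the path is hypothetical); comparing `read_s` with `inflate_s` shows which side is worth optimizing. A second run may show optimistic read times because the OS file cache already holds the data.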
 