Canon LiDE 30 File Size Mysteries

  • Thread starter: Ray

Ray

Hi, all ... I am using a CanoScan LiDE30. I scan a page using a
button on the front of the scanner. I have set this button to scan at
400 dpi and save as PDF in Acrobat 6. (The settings for this button
allow a choice between Standard and High PDF compression; I have chosen
Standard.) This creates a PDF file whose size is 886 KB.

Now, using the same scanner and scanning the same page, I use Acrobat's
File > Create PDF > From Scanner option. If I choose the Adapt
Compression to Page Content option, at the highest quality setting,
again using a 400 dpi setting, I get a PDF of comparable quality whose
size is 1,469 KB.

First question: why is the latter nearly twice the size of the former?

Finally, I go back and use the same Create PDF from Scanner settings as
in the second paragraph, above, except this time I do not choose the
Adapt Compression option. This time, I get a PDF -- again, of similar
quality -- whose size is 24,530 KB.

Second question: why does the highest quality compression setting
produce a file that is only 6% of the size of the uncompressed
alternative? That is, is there some way to produce a less lossy scan
that would be, say, 20% of the size of the uncompressed file?

This second question is important to me because the supposedly
high-quality compression sometimes produces scans that are very much
inferior to the uncompressed alternatives. I would like to be able to
do scans that are somewhat less extremely compressed.

Thanks in advance for any ideas or suggestions.
 

You have to specify the scan mode (color, grayscale, or line art),
or else we have no clue what you are doing.

The image size in pixels depends on the inches scanned and the resolution
at which it is scanned. For example, scanning 8x10 inches at 300 dpi
creates 2400x3000 pixels, every time. The size in bytes is that number
of pixels multiplied by 3 for RGB color, or by 1 for grayscale, or divided
by 8 for line art. These scan modes make a huge difference to file
size.
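
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule of thumb just described (3 bytes per pixel for color, 1 for grayscale, 1/8 for line art); the function and dictionary names are made up for the example, and it ignores file headers and row padding.

```python
# Bits stored per pixel for each scan mode (rule of thumb from the post above).
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "lineart": 1}

def uncompressed_bytes(width_in, height_in, dpi, mode):
    """Approximate uncompressed image size in bytes for a scanned area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * BITS_PER_PIXEL[mode] // 8

# The 8x10 inch, 300 dpi example: 2400x3000 pixels in every mode.
for mode in BITS_PER_PIXEL:
    print(mode, uncompressed_bytes(8, 10, 300, mode))
```

Only the scan mode changes between the three results, which is why it matters so much for file size before any compression is applied.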

Then compression can reduce the size in bytes considerably.

For color or for grayscale, JPG compression is used, and JPG has a
sliding Quality factor that can be almost anything. The result can be
tiny with awful quality, like your 6% of full size, or larger and much
better, perhaps 25% of full size. Most programs have a JPG Quality
slider to allow setting intermediate positions for JPG Quality.

If the scanned page is text only, all black or white, with no gray or
color, then use Line art scan mode (which is all black or white, no
gray), and then Acrobat's maximum compression is G4, which is lossless
and high quality, very clear, but still a smaller file than grayscale or
color JPG could be.
 
Wayne -- thanks for your reply. What I'm really looking for is info
from someone familiar with the various options in Acrobat. As I say,
it looks to me like I've got it set to maximum quality, and still it's
doing this maximal crunch. So that's what I don't understand. There's
no line art option in the Acrobat menus I'm seeing. I get similar
results (with different numbers) regardless of scan mode.

Ray
 

This is of course an Acrobat question, so you might try the Acrobat
forum at http://www.adobe.com/support/forums

When scanning with Acrobat, the Acrobat menu starts your scanner's driver,
where you set the scan mode (color, grayscale, or line art) and also the
scan resolution.

The same Acrobat dialog (Create PDF - From Scanner) that contains the
"Adapt compression to page content" option (which means scan mode)
also has the JPG slider ranging from "Higher Compression" to "Higher
Quality". This slider only affects Color or Grayscale scan modes.
Higher Compression is a smaller file (bytes), but the price is Lower
Quality.

If the scan is line art mode, then that slider is ignored, not
applicable to line art. I think Acrobat always uses G4 compression for
line art, which is lossless, both very good and very small. But
Grayscale or color uses JPG, which is a tradeoff of size vs quality.

The exception is when you insert existing image files into Acrobat. Then
it seems to leave their compression alone, but you can use menu File -
Reduce File Size to cause it to recompress them according to its
schemes.

You have not said your scan mode, but you did say:

"Now, using the same scanner and scanning the same page, I use Acrobat's
File > Create PDF > From Scanner option. If I choose the Adapt
Compression to Page Content option, at the highest quality setting,
again using a 400 dpi setting, I get a PDF of comparable quality whose
size is 1,469 KB."

I don't know your page size either, so let's assume 8.5x11 inches. At
400 dpi, that will create (8.5 inches x 400 dpi) x (11 inches x 400 dpi)
= 3400x4400 pixels, or 14.96 million pixels.

I hope it is line art, else 400 dpi seems excessive, certainly if you
have many pages. If line art, then 400 dpi sounds fine - high quality,
but fine.

This 3400x4400 pixel image size in bytes will be

Color: 14.96 x 3 = 44.88 million bytes
Grayscale: 14.96 x 1 = 14.96 million bytes
Line art: 14.96 / 8 = 1.87 million bytes.

So your 1,469 KB may be uncompressed line art. Looking at the image will
tell you: line art is all black or white, without any trace of gray.
In that case, the Acrobat JPG Quality slider has no effect at all
(it only affects Color or Grayscale modes, where JPG
compression is used). However, I don't know how to get Acrobat to NOT
compress line art.

Degree of compression does depend on the page content: large areas of
blank space compress extremely well, while areas of very fine detail (say
text) don't compress as well.

Or maybe your scan is grayscale mode. If it is grayscale (and if 8.5x11
inches) then 14.96 million bytes compressed to 1,469 KB is compression
to 1/10 size, which is not at all unusual for grayscale JPG documents.
That might be Acrobat's notion of High Quality.
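
As a rough check of those guesses, the reported 1,469 KB file can be compared with the raw size of each scan mode. This sketch assumes an 8.5x11 inch page at 400 dpi (as above) and assumes "KB" means 1024 bytes; the names are invented for the example.

```python
KB = 1024  # assumption: the reported sizes use 1024-byte kilobytes

PIXELS = 3400 * 4400  # 8.5x11 inches at 400 dpi (assumed page size)
RAW_BYTES = {
    "color": PIXELS * 3,      # 3 bytes per pixel
    "grayscale": PIXELS,      # 1 byte per pixel
    "line art": PIXELS // 8,  # 1 bit per pixel
}

def fraction_of_raw(file_kb, raw_bytes):
    """File size as a fraction of the uncompressed image size."""
    return file_kb * KB / raw_bytes

for mode, raw_bytes in RAW_BYTES.items():
    print(f"{mode}: {fraction_of_raw(1469, raw_bytes):.1%} of raw size")
```

Under these assumptions the file is roughly a tenth of the raw grayscale size, most of the raw line art size, and only a few percent of the raw color size.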

The above was my opinion until now, but I just ran a little test that
suggests Acrobat actually does something else now.

I just scanned an 8.5x11 inch page of typed text at 400 dpi in Acrobat
6.0.5. File sizes were:

Grayscale - Max Quality - 1838KB
Grayscale - 1/2 Quality - 1834KB
Line art - 34 KB

Then - applying the File - Reduce File Size menu
reduced the color and grayscale to about 550KB.
Line art was unchanged - 34 KB.

So, it does seem to me that the Acrobat JPG Quality slider is not
working any more, at least not here for me this time. I was not aware
of that, because I normally use line art, which is night and day more
appropriate for text documents. 100 pages of grayscale or color PDF will
be a totally unmanageable file size.
 
"This is of course an Acrobat question, so you might try the Acrobat
forum at http://www.adobe.com/support/forums"

Good point. I will poke around there and see if I find anything.
Thanks for that.

"I don't know your page size either, so let's assume 8.5x11 inches. At
400 dpi, that will create (8.5 inches x 400 dpi) x (11 inches x 400 dpi)
= 3400x4400 pixels, or 14.96 million pixels."

"I hope it is line art, else 400 dpi seems excessive, certainly if you
have many pages. If line art, then 400 dpi sounds fine - high quality,
but fine."

In one test, I got better OCR results when I was OCRing a file of
higher resolution. There were fewer unrecognized words. I think I did
that test using OmniPage or some other scanning program; I think I have
assumed but have not tested that the outcome would be similar in
Acrobat.

I think Acrobat 6 will OCR no higher than 600 dpi B&W or 400 dpi
greyscale or color. So I scan at the high dpi, do the OCR (Document >
Paper Capture, in Acrobat), and then shrink the file size (Advanced >
PDF Optimizer > Enable Adaptive Compression). I can get 300 pages of
text into maybe 10-20 MB.

I tend to use greyscale rather than B&W when scanning text, for two
reasons: (1) B&W makes a mess if it's a scan from a book -- it
converts the center fold to a big black blob that (with large books)
sometimes obscures the print, whereas the lettering still comes through
OK with greyscale, and (2) I'm finding that B&W sometimes obliterates
the fine points of lettering -- that it is easier to read something
scanned in greyscale.

If I really want to convert it to B&W, I may scan it as JPG from within
IrfanView and then do some batch conversions to rotate, crop, increase
contrast, and reduce gamma. Then I let Acrobat suck them all into a
massive PDF; then I compress the PDF as above.

"This 3400x4400 pixel image size in bytes will be

Color: 14.96 x 3 = 44.88 million bytes
Grayscale: 14.96 x 1 = 14.96 million bytes
Line art: 14.96 / 8 = 1.87 million bytes.

So your 1,469 KB may be uncompressed line art."

It's color. It's just that Acrobat seems to compress things by its
own logic, to which I am not yet entirely privy.

"I just scanned an 8.5x11 inch page of typed text at 400 dpi in Acrobat
6.0.5. File sizes were:

Grayscale - Max Quality - 1838KB
Grayscale - 1/2 Quality - 1834KB

So, it does seem to me that the Acrobat JPG Quality slider is not
working any more, at least not here for me this time."

Yeah, it's funky.
 
"I tend to use greyscale rather than B&W when scanning text, for two
reasons: (1) B&W makes a mess if it's a scan from a book -- it
converts the center fold to a big black blob that (with large books)
sometimes obscures the print, whereas the lettering still comes through
OK with greyscale, and (2) I'm finding that B&W sometimes obliterates
the fine points of lettering -- that it is easier to read something
scanned in greyscale."


I'm a fan of grayscale for OCR, but color or grayscale is formidable size for
a PDF file (of many pages).

In line art mode, you can use the scanner preview crop tool to eliminate those
black borders. And you can work miracles with the scanner's threshold setting
to greatly enhance the line art text quality. It is an easy art to learn.
Plus the line art file is an order of magnitude smaller (Acrobat's G4
compression), and generally vastly more clear to print too. But video viewing
may not be quite as good as grayscale (at the resampled much smaller video
size).
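
To illustrate why the threshold setting matters so much, here is a minimal sketch of what a line art threshold does to grayscale pixel values. It is not the scanner driver's actual code; the function name, the 0-255 value range, and the sample row are assumptions made for the example.

```python
def to_line_art(gray_rows, threshold=128):
    """Map each 0-255 grayscale value to 1 (white) or 0 (black)."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in gray_rows]

# One hypothetical row: white paper, a faint gray stroke, darker ink, black.
row = [[250, 180, 90, 30]]

print(to_line_art(row, threshold=128))  # the faint stroke (180) turns white
print(to_line_art(row, threshold=200))  # the faint stroke (180) turns black
```

Raising the threshold keeps faint strokes as black, which thickens thin lettering; lowering it drops them, which cleans up gray shadows such as a book's center fold.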

"It's color. It's just that Acrobat seems to compress things by its
own logic, to which I am not yet entirely privy."

Then my mistaken guess was that at 1.4MB and 400 dpi and color, it must not be
close to full page size. No one could stand that much JPG compression. :)
Even full size 150 dpi color pages look bad at 1/2 megabyte per page.

The Acrobat JPG Quality slider used to work, but somewhere along the way, I
seem to have lost mine too. Try the Adobe Acrobat support forum.
 