Best common file format to use to create PDFs?

  • Thread starter: Zak

Zak

Zak said:
Afaik, GIF is not used as a graphical format inside PDF. It is
probably compressed as a "TIFF" with CCITT fax or LZW compression.
"TIFF" is in quotation marks, because the embedded stream is also not
a complete TIFF file, just contains the compressed graphics and
some extra information like scansize, colordepth, color channels,
etc.

What happens under the hood in your PDF creation really depends on
the PDF engine you're using. Many engines actually resize your
graphics to match the PDF DPI resolution. If you're an experienced
programmer you could try to generate the PDF yourself, with the
images in full resolution. The PDF specification is open and can be
found on the Adobe website.

Nils


Hi Nils and others. I understand now that when I create a PDF from an
image file, the format of the image file is not used inside the
PDF. Instead some other format is used in the PDF (which Nils kindly
suggests may be a specialized form of TIFF).

It is this conversion from my image file format to the internal PDF
format which I want to be done smoothly. I am on XP and I am
wondering if it is better to start with a GIF or a JPG or BMP or
whatever to feed into my PDF creation utility.

I should say that I am starting with a hard copy of a document
created on a word processor. I want to avoid artefacts, unnecessarily
jagged lines, moire effects and all that stuff which might come from
transforming from an "awkward" image format to a PDF.

My PDFs will be for public distribution. I have preferred to scan to
a GIF file rather than a TIFF because I have assumed that when I
circulate the basic image file among certain people that the best
balance between image size and the best chance of them being able to
see the file is a GIF.

To me TIFF feels a bit specialized. For example, I never see a web
page with TIFF images but I see lots of pages with GIFs.

Also there seem to be various compression options for a TIFF (group 3
or 4, LZW, JPEG, deflate, none) which might make it even harder for
me to know what to choose as a common format! Wikipedia says
documents are often scanned to TIFF group 4 but is that something
which has the best chance of being seen on various PCs in various
organisations that I might need to send it to?
 
Zak said:
Hi Nils and others. I understand now that when I create a PDF from an
image file, the format of the image file is not used inside the
PDF. Instead some other format is used in the PDF (which Nils kindly
suggests may be a specialized form of TIFF).

It is this conversion from my image file format to the internal PDF
format which I want to be done smoothly. I am on XP and I am
wondering if it is better to start with a GIF or a JPG or BMP or
whatever to feed into my PDF creation utility.

I should say that I am starting with a hard copy of a document
created on a word processor. I want to avoid artefacts, unnecessarily
jagged lines, moire effects and all that stuff which might come from
transforming from an "awkward" image format to a PDF.

You can create a very clean PDF directly from a Microsoft Word Document
(.doc).
There are programs that act like a printer that creates a PDF, just by
"printing a PDF".

PDF Create! is one such program.

Just search Google for "microsoft word print pdf" without the quotes.
You will get lots of responses.
 
You can create a very clean PDF directly from a Microsoft Word
Document (.doc).
There are programs that act like a printer that creates a PDF, just
by "printing a PDF".

PDF Create! is one such program.

Just search Google for "microsoft word print pdf" without the
quotes. You will get lots of responses.


The documents are not written by me. They have been sent to me so they
are in hard copy form and need scanning.
 
Why not let a program like PDF-Tools take care of the problems for you -
this will scan direct to PDF for you without the intermediate image
process - all you need to do is make your decisions regarding
optimisation/compression.

You can try it here within the PDF-XChange PRO package (not standard or Lite
versions) - until licensed you will get demo watermarks in the top
right/left corner of each page which do add about 4 KB to each page.

http://www.docu-track.com/downloads/users/

--
Best Regards

John Verbeeten
Tracker Software Products
PDF-XChange & SDK, Image-XChange SDK,
PDF-Tools & SDK, TIFF-XChange & SDK, DocuTrack.
Email : (e-mail address removed)
Support: http://www.docu-track.com/forum/index.php
Web site : http://www.docu-track.com
 
Zak said:
The documents are not written by me. They have been sent to me so they
are in hard copy form and need scanning.

If the documents are in single-sheet form and can be fed thru a
sheet-feed scanner, the fairly new Fujitsu "ScanSnap" can automatically
produce PDF output (or other formats, at user's option).

It's a bit pricey (circa $400) but it's a pretty nice unit, small, fast,
easy to use, can do both sides at once, auto-select for B&W or color,
and so on.
 
["Followup-To:" header set to comp.periphs.scanners.]

Zak is obviously not a programmer, let alone an experienced one. Using
PDFlib from C isn't that difficult if you have some experience in C,
though.

And creating a PDF using only the spec would take a bunch of experienced
programmers a while. The PDF spec is really, really complex. Its
complexity is one reason why PDFlib and ps2pdf and OpenOffice's "print
to PDF" functionality exist.
I understand now that when I create a PDF from an image file, the
format of the image file is not used inside the PDF. Instead some
other format is used in the PDF (which Nils kindly suggests may be a
specialized form of TIFF).

Using tiff2ps -> ps2pdf says that a grayscale TIFF ends up converted to
a stream object that can be decoded by the FlateDecode filter.
YPDFEngineMV, obviously.
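
For anyone curious what "decoded by the FlateDecode filter" means in practice: Flate is the zlib/deflate scheme, and it is lossless. Here is a minimal sketch using Python's standard zlib module. The tiny 64 x 64 grayscale "page" is made up for illustration, and this only shows the compression round trip, not how any particular PDF engine writes its streams.

```python
import zlib

# Hypothetical 8-bit grayscale page: 64 rows of 64 pixels, mostly white
# (255) with a dark band standing in for a line of text.
WIDTH, HEIGHT = 64, 64
rows = []
for y in range(HEIGHT):
    row = bytearray([255] * WIDTH)
    if 20 <= y < 24:
        row[8:56] = bytes([30] * 48)
    rows.append(bytes(row))
raw = b"".join(rows)

# Deflate the samples; this is the encoding that a Flate decoder undoes.
compressed = zlib.compress(raw, 9)
restored = zlib.decompress(compressed)

print(len(raw), "bytes raw ->", len(compressed), "bytes deflated")
assert restored == raw  # lossless: every sample comes back unchanged
```

The point is only that nothing is thrown away at this stage, whatever the source image format was.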
It is this conversion from my image file format to the internal PDF
format which I want to be done smoothly. I am wondering if it is
better to start with a GIF or a JPG or BMP or whatever to feed into my
PDF creation utility.

Depends on what you want. Get a good scan, and convert it to
black-and-white if you can do that without losing important info;
that'll make the PDF smaller. JPEG may introduce artifacts, so you
probably don't want to use that. TIFF G4 and TIFF LZW are lossless, so
you may want to use those.
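
To put a rough number on "that'll make the PDF smaller": going from 8-bit grayscale to black-and-white means 1 bit per pixel instead of 8, before any compression is applied. A small sketch, assuming a simple fixed threshold of 128 (real scanner software usually offers smarter thresholding) and a hypothetical helper name:

```python
def pack_bilevel(gray, width, threshold=128):
    """Threshold 8-bit grayscale samples and pack 8 pixels per byte.

    gray is a bytes object of width * height samples, row by row.
    Each output row is padded to a whole number of bytes.
    """
    height = len(gray) // width
    row_bytes = (width + 7) // 8
    out = bytearray(row_bytes * height)
    for y in range(height):
        for x in range(width):
            if gray[y * width + x] < threshold:  # dark pixel -> bit set
                out[y * row_bytes + x // 8] |= 0x80 >> (x % 8)
    return bytes(out)


# Hypothetical 16 x 2 pixel "scan": one white row, one row with dark pixels.
gray = bytes([255] * 16 + [255, 0, 0, 255] * 4)
packed = pack_bilevel(gray, width=16)
print(len(gray), "bytes grayscale ->", len(packed), "bytes bilevel")
```

That is an 8:1 saving before G4 or LZW even gets to work, at the cost of losing any shading in the original.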
I should say that I am starting with a hard copy of a document created
on a word processor.

Yuck. The original WordPerfect or whatever file would've been a much
better place to start from. PDFs with just text in them tend to be
smaller, display faster, and can look good at any zoom level. PDFs made
from images take a longer time to display, are larger, and look terrible
at high zoom levels.
I have preferred to scan to a GIF file rather than a TIFF because I
have assumed that when I circulate the basic image file among certain
people that the best balance between image size and the best chance of
them being able to see the file is a GIF.

? You're creating a PDF, not distributing a series of image files.
To me TIFF feels a bit specialized. For example, I never see a web
page with TIFF images but I see lots of pages with GIFs.

This is because of Hysterical Raisins in the history of web browsers,
and because of Unisys being asses. JPEG compresses better than TIFF-LZW
for lossy color images, and smaller images are preferred, especially when
you're on dialup. TIFF-LZW gives the best lossless compression for
color images, but TIFF-LZW is usually used where losslessness is more
important than file size (like in prepress.) Also, Unisys said they'd
sue anyone who made a TIFF-LZW compressor unless they paid Unisys a
license fee.[0] These things combined made it so that the earliest GUI
browsers didn't support viewing TIFFs, just JPEGs and GIFs. And this
has persisted to the present day... even though TIFF-G4 compresses
better than *anything* else, and does so losslessly, iff your image is
black-and-white.
Also there seem to be various compression options for a TIFF (group 3
or 4, LZW, JPEG, deflate, none) which might make it even harder for me
to know what to choose as a common format! Wikipedia says
documents are often scanned to TIFF group 4 but is that something
which has the best chance of being seen on various PCs in various
organisations that I might need to send it to?

....what? If somebody can't figure out how to view a Group4 TIFF,
they're probably computer-illiterate. Anyway, aren't you making a PDF
here? It doesn't matter what the original image format was if it's been
PDFed. Acrobat Reader can decode the image data within a PDF, as long
as the PDF library/PDF writer/whatever that created that PDF wasn't
smoking crack. Anyway, HTH,

[0] Fortunately, their patent (on a *mathematical method*!) expired a
couple of years ago, so all the Free stuff can write LZW now, which is a
win for everybody.
 
Never use JPEG for this purpose. GIF and BMP are not the normal
choice.
Yes. The image file format isn't stored in the PDF.
Absolutely not JPEG. BMP has no advantage over TIFF and GIF has
disadvantages.

I don't really follow your question, since GIF and TIFF use lossless
compression and so preserve quality and avoid artefacts and
interference patterns, by definition.

You may have the choice of whether to use lossless compression, or
not, in making the PDF.
My PDFs will be for public distribution. I have preferred to scan to
a GIF file rather than a TIFF because I have assumed that when I
circulate the basic image file among certain people that the best
balance between image size and the best chance of them being able to
see the file is a GIF.

If you are distributing the image file, that may be true. If you are
preparing the PDF file from the image file, it is not relevant at all.
To me TIFF feels a bit specialized. For example, I never see a web
page with TIFF images but I see lots of pages with GIFs.

That's because web browsers can display GIF and JPEG images as
standard, so web graphics are in those formats. That doesn't make them
in any sense "best".

TIFF is the industry standard format for document scanning, by a very
wide margin.
Also there seem to be various compression options for a TIFF (group 3
or 4, LZW, JPEG, deflate, none) which might make it even harder for
me to know what to choose as a common format!

These options are not relevant. The PDF file doesn't include the TIFF
information, only the image from the TIFF file.
 
As far as I can see there really is no best common file format to
convert. If it'll convert it'll work. However the size of the
original file will have a direct bearing on the size of the pdf.

If you are doing something like creating a newsletter, flyer, or
Internet distribution then why not use the original doc file?

I handle several newsletters on line and in print.
With Adobe Pro, any Office and, I believe, WordPerfect doc can be
converted directly to a pdf. However any images in the documents
should be of the proper size and resolution for the end media. I've
had Word docs sent to me that had the full original images with just
the physical dimensions set. They were still the original one or two
meg images set to a dimension of 2 X 3 inches. These produced nice
looking pdfs, but of many megabytes. Having the images set to the
proper resolution (300 ppi for print and about 100 ppi for screen)
dropped the pdf to less than 100K.
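
The arithmetic behind those resolution figures can be sketched as follows. This assumes uncompressed 24-bit RGB, so the byte counts are only an upper bound on what ends up in the pdf; the function names are made up for the example.

```python
def pixel_size(width_in, height_in, ppi):
    """Pixel dimensions needed to fill a physical size at a given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def raw_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size, assuming 24-bit RGB (3 bytes per pixel)."""
    return width_px * height_px * bytes_per_pixel


# The 2 x 3 inch image from the post, at print and screen resolution.
for label, ppi in (("print", 300), ("screen", 100)):
    w, h = pixel_size(2, 3, ppi)
    print(f"{label}: {w} x {h} px, {raw_bytes(w, h):,} bytes uncompressed")
```

Dropping from 300 ppi to 100 ppi cuts the pixel count by a factor of nine, which is where most of the size saving comes from.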

Also not all pdf creators are created equal. About a year ago I tried
using OpenOffice to convert a Word doc and produced one that was
about 3 to 4 times the size of one using Adobe Pro. This is fine for
printed media, but may (or may not) be a royal pain in the back side
for on-line viewing.

For on-line I much prefer HTML rather than pdfs as the HTML will be
faster to load and more compact. At least it will if it wasn't created
by saving a Word doc as HTML or using Front Page to create it. Those
are huge. OTOH converting to a pdf is faster and much easier and I do
use them when the pdfs are relatively small.

Roger Halstead (K8RI & ARRL life member)
(N833R, S# CD-2 Worlds oldest Debonair)
www.rogerhalstead.com
 
Roger said:
As far as I can see there really is no best common file format to
convert. If it'll convert it'll work. However the size of the
original file will have a direct bearing on the size of the pdf.

No. The size of the original will usually have no effect whatsoever,
though some PDF creation methods are influenced by it.
With Adobe Pro, any Office and, I believe, WordPerfect doc can be
converted directly to a pdf.

With Acrobat Pro or Acrobat Standard, any file you can print can be
converted directly to a PDF.
Having the images set to the
proper resolution (300 ppi for print and about 100 ppi for screen)
dropped the pdf to less than 100K.

Or, you could use Acrobat options to reduce the resolution.
 
Zak said:
I should say that I am starting with a hard copy of a document
created on a word processor.

Why don't you just start with the word processor file, and not with a
hardcopy at all? Go straight from the word processor file to PDF.
I want to avoid artefacts, unnecessarily
jagged lines, moire effects and all that stuff which might come from
transforming from an "awkward" image format to a PDF.

My PDFs will be for public distribution. I have preferred to scan to
a GIF file rather than a TIFF because I have assumed that when I
circulate the basic image file among certain people that the best
balance between image size and the best chance of them being able to
see the file is a GIF.

A GIF is almost the worst possible choice to use, because GIF images
contain a very small number of colors, and because of this they don't
tend to downsample smoothly.

Use TIFF. Anything that can read a PDF, can read a PDF, period. It does
not matter what you start with; once it is turned into a PDF, it is a
PDF. However, a TIFF image will downsample and compress smoothly.
To me TIFF feels a bit specialized. For example, I never see a web
page with TIFF images but I see lots of pages with GIFs.

That doesn't mean a GIF is the best format to use for general purposes,
however.
Also there seem to be various compression options for a TIFF (group 3
or 4, LZW, JPEG, deflate, none) which might make it even harder for
me to know what to choose as a common format!

You do not need to choose any of these. You do not need to compress the
TIFF at all.

Scan a TIFF, make a PDF, send out the PDF, you're done. Or, better yet,
do not use your scanner at all. Start with the word processor file, make
a PDF--it'll be smaller and far higher quality. :)
 
Trimmed to context

Zak is obviously not a programmer, let alone an experienced one.
Using PDFlib from C isn't that difficult if you have some
experience in C, though.

You're right about not being an experienced programmer! The last thing
I tried was something like COBOL.


Depends on what you want. Get a good scan, and convert it to
black-and-white if you can do that without losing important info;
that'll make the PDF smaller. JPEG may introduce artifacts, so you
probably don't want to use that. TIFF G4 and TIFF LZW are
lossless, so you may want to use those.

Now that is getting much closer to a firm recommendation as to what to
use: TIFF group IV or TIFF LZW. Seems like the jpeg may not be so
suitable.
Yuck. The original WordPerfect or whatever file would've been a
much better place to start from.

Yes, I know what you mean, but the choice is not mine. These are
documents which have been sent to me. Some documents are of my reply to
other people and I do have the original word processor file for that.
PDFs with just text in them tend
to be smaller, display faster, and can look good at any zoom level.
PDFs made from images take a longer time to display, are larger,
and look terrible at high zoom levels.

I have got to live with this compromise. In most cases I will have to
use an image of the original hard copy.

? You're creating a PDF, not distributing a series of image files.

Until the PDF is finalised and all the security is set, the contents
will be in their raw state. So it will be a GIF, TIFF or whatever as it
gets adjusted or edited.


-- snip --- this has persisted to the
present day... even though TIFF-G4 compresses better than
*anything* else, and does so losslessly, iff your image is
black-and-white.

You make some quite fascinating, very interesting and relevant points
about the history of formats and why incompatibilities may still exist
today. I am persuaded by that alone to start using TIFF Group IV for
mono.

What if the original is in color? Maybe it is a line drawing or a motif
or (more rarely) shading of an area. Is TIFF group IV such an obvious
choice then? What is the alternative?

documents are often scanned to TIFF group 4 but is

...what? If somebody can't figure out how to view a Group4 TIFF,
they're probably computer-illiterate.

I fully agree with you. But I have to operate within *their*
limitations. I sent some documents to a local councillor who said they
couldn't work out how to see them. They must be a hopeless case but I
still need to do what I can to make sure they do get to see my document.
As you say I am making PDFs now but that was my experience before I
decided that PDF removed the variables from how the image was or was not
seen.

Anyway, aren't you making a
PDF here? It doesn't matter what the original image format was if
it's been PDFed. Acrobat Reader can decode the image data within
a PDF, as long as the PDF library/PDF writer/whatever that created
that PDF wasn't smoking crack. Anyway, HTH,

HTH: Yes, very much indeed. I am grateful. Thank you.
 
Snipped and trimmed to context.

Never use JPEG for this purpose. GIF and BMP are not the normal
choice.

Absolutely not JPEG. BMP has no advantage over TIFF and GIF has
disadvantages.
I don't really follow your question, since GIF and TIFF use lossless
compression and so preserve quality and avoid artefacts and
interference patterns, by definition.
You may have the choice of whether to use lossless compression, or
not, in making the PDF.

I think you are "echoing" what I have just been reading from Dances-
With-Crows.

I should explain the artifacts notion I was asking about. If I scan to
a GIF which I understand is lossless, then it still has a certain
number of "lines" and a certain block size or whatever it is that is
inside a GIF. If these blocks and lines do not match up with those used
by the image it is converted to inside the PDF then there may be
additional irregularities introduced at those places of mismatch.

It's a bit like memory and a system bus on a motherboard. If they are
both 100 MHz then they sing in harmony. If the memory is 133 MHz (and
can not fall back to 100 MHz) then they may give a slightly "off"
performance.
That's because web browsers can display GIF and JPEG images as
standard, so web graphics are in those formats. That doesn't make them
in any sense "best".

TIFF is the industry standard format for document scanning, by a very
wide margin.

These options are not relevant. The PDF file doesn't include the TIFF
information, only the image from the TIFF file.

Are you not saying that it is important to choose the internal image
inside the TIFF correctly? I think you are. Then I guess you would
concur with Dances-With-Crows about using Group IV. Remember that I do
want the option of sending the raw image to colleagues (rather than the
shrink-wrapped and sealed PDF).

Thank you for any extra info.
 
Zak said:
Hi Nils and others. I understand now that when I create a PDF from an
image file, the format of the image file is not used inside the
PDF. Instead some other format is used in the PDF (which Nils kindly
suggests may be a specialized form of TIFF).

It is this conversion from my image file format to the internal PDF
format which I want to be done smoothly. I am on XP and I am
wondering if it is better to start with a GIF or a JPG or BMP or
whatever to feed into my PDF creation utility.

I should say that I am starting with a hard copy of a document
created on a word processor. I want to avoid artefacts, unnecessarily
jagged lines, moire effects and all that stuff which might come from
transforming from an "awkward" image format to a PDF.

My PDFs will be for public distribution. I have preferred to scan to
a GIF file rather than a TIFF because I have assumed that when I
circulate the basic image file among certain people that the best
balance between image size and the best chance of them being able to
see the file is a GIF.

To me TIFF feels a bit specialized. For example, I never see a web
page with TIFF images but I see lots of pages with GIFs.

Also there seem to be various compression options for a TIFF (group 3
or 4, LZW, JPEG, deflate, none) which might make it even harder for
me to know what to choose as a common format! Wikipedia says
documents are often scanned to TIFF group 4 but is that something
which has the best chance of being seen on various PCs in various
organisations that I might need to send it to?

When I create a document that is going to be a PDF I always use TIF
files, mainly because InDesign handles TIF files well. I generally use
LZW compression on my TIFs; it seems to make no difference.
Once the PDF is created the image files, in my understanding, are
converted to JPEG files; at least that is how they can be extracted.
With the file already being downsampled for the web it is very unlikely
you will see JPEG artifacts coming from an original TIF. Multiple
compressions or resampling from a JPEG is another story.
Working from graphics or drawings GIF may be applicable, but for
photographs GIFs should be avoided.

Tom
 
However the size of the
original file will have a direct bearing on the size of the pdf.

If you are doing something like creating a newsletter, flyer, or
Internet distribution then why not use the original doc file?

Unfortunately, some of the documents have been sent to me in hard copy
form.

Also not all pdf creators are created equal. About a year ago
I tried using OpenOffice to convert a Word doc and produced
one that was about 3 to 4 times the size of one using Adobe
Pro. This is fine for printed media, but may (or may not) be
a royal pain in the back side for on-line viewing.

Yes, I feared that once I had mastered the basics then my next task is
to identify whether my PDF creator is doing as good a job as I might want it
to.
 
A GIF is almost the worst possible choice to use, because GIF
images contain a very small number of colors, and because of this
they don't tend to downsample smoothly.

DOWNSAMPLE. That's the word! I have just written one if not two
paragraphs trying to explain what I mean and then you come along and
express the idea in a single word!

Use TIFF. Anything that can read a PDF, can read a PDF, period. It
does not matter what you start with; once it is turned into a PDF,
it is a PDF. However, a TIFF image will downsample and compress
smoothly.


That doesn't mean a GIF is the best format to use for general
purposes, however.

OK, so TIFF it is going to be. And to swagger my newly gained
knowledge I will add that it might be group 4 or LZW (and I nod very
slowly as if I know what I am talking about - which I don't).
 
Then I guess you would
concur with Dances-With-Crows about using Group IV. Remember that
I do want the option of sending the raw image to colleagues (rather
than the shrink-wrapped and sealed PDF).


Can I add an additional point to do with TIFFs?

When I go into Acdsee and launch Twain, I am asked what format I want
to scan to.

I say TIFF and then I have an option where I can select Group 4. I
am also asked to fill in the dpi value horizontally and vertically.
I don't get asked this when I choose to scan to GIF or to JPEG.

When I get into the actual Twain screen I choose the scanning
resolution as usual.

So, what values should go into those horizontal and vertical boxes
for TIF? Do I need to put in the same value as I use for Twain's
scanning resolution? (This can be awkward.)

The software is slow to load and if I put in 200 for these TIF values
and scan at 266 or 300 then does that lead to problems or loss of
quality?

I have tried 200, 300 and 600 in the H & V boxes (at scanning
resolutions of 200, 240, 266, 300) and the 200, 300 or 600 seems to
make no difference at all to the final size.

I will have to look closely to see the quality.

Can you or anyone else comment on this extra pair of values?
 
Zak said:
I say TIFF and then I have an option where I can select Group 4. I
am also asked to fill in the dpi value horizontally and vertically.
I don't get asked this when I choose to scan to GIF or to JPEG.

Group 4 compression is the compression used by FAX machines. When you
send a FAX, the vertical and horizontal resolutions are different; FAX
machines use pixels that are not square.

TIFF supports Group 4 primarily to facilitate software that receives
FAXes on a computer, or computer programs designed to make scans and
then send FAXes. Since that's not what you're doing, there's no reason
to use CCITT Group 3 or Group 4 compression (which really only works
well on simple bitmaps anyway).
 
["Followup-To:" header set to comp.periphs.scanners.]
Group 4 compression is the compression used by FAX machines.

Fax machines use Group3, not Group4. Group3 is less efficient than
Group4.
send a FAX, the vertical and horizontal resolutions are different; FAX
machines use pixels that are not square.

TIFF has supported having different horizontal and vertical resolutions
since the format started up; this is not a fax-specific thing. Not many
people use this TIFF capability, and some programs will barf if they
read different values for TIFFTAG_XRESOLUTION and TIFFTAG_YRESOLUTION,
but it's in the TIFF spec.
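
As a rough illustration of what those two resolution values do: they are metadata saying how many pixels make up one unit of length in each direction, so they change the implied physical size of the page, not the number of pixels stored. The sketch below assumes the values are in pixels per inch and uses a made-up 2550 x 3300 pixel scan; `physical_size` is a hypothetical helper, not part of any TIFF library.

```python
def physical_size(width_px, height_px, x_res, y_res):
    """Return (width, height) in inches implied by the resolution values."""
    return width_px / x_res, height_px / y_res


# Hypothetical scan: 2550 x 3300 pixels (letter size at 300 dpi).
WIDTH_PX, HEIGHT_PX = 2550, 3300

for x_res, y_res in ((300, 300), (200, 200), (300, 150)):
    w_in, h_in = physical_size(WIDTH_PX, HEIGHT_PX, x_res, y_res)
    print(f"tags {x_res} x {y_res} dpi -> {w_in} x {h_in} inches, "
          f"still {WIDTH_PX * HEIGHT_PX:,} pixels")
```

If a scanning dialog only fills in these tags (an assumption about that dialog, not something confirmed here), entering 200, 300 or 600 would leave the file size unchanged, which matches what was observed earlier in the thread.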
then send FAXes. Since that's not what you're doing, there's no reason
to use CCITT Group 3 or Group 4 compression (which really only works
well on simple bitmaps anyway).

Group4 is A) lossless B) more efficient than any other compression
method for bilevel data. These qualities make Group4 an excellent
choice for storing black-and-white images. Zak was scanning documents
that consisted mostly of text, which is typically very high-contrast and
works really well in black-and-white. So every page with just text (no
graphics) on it could easily be turned into a Group4 TIFF with no loss
of data. HTH,
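
A rough way to see why bilevel text pages compress so well: each scanline is mostly long runs of white with a few short runs of black. The sketch below is plain run-length counting, not the real Group4 coder (which, as far as I know, also codes each line relative to the line above it); the 64-pixel scanline is made up.

```python
from itertools import groupby


def run_lengths(row):
    """Collapse a row of 0/1 pixels into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# Hypothetical 64-pixel scanline: white margin, two short strokes, white margin.
row = [0] * 20 + [1] * 3 + [0] * 5 + [1] * 4 + [0] * 32
runs = run_lengths(row)
print(len(row), "pixels ->", len(runs), "runs:", runs)
```

Sixty-four pixels collapse to five runs here. A photograph or a noisy grayscale scan has no such long runs, which is why this family of methods is only used on black-and-white data.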
 
Group4 is A) lossless B) more efficient than any other compression
method for bilevel data. These qualities make Group4 an excellent
choice for storing black-and-white images. Zak was scanning
documents that consisted mostly of text, which is typically very
high-contrast and works really well in black-and-white. So every
page with just text (no graphics) on it could easily be turned into
a Group4 TIFF with no loss of data. HTH,


Thanks there.

I was asking what format I might consider using for colour documents.

Some of these color documents might be mainly line drawings and some
others might have areas of color.
 
Zak said:
Thanks there.

I was asking what format I might consider using for colour documents.

Some of these color documents might be mainly line drawings and some
others might have areas of color.

Tiff is still great. Just use No compression or LZW compression.
Group 3 and Group 4 compression is for Black and White images only.
 