> Huh? -Os is the best in most scenarios, I use it unless I have a
> good reason to go tweaking optimization. Smaller = better use of
> icache = (almost always) faster.
But -Os is not standard; that is, it's not the optimization flag that
is used almost everywhere. -O2 is a de facto standard in the sense that
it is used nearly everywhere. It is at least the default for Debian and
Debian-derived distributions, and it might also be for Red
Hat/CentOS/Fedora, though I don't have a system in that family to check
at present.
Most distributions that I am aware of use -O2 (Debian and Ubuntu for
sure, and I know Slackware and other distributions I have used or tried
at various points do as well). There are _some_ packages that are built
with -Os, though it's been long enough since I have seen them that I no
longer remember which ones they are. So using -Os would skew the results
away from the flags a distribution's packaging would normally use to
build the software, and that is what I was interested in measuring.
Also, I don't see a major difference between -O2 and -Os:
-rwxr-xr-x 1 mbt mbt 93936 2009-04-21 16:49 alltray.-O2*
-rwxr-xr-x 1 mbt mbt 89840 2009-04-21 16:49 alltray.-Os*
The binary compiled with -Os is exactly 4KiB (4096 bytes) smaller,
which doesn't buy much (at least on this system). On a system that uses
4KiB pages, you save one page; on a system that uses 2MiB pages, you
save nothing, since both binaries fit in a single page. And I don't
think 4KiB is going to matter to a cache of modern size anyway. I don't
have the time to step through the generated assembly for the program,
but since the savings aren't that great on modern hardware, maybe it
wouldn't skew the results at all. That said, it's still not ubiquitous,
though a comparison between -Os and the JIT might be interesting.
I suppose the real test would be to dig up a piece of really old
hardware and do tests on that, or on really small embedded hardware.
That said, I don't think I have anything that fits the bill that has a
JIT ported to it (yet).
--- Mike