The compiler has to do all the work, and that is no small task. Second, every
decent study of parallelism in regular programs indicates that 4 parallel
ops is about the best you can do, so of course Itanic started with 6
parallel operations. It was funny looking at the output of the first Itanium
compilers: it was almost always 1 real instruction and 5 NOPs. Of course, in
some cases this turned a 1-byte x86 instruction into 32 bytes of code (two
16-byte bundles)!
For things that are parallel the CPU is OK, but the bottom line is that
after more than 40 years of research, most programs cannot be made
parallel. As Dave Kuck, an expert in parallelism, said: "if you have
infinite parallelism and a program is 50% parallel, you can double the speed
of the program".
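
The Kuck quote is the limiting case of what is usually called Amdahl's law. A minimal sketch in C, assuming a program whose parallel fraction is p and a machine with n parallel units (the function names are made up for illustration):

```c
/* Amdahl's law: speedup when a fraction p (0 <= p < 1) of the run time
   can be spread over n parallel units and the rest stays serial. */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

/* Limit for unbounded n: the parallel part costs nothing,
   only the serial part (1 - p) is left. */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

With p = 0.5 the limit is exactly 2, which is the "double the speed" in the quote. With 6 units, amdahl_speedup(0.5, 6.0) comes out at about 1.71.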
I think I get what you're saying.
The parallelism in IA-64 is per thread. One thread can have several
instructions executing in parallel, but there can be only one active thread
per core.
As a result, you can only parallelize simple, localized instructions (like
different add instructions on independent variables).
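
To make "add instructions on independent variables" concrete, here is a rough sketch in C. Both functions are hypothetical and compute the same sum; the only difference is whether the adds depend on each other:

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add needs the result of the previous one,
   so the adds form a serial chain. */
uint32_t sum_chain(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: the four adds in the loop body touch different
   variables and do not depend on each other, so a wide machine is
   free to issue them side by side. */
uint32_t sum_split(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

Unsigned arithmetic is used so that regrouping the adds cannot change the result. Whether a given compiler actually schedules the four adds together is up to the compiler and the target.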
C and C++ do not lend themselves well to this, because the compiler
has a very hard time figuring out whether the ordering of statements is
required or not.
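
One common reason for this is pointer aliasing. A small sketch in C, with hypothetical function names, of a loop where the compiler cannot tell whether order matters, next to the same loop using the C99 restrict qualifier:

```c
#include <stddef.h>

/* dst and src may overlap, so a store to dst[i] might change a later
   src[j]. The compiler has to assume the statement order is required. */
void scale_may_alias(double *dst, const double *src, size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* restrict is the caller's promise that dst and src do not overlap,
   so the iterations are independent and may be reordered. */
void scale_no_alias(double *restrict dst, const double *restrict src,
                    size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Calling scale_may_alias(buf + 1, buf, 3, 2.0) on an array of four 1.0 values is legal and makes each iteration read what the previous one wrote, which is exactly the case the compiler has to allow for. Making the same overlapping call to scale_no_alias would break the promise and is undefined behavior.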
With this in mind, I think the only real use for Itanium at this moment is
to run special hand-crafted algorithms for number crunching and things like
that.
Is this understanding correct?
Hm, a language like LabVIEW would be perfectly suited for this architecture.
Sadly, the cost of porting the compiler, as well as the cost of the platform
itself, makes this prohibitive.