howto: port x86 .asm to ia64

  • Thread starter: Anatoly Greenblatt

Anatoly Greenblatt

Hi,

I'm porting a device driver, part of which is written in .asm files.
Apparently ml64 does not support ia64, and IAS (the Intel assembler) does not
understand my .asm files. Has anyone solved this problem?

Thanks,
Anatoly.
 
First, are you porting to x64 (the AMD/Intel 64-bit extensions to x86)
or to IA64 (Itanium)? If you are really trying to port to Itanium, then
try to eliminate as much of the assembler as humanly possible, since this
architecture is a mess and a PITA to program.
 
Anatoly,

I know Intel has an assembler for IA64; I think you will have to
investigate that. This really will be a pain, since the instruction set has
parallelism built directly into it. As much as you can, look at moving things
to C code and using intrinsics.

I used to be a compiler code-generation guy, and I looked at the
architecture in its early days for a large PC firm; I told them to run away
as fast as they could.


--
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
http://www.windrvr.com
Remove StopSpam from the email to reply
 
Hello Don,
I think I'm not heading in the right direction. I noticed that the instruction set for ia64
is different, but when I connect windbg to the 64-bit target, I see the same good
old instruction set. So what I actually need is to port my .asm files to
x64 and not to ia64?! But the DDK has amd64 and ia64 switches, and the binaries
in the DDK folders are split into amd64 and ia64, so which one of them is x64?

Thanks,
Anatoly.
 
AMD64 is x64; it is a superset of the Pentium instruction set and a lot
easier to port to. x64 is also selling well, while IA64 is not selling
enough in the Windows market to be noticed.
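A practical way to settle "which folder is x64" for a built driver is to read the Machine field in its PE header. The sketch below is an editor's illustration, not something from the thread; it assumes a well-formed PE image and only recognizes three IMAGE_FILE_MACHINE_* values (0x014C i386, 0x8664 AMD64, 0x0200 IA64). The helper name `pe_machine` is hypothetical.

```python
import struct

# Machine values from the PE/COFF file header (IMAGE_FILE_MACHINE_*).
MACHINE_NAMES = {
    0x014C: "i386 (32-bit x86)",
    0x8664: "amd64 (x64)",
    0x0200: "ia64 (Itanium)",
}


def pe_machine(data: bytes) -> str:
    """Return the target architecture recorded in a PE image (.sys, .exe, .dll)."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE image")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the signature; its first field is Machine.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")
```

Usage, with a hypothetical file name: `with open("mydriver.sys", "rb") as f: print(pe_machine(f.read()))`. Binaries built for the amd64 target report `amd64 (x64)`.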


--
Don Burn (MVP, Windows DDK)
 
Don Burn said:
I know Intel has an assembler for IA64; I think you will have to
investigate that. This really will be a pain, since the instruction set has
parallelism built directly into it. As much as you can, look at moving things
to C code and using intrinsics.

I used to be a compiler code-generation guy, and I looked at the
architecture in its early days for a large PC firm; I told them to run away
as fast as they could.

Why?
If parallelism is built into the assembly language itself, wouldn't it be a
huge advantage for writing a compiler that makes use of this inherent
parallelism?
 
The compiler has to do all the work, and this is no small task. Second, every
decent study of parallelism in regular programs indicates that 4 parallel
ops is about the best you can do, so of course Itanic started with 6
parallel operations. It was funny looking at the first Itanium compilers'
output: it was almost always 1 real instruction and 5 NOPs. Of course, in
some cases this turned a 1-byte x86 instruction into a 32-byte instruction!
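To put numbers on the "1 real instruction and 5 NOPs" remark: an IA-64 bundle is 128 bits (16 bytes) holding three instruction slots, and the original Itanium could issue two bundles per cycle. That gives the 6 parallel operations and the 32 bytes mentioned above. This is a minimal sketch that takes the post's worst case at face value; it ignores stop bits, templates and how real compilers actually lay out code. `issue_group_cost` is a hypothetical helper.

```python
from typing import Tuple

BUNDLE_BYTES = 16       # one IA-64 bundle is 128 bits
SLOTS_PER_BUNDLE = 3    # three 41-bit instruction slots plus a 5-bit template
BUNDLES_PER_ISSUE = 2   # the original Itanium dispatched up to two bundles per cycle


def issue_group_cost(real_ops: int) -> Tuple[int, int]:
    """Return (bytes, NOP slots) for one two-bundle issue group holding real_ops useful ops."""
    slots = SLOTS_PER_BUNDLE * BUNDLES_PER_ISSUE
    if not 0 <= real_ops <= slots:
        raise ValueError(f"an issue group has {slots} slots")
    return BUNDLE_BYTES * BUNDLES_PER_ISSUE, slots - real_ops
```

`issue_group_cost(1)` gives `(32, 5)`: 32 bytes spent to do one useful operation, which a short x86 instruction might encode in a single byte.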

For things that are parallel the CPU is OK, but the bottom line is that
after more than 40 years of research, most programs cannot be made
parallel. As Dave Kuck, an expert in parallelism, said: "if you have
infinite parallelism and a program is 50% parallel, you can double the speed
of the program".
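The Kuck quote is an instance of what is usually called Amdahl's law: if a fraction p of the work runs in parallel on n units, the overall speedup is 1 / ((1 - p) + p / n). A small sketch (the function name is hypothetical) shows that even unlimited parallel units cap a 50%-parallel program at 2x:

```python
def amdahl_speedup(parallel_fraction: float, processors: float) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors <= 0:
        raise ValueError("processors must be positive")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)
```

`amdahl_speedup(0.5, float("inf"))` returns `2.0`, and six parallel slots (as on Itanium) give only about 1.71x for the same program.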
 
I think I get what you're saying.
The parallelism in IA64 is per thread. One thread can have several
instructions executing in parallel, but there can be only one active thread per
core.

As a result, you can only parallelize simple, localized instructions (like
different add instructions on independent variables).

The C/C++ language does not lend itself well to this, because the compiler
has a very hard time figuring out whether the ordering of statements is required
or not.
With this in mind, I think the only real use for Itanium at this moment is
to run special hand-crafted algorithms for number crunching and things like
that.
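The compiler's job described above can be sketched as a toy greedy list scheduler: it packs instructions into fixed-width issue groups, never placing an instruction in the same group as (or before) one it depends on, and pads the rest with NOPs. This is an editor's illustration, not Itanium's real rules (no stop bits, templates or functional-unit types); the instruction format and `pack_groups` are hypothetical, and the width of 6 comes from the thread.

```python
from typing import List, Tuple

# Hypothetical instruction format: (dest, sources), e.g. ("c", ("a", "b")) for c = a + b.
Instr = Tuple[str, Tuple[str, ...]]


def pack_groups(program: List[Instr], width: int = 6) -> List[List[str]]:
    """Greedily pack instructions into issue groups of `width` slots.

    An instruction goes after every earlier instruction it conflicts with:
    one that writes a source it reads (RAW), writes the same destination (WAW),
    or reads the destination it writes (WAR). Unused slots become "nop".
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    groups: List[List[Instr]] = []
    for dest, srcs in program:
        earliest = 0
        for index, group in enumerate(groups):
            for d, s in group:
                if d in srcs or d == dest or dest in s:
                    earliest = index + 1  # scanned in order, so this ends at the latest conflict
        slot = earliest
        while slot < len(groups) and len(groups[slot]) >= width:
            slot += 1  # that group is full; try the next one
        if slot == len(groups):
            groups.append([])
        groups[slot].append((dest, srcs))
    return [[d + "=" + "+".join(s) for d, s in grp] + ["nop"] * (width - len(grp))
            for grp in groups]
```

A dependent chain such as `t1 = a + b; t2 = t1 + c; t3 = t2 + d` comes out as three groups of one add and five NOPs, while six adds to independent variables fill a single group with no NOPs.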

Is this understanding correct?

Hm, a language like LabVIEW would be perfectly suited for this architecture.
Sadly, the cost of porting the compiler, as well as the cost of the platform
itself, makes this prohibitive.
 
Bruno,

Yes, the hardware is designed to issue 6 instructions at a time, and the
compiler lays down all 6, so yes, it needs to be a single thread. It is not
only a problem with C/C++; none of the common languages do parallelism well.
But part of this is that people do not do parallelism well. Yes, we can think
about a few actions at once, and yes, for things like manipulating an array
there is inherent parallelism, but we normally think of things as step 1,
step 2, etc. Try taking those steps and arranging them so 6 actions are
always active.


--
Don Burn (MVP, Windows DDK)
 
It's a real problem. The Trimedia processor, used in many set-top boxes,
is a VLIW processor that can issue 5 operations per instruction, with a
large register set. They have poured vast amounts of resources into their
compiler over the last 10 years or so, and the typical program averages
about 2.4 ops per instruction.
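The Trimedia figure is easier to compare with the early Itanium output when expressed as slot utilization. A trivial helper (hypothetical name) using only the numbers quoted in this thread:

```python
from typing import Tuple


def vliw_utilization(avg_ops: float, width: int) -> Tuple[float, float]:
    """Return (fraction of issue slots doing useful work, NOP slots per instruction word)."""
    if not 0 < avg_ops <= width:
        raise ValueError("avg_ops must be in (0, width]")
    return avg_ops / width, width - avg_ops
```

`vliw_utilization(2.4, 5)` is about 48% of slots used (2.6 NOPs per word) after years of compiler work, versus `vliw_utilization(1, 6)`, about 17%, for the "1 real instruction and 5 NOPs" case.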

On the other hand, you can do some very impressive things if you
concentrate on the inner loops. The Trimedia does MPEG like a bat out of
hell, but they haven't been afraid to introduce custom instructions to fit
the need.
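One common way to feed a wide machine in an inner loop is to split a reduction across several independent accumulators, so consecutive additions no longer depend on each other. This Python sketch (the helper `unrolled_sum` is hypothetical) does not run any faster in Python; it only counts the longest chain of dependent additions, which is what limits how many slots a VLIW or EPIC machine can keep busy.

```python
from typing import List, Tuple


def unrolled_sum(xs: List[int], k: int = 4) -> Tuple[int, int]:
    """Sum xs with k independent accumulators.

    Returns (total, depth), where depth is the longest chain of dependent adds,
    counting the first add into each zeroed accumulator.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    acc = [0] * k
    chain = [0] * k
    for i, x in enumerate(xs):
        acc[i % k] += x    # adds to different accumulators are independent
        chain[i % k] += 1
    depth = max(chain)
    # Combine the partial sums pairwise: each tree level adds one dependent step.
    while len(acc) > 1:
        acc = [acc[j] + acc[j + 1] if j + 1 < len(acc) else acc[j]
               for j in range(0, len(acc), 2)]
        depth += 1
    return acc[0], depth
```

For the 16 values 0..15, a single accumulator gives `(120, 16)`, while four accumulators give `(120, 6)`: the same sum with a dependency chain less than half as long.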
 