Is Itanium the first 64-bit casualty?

  • Thread starter: Yousuf Khan
By both measures, a Mac is a workstation and not a desktop.

Yes, but there's a third important point: A workstation must look
tough and manly...

Macs, on the other hand, are most popular in Bonderberry Blue... As
someone in this newsgroup once mentioned: "fruity chips need not
apply" :>
 
It isn't hurting them at all. Development costs $$$... In today's world,
you don't develop unless the $$$ is there. The $$$ isn't there until Intel
is OEM'ing large quantities of 64-bit hardware. Linux isn't gaining
anything at all from this.

Linux probably isn't gaining much market share (and hence, not gaining
much money), but it is gaining some mind-share from this. 64-bit
Linux has been here, stable and working, for a full year now, while
Windows is still off in the distance. This sort of thing goes a LONG
way toward demonstrating what people have said all along about Linux and
open source: it gets developed and becomes stable much faster than
closed-source Windows.

Development may cost money, but developing a product for 3 years
without selling it costs more money than developing it for 2 years
without selling it. The simple fact of the matter is that the longer
64-bit x86 Windows spends in development the less time it will spend
actually being sold to customers.
 
And what, pray tell, is "doing it right"?

Uhh, that would be not doing device transfers in a sub-optimal way.
The hack already exists for 16-bit ISA DMA, and there's no obvious way to
remove it or it would have been done by now. Likewise, there's no obvious
way to remove bounce buffering for 32-bit PCI DMA. The Linux folks may be
religious wackos, but they're not dumb.

ISA DMA is a different mechanism from PCI Bus Mastering - if you need
bounce buffering with both, commonality of code is not obvious. As for
32-bit PCI Bus Mastering, it depends on devices, PC memory mappings and
legacy support - the code may have to be there to cover that, but it's
certainly possible to do without bounce buffering on current modern
hardware with <4GB main memory. Doing it all the time is, err, sub-optimal;
designing new hardware which needs it is an aberration, maybe even a case
of incompetence or contempt!
If you have a better solution then, as they say, "send patches."

This has nothing to do specifically with Linux but I can't believe the
capability is not already there.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Tony said:
At this time I'm pretty certain that there were at least a few
applications that would have run faster on the Alpha and FX!32 than
the 200MHz PPro.

One advantage that FX!32 had over other "fragile" IA32 translators
in use today is that its translations were persistent, and not
limited in size. Both P4's trace cache and Transmeta's Crusoe are
said to have a weakness whenever code size makes the (single)
decoder the bottleneck. FX!32 should only have had that
limitation the first time it saw a new program.

On the other hand, FX!32 was first introduced on EV4 Alphas, I
think, and they had their own reasons for being somewhat brittle,
compared to their OOO successors.

Cheers,
 
George Macdonald said:
ISA DMA is a different mechanism from PCI Bus Mastering - if you need
bounce buffering with both, commonality of code is not obvious.

The commonality is obvious. If a driver requests a DMA to/from a buffer
that's outside the address space addressable by DMA, then you use a bounce
buffer. Only the size of the address space is different between the two.

The mechanisms of performing a DMA are very different, but the bounce buffer
code is identical.
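
To make that concrete, here's a rough sketch (plain C, not actual kernel
code) of the shared decision; only the addressability limit differs between
the 16MB ISA case and the 4GB 32-bit PCI case. The helper bounce_alloc is a
made-up name for "give me a buffer below the limit".

#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a buffer whose physical address lies
   below 'limit' -- in a real kernel this is the bounce-buffer pool. */
extern void *bounce_alloc(size_t len, uint64_t limit);

/* Decide whether a write DMA from 'buf' (physical address 'phys') can go
   straight to the device, or must be staged through a bounce buffer. */
void *dma_prepare_write(void *buf, size_t len, uint64_t phys, uint64_t limit)
{
    if (phys + len <= limit)
        return buf;                   /* device can reach it directly */

    void *bounce = bounce_alloc(len, limit);
    memcpy(bounce, buf, len);         /* stage the data below the limit */
    return bounce;                    /* device DMAs from the bounce copy */
}

/* limit is 16MB (1 << 24) for ISA DMA, 4GB (1ULL << 32) for 32-bit PCI. */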
As for
32-bit PCI Bus Mastering, it depends on devices, PC memory mappings and
legacy support - the code may have to be there to cover that, but it's
certainly possible to do without bounce buffering on current modern
hardware with <4GB main memory. Doing it all the time is, err, sub-optimal;
designing new hardware which needs it is an aberration, maybe even a case
of incompetence or contempt!

Neither Windows nor Linux uses bounce buffers if the src/dst buffer is in
the low 4GB. I know Linux allows device drivers to specifically request a
buffer in the low 4GB (or 16MB), but by default they're located higher to
conserve low memory, and buffers may come from sources that are unaware of
their physical address (like applications); I presume Windows has the same
mechanisms.
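
For what it's worth, here's roughly what that looks like from a 2.6-era
Linux PCI driver; the call names are from the old pci_* DMA API and have
changed since, so treat this as a sketch rather than gospel.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Sketch: tell the DMA layer the device only handles 32-bit bus addresses;
   allocations then come from below 4GB (or get bounce-buffered). */
static int example_setup_dma(struct pci_dev *pdev)
{
    void *vaddr;
    dma_addr_t bus_addr;

    if (pci_set_dma_mask(pdev, DMA_32BIT_MASK))
        return -EIO;                  /* platform can't satisfy the mask */

    /* Coherent buffer guaranteed to fall within the mask. */
    vaddr = pci_alloc_consistent(pdev, 4096, &bus_addr);
    if (!vaddr)
        return -ENOMEM;

    /* ... hand bus_addr to the device ... */

    pci_free_consistent(pdev, 4096, vaddr, bus_addr);
    return 0;
}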

S

 
The commonality is obvious. If a driver requests a DMA to/from a buffer
that's outside the address space addressable by DMA, then you use a bounce
buffer. Only the size of the address space is different between the two.

The mechanisms of performing a DMA are very different, but the bounce buffer
code is identical.

Floppy vs. HDD?? When the device accesses are so different, there's no
reason to attempt common code - one will evolve, the other stays the same.
Neither Windows nor Linux uses bounce buffers if the src/dst buffer is in
the low 4GB. I know Linux allows device drivers to specifically request a
buffer in the low 4GB (or 16MB), but by default they're located higher to
conserve low memory, and buffers may come from sources that are unaware of
their physical address (like applications); I presume Windows has the same
mechanisms.

Well, well, we seem to be in danger of agreeing... apart from the fact that
the reason to not bounce is more to do with efficiencies, i.e. not having
to do mem-mem copies. An app, of course, doesn't have to know about buffer
physical addresses anyway.

Now if we have one mfr who does it right and the other who does it with
err, legacy chipset functionality, which seems preferable?

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Scott Moore wrote:

--snip snip
Having segmentation return would be to me like seeing the Third
Reich make a comeback. Segmentation was a horrible, destructive
design atrocity that was inflicted on x86 users because it locked
x86 users into the architecture.

All I can do is hope the next generation does not ignore the
past to the point where the nightmare of segmentation
happens again.

Never again !

8086/88 did NOT have SEGMENTS ! ! ! They could
address up to 64K bytes from base addresses
that could be on any 16-byte boundary. That's
all, no length, no attributes, nothing. Some
marketdroid (or Bruce Ravenal?) decided to name
them segments and the well has been poisoned ever since.
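
(For the record, the arithmetic is just physical = segment * 16 + offset,
20 bits total, with no length or protection attached. A trivial
illustration in C, with made-up values:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t seg = 0xB800, off = 0x0010;         /* e.g. CGA text memory */
    uint32_t phys = ((uint32_t)seg << 4) + off;  /* 16-byte "paragraphs" */
    printf("%04X:%04X -> physical %05X\n", seg, off, phys);  /* B8010 */
    return 0;
}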

B5000/5500/Multics/B6700 et seq had segments
and used them rationally and well.

80286 et seq had segments that could be optionally
used but which no one used.

Are you railing against 8086 so-called "segments" or
real (Burroughs/Multics) segments?

JKA
 
Rupert said:
I think it's old enough and dead enough and has had enough money thrown
at it to be called legacy though. ;)

Cheers,
Rupert

A legacy is what lets you buy a boat or send your
kids to college. In our whole field "legacy" should
be replaced by "millstone". Let's start with IA64, a
millstone architecture.

JKA
 
In the article <[email protected]>, George Macdonald wrote:
Now if we have one mfr who does it right and the other who does it with
err, legacy chipset functionality, which seems preferable?

It is strange that Intel put 64-bit support in Prescott, but forgot about the
chipset. FWIW, Apple's G5 chipset has a GART lookalike for HyperTransport.
They call it DART for DMA Address Relocation Table.
 
Scott Moore said:
I have heard the arguments over and over and over (and over) again.

Obviously you didn't live through the bad old days of segmentation,
or you would not be advocating it.

Nick said:
I like it! Please collect your wooden spoon as you go out.

Nick: He's not going to understand. He doesn't even know about google.

Scott: At least on my newsreader, Nick's posts contain the header

Organisation: University of Cambridge.

Google for "Nick Maclaren University of Cambridge" to decide whether
Nick is too young to know about segmentation.

Google for "University of Cambridge wooden spoon" to learn about spoons.
 
In said:
You mean memory-map whole files? There isn't any reason why a big file can't
be mmapped into a 32-bit address space as a window. It's a sane way, too.
Otherwise you can't mmap a file that is bigger than the virtual memory of
the system.

That's why people want 64-bit pointers even if they don't plan to install
more than 4 GB of memory on their systems.
Most 64-bit addressing CPUs don't yet have full 64-bit virtual
address translation.

They have as much as needed to cover the current demands. No current
system with a 64-bit CPU can realistically store a file of 2 ** 63 bytes.
So, no need to mmap such a file. For the time being...
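
A rough sketch of the windowing approach mentioned above: map a 64MB view at
a large offset into a file far bigger than the 32-bit address space. The
file name and sizes are made up, and on 32-bit Linux/glibc this assumes
64-bit file offsets (_FILE_OFFSET_BITS=64).

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t window = (size_t)64 << 20;   /* 64MB view into the file */
    off_t  offset = (off_t)8 << 30;     /* start 8GB in (page-aligned) */

    int fd = open("huge.dat", O_RDONLY);
    if (fd < 0)
        return 1;

    char *p = mmap(NULL, window, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return 1;

    /* ... work on bytes [offset, offset + window) through p ... */

    munmap(p, window);
    close(fd);
    return 0;
}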

Dan
 
Dan said:
They have as much as needed to cover the current demands. No current
system with a 64-bit CPU can realistically store a file of 2 ** 63 bytes.
So, no need to mmap such a file. For the time being...

They can if they use sparse files (a simple form of compression). There
are some crude ad hoc database implementations that hash-index into
sparse files. It's fun backing up those kinds of databases as raw files.
Takes a long time. You could mmap it and scan for non-zero data. That
would give your virtual memory subsystem a good run.
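
Roughly what that looks like with plain POSIX calls (error handling left
out): one real byte at the end of a 1GB file leaves the rest as a hole, yet
the mmap scan still has to fault in every page.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    off_t size = (off_t)1 << 30;                  /* 1GB, almost all hole */
    int fd = open("sparse.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);

    pwrite(fd, "x", 1, size - 1);                 /* one real byte at the end */

    char *p = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED, fd, 0);
    long nonzero = 0;
    for (off_t i = 0; i < size; i++)              /* faults in every page */
        if (p[i] != 0)
            nonzero++;

    printf("non-zero bytes: %ld\n", nonzero);
    munmap(p, (size_t)size);
    close(fd);
    return 0;
}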

Joe SEigh
 
That's why people want 64-bit pointers even if they don't plan to install
more than 4 GB of memory on their systems.

In practice it's more than 2GB.

On a 32-bit OS the kernel normally needs some virtual space (1-2GB
depending on the OS) that cannot be used by applications, and there is
space lost to shared libraries etc. Due to fragmentation and other
waste, virtual memory is less efficiently used than physical memory. If
you need contiguous virtual space and you're slightly unlucky, the
cutoff point can even be at slightly over ~1.5GB. This can be increased
by special tuning, but that generally needs some effort and has some
disadvantages.
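
One crude way to see that in practice is to probe for the largest single
anonymous mapping a 32-bit process can get; the starting size here is
arbitrary, and the result depends on the kernel split and what is already
mapped.

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t step = (size_t)64 << 20;               /* probe in 64MB steps */
    size_t size;

    for (size = (size_t)3584 << 20; size >= step; size -= step) {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            printf("largest contiguous mapping: %zu MB\n", size >> 20);
            munmap(p, size);
            return 0;
        }
    }
    printf("no large contiguous region available\n");
    return 0;
}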

-Andi
 
|> >
|> > They have as much as needed to cover the current demands. No current
|> > system with a 64-bit CPU can realistically store a file of 2 ** 63 bytes.
|> > So, no need to mmap such a file. For the time being...
|>
|> They can if they use sparse files (a simple form of compression). There
|> are some crude ad hoc database implementations that hash index into
|> sparse files. It's fun backing up those kind of databases as raw files.
|> Takes a long time. You could mmap it and scan for non zero data. That
|> would give your virtual memory subsystem a good run.

Heaven help me, yes. At least two compilers I am inflicted with
use memory mapping and sparse access; in one case, it means that
I can't set the store limits low enough to protect the system
against a user making a mistake and running us out of swap in a
single process!

It is ridiculous that one of the first things that some vendors'
development teams have used 64-bit addressing for is to make their
products unmanageable by their employer's own operating system.
But, as Schiller said:

Against stupidity, the Gods themselves contend in vain

A related insanity is a perverse program that spawns its processes
using MPI, and then uses a massive shared memory segment for data
transfer. If I make the limits large enough to let that one
through, and a user makes a mistake in his space calculations,
bye, bye system.


Regards,
Nick Maclaren.
 
|>
|> Google for "Nick Maclaren University of Cambridge" to decide whether
|> Nick is too young to know about segmentation.

Gosh! Thanks. I had completely forgotten that debate. I am
getting old ....


Regards,
Nick Maclaren.
 
Andi said:
In practice it's more than 2GB.

[snip]

For sun4u systems at least, the Solaris kernel uses only 4MB at the top
of the process address space. All non-immediate SPARC load/store
instructions have an 8-bit field for an address space identifier.
Solaris/IA32, on the other hand, allows processes to use up to 3GB of
the virtual address space. I think more in some configurations.
 
Robin KAY said:
Andi said:
In practice it's more than 2GB.

[snip]

For sun4u systems at least, the Solaris kernel uses only 4MB at the top
of the process address space. All non-immediate SPARC load/store
instructions have an 8-bit field for an address space identifier.
Solaris/IA32, on the other hand, allows processes to use up to 3GB of
the virtual address space. I think more in some configurations.

RedHat offers a special kernel that allows each process just shy of 4GB, and
the kernel gets its own 4GB. Syscalls require a page table swap, which
makes them slower, but apparently some folks consider having double the
address space per process to be worth it.

Hopefully all those people will be switching to amd64 soon and such tricks
can be forgotten.

S
 
Scott Moore wrote:

--snip snip

8086/88 did NOT have SEGMENTS ! ! !

Yes, it did.

And I'm not yet quite old enough to forget setting CS, DS, SS and ES segment
registers in my yute...

/daytripper (who is sad that the 8086 is already Misremembered History ;-)
 
Peter Dickerson said:
While the 8086 and 8088 had things called segment registers, and the
corresponding regions of memory were called segments, they do not map
to what is commonly thought of as segments today. 8086 segments could
not be moved in memory (the value in the 'segment' register directly
determined the address range), nor could the segment be not-present,
write-protected or changed in size. Further, some values of the
segment registers resulted in wrapping of the physical address back
to 0. So, while the 8086 had things called segment registers and segments,
they don't really qualify in today's use of the word.
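
(The wrap is just the 20-bit address bus: FFFF:0010, for example, comes back
around to physical 0 on an 8086 -- the behaviour the later A20-gate hacks had
to preserve. A small illustration:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t seg = 0xFFFF, off = 0x0010;
    uint32_t linear = ((uint32_t)seg << 4) + off;  /* 0x100000 */
    uint32_t phys   = linear & 0xFFFFF;            /* 8086 keeps 20 bits: 0 */
    printf("FFFF:0010 -> %06X, wraps to %05X on an 8086\n", linear, phys);
    return 0;
}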

Whatever they are considered today, back then they were considered segments.
End of story.

Yousuf Khan
 