Is Itanium the first 64-bit casualty?

J Ahlstrom · Jul 5, 2004

daytripper said:
Yes, it did.

And I'm not yet quite old enough to forget setting CS, DS, SS and ES segment
registers in my yute...

/daytripper (who is sad that the 8086 is already Misremembered History ;-)

Abe Lincoln: How many legs does a dog have?
Friend: 4
Abe: And if you call a tail a leg?
Friend: 5
Abe: 4. Calling a tail a leg doesn't make it one.

Ahlstrom: Calling a base register a segment
register doesn't make it one. Calling an
addressable area of memory a segment doesn't make
it one.

JKA

Peter Dickerson · Jul 5, 2004

daytripper said:
Yes, it did.

And I'm not yet quite old enough to forget setting CS, DS, SS and ES segment
registers in my yute...

/daytripper (who is sad that the 8086 is already Misremembered History ;-)

While the 8086 and 8088 had things called segment registers, and the
corresponding regions of memory were called segments, they do not map to what
is commonly thought of as segments today. 8086 segments could not be moved
in memory (the value in the 'segment' register directly determined the
address range), nor could a segment be not-present, write-protected or
changed in size. Further, some values of the segment registers resulted in
wrapping of the physical address back to 0. So, while the 8086 had things called
segment registers and segments, they don't really qualify in today's use of
the word.
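
To make the arithmetic concrete, here is a minimal sketch of 8086 real-mode address formation. The function name is made up for illustration. The 16-bit register value is multiplied by 16 and added to the 16-bit offset, and the sum is truncated to the 20 address lines, which is where the wrap back to 0 comes from.

```python
def real_mode_address(segment, offset):
    """Physical address on an 8086: segment * 16 + offset, truncated to 20 bits."""
    segment &= 0xFFFF          # segment registers are 16 bits wide
    offset &= 0xFFFF           # so are offsets
    return (segment * 16 + offset) & 0xFFFFF   # only 20 address lines exist


# The register value alone fixes the base: nothing can relocate the "segment".
print(hex(real_mode_address(0x1234, 0x0010)))   # 0x12350

# Different register/offset pairs alias the same byte, so there is no isolation.
print(hex(real_mode_address(0x1235, 0x0000)))   # 0x12350

# Near the top of memory the sum overflows 20 bits and wraps around to 0.
print(hex(real_mode_address(0xFFFF, 0x0010)))   # 0x0
```

Note that there is no limit, permission or present bit anywhere in the calculation, which is the point being made above.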

Peter

Casper H.S. Dik · Jul 5, 2004

Andi Kleen said:
Dan Pop writes:

In practice it's more than 2GB.

On a 32bit OS the kernel normally needs some virtual space (1-2GB
depending on the OS) that cannot be used by applications and there is
space lost to shared libraries etc. Due to fragmentation and other
waste, virtual memory is less efficiently used than physical memory. If
you need contiguous virtual space and you're slightly unlucky, the cut-off
point can be even at slightly over ~1.5GB. This can be increased
by special tuning, but this generally needs some effort and has some
disadvantages.
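
The fragmentation point can be illustrated with a small sketch. The layout below is entirely hypothetical (a 3GB user space with the program at the bottom, shared libraries mapped at 1GB and a stack at the top); it is not taken from any particular OS. It only shows that the largest contiguous free range can be much smaller than the total free space.

```python
GB = 1 << 30
MB = 1 << 20


def largest_free_gap(space_size, mappings):
    """Largest contiguous unmapped range in [0, space_size).

    `mappings` is a list of (start, end) byte ranges already in use.
    """
    largest = 0
    cursor = 0
    for start, end in sorted(mappings):
        largest = max(largest, start - cursor)   # gap before this mapping
        cursor = max(cursor, end)
    return max(largest, space_size - cursor)     # gap after the last mapping


# Hypothetical 32-bit layout, numbers chosen only for illustration.
user_space = 3 * GB
layout = [
    (0, 128 * MB),                    # program text and data
    (1 * GB, 1 * GB + 256 * MB),      # shared libraries
    (3 * GB - 64 * MB, 3 * GB),       # stack
]

free_total = user_space - sum(end - start for start, end in layout)
print("total free MB:      ", free_total // MB)
print("largest free gap MB:", largest_free_gap(user_space, layout) // MB)
```

In this made-up layout about 2.6GB is free in total, but the biggest single allocation that can succeed is under 1.7GB.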

You could run just a 64 bit kernel (single 64 bit program) and then
you will have 4GB of user virtual memory.

(E.g., 32 bit binaries on Solaris/UltraSPARC have 3.996 GB of virtual
memory available to them)

Casper

Sander Vesik · Jul 5, 2004

In comp.arch Joe Seigh said:
They can if they use sparse files (a simple form of compression). There
are some crude ad hoc database implementations that hash index into
sparse files. It's fun backing up those kind of databases as raw files.
Takes a long time. You could mmap it and scan for non zero data. That
would give your virtual memory subsystem a good run.

If your backup solution doesn't support sparse files throw it away and go
back to tar. It's bound to be more reliable anyways.
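
For what the "mmap it and scan for non zero data" idea looks like, here is a rough sketch using Python's standard `mmap` module. The helper names are invented for this example, and whether the demo file is actually stored sparsely depends on the filesystem; the scan itself works either way.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def nonzero_pages(path):
    """Return the indices of page-sized chunks of `path` holding any non-zero byte."""
    size = os.path.getsize(path)
    if size == 0:
        return []                      # mmap refuses to map an empty file
    zero_page = bytes(PAGE)
    hits = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for start in range(0, size, PAGE):
            chunk = m[start:start + PAGE]
            if chunk != zero_page[:len(chunk)]:
                hits.append(start // PAGE)
    return hits


def make_demo_file(path, n_pages, record_page):
    """Write one small record at `record_page` and leave the rest as a hole."""
    with open(path, "wb") as f:
        f.seek(record_page * PAGE)
        f.write(b"record")
        f.truncate(n_pages * PAGE)


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "hashed.db")
    make_demo_file(demo, n_pages=64, record_page=10)
    print(nonzero_pages(demo))
```

Every page gets touched through the mapping, so on a really large sparse file this does give the virtual memory subsystem the workout described above.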

J Ahlstrom · Jul 5, 2004

Yousuf said:
Whatever they are considered today, back then they were considered segments.
End of story.

Yousuf Khan

We are clearly having a failure to miscommunicate.

There are two concepts here that some people are
using the same single word, "segments", to refer to.
There are segments like Multics had and the
so-called segments of the 8086. We cannot talk
usefully about either if we do not distinguish
which we mean when we are talking about them. It's
like talking about payload and range and cost and
speed of vehicles without saying if we are talking
about VWs or 16-wheeler trucks. To say that they
are all vehicles "end of story" doesn't help the
understanding of the issues.

JKA

Andrew Reilly · Jul 5, 2004

Peter said:
While the 8086 and 8088 had things called segment registers, and the
corresponding regions of memory were called segments, they do not map to what
is commonly thought of as segments today.

Or even in its own day. The memory managed version of the Z8000
and the add-on memory manager for the 68010 both had full-strength
segments, with ownerships, permissions, relocations and lengths.
Probably why it took longer to get them to market.

Mike Smith · Jul 5, 2004

Peter said:
While the 8086 and 8088 had things called segment registers, and the
corresponding regions of memory were called segments, they do not map to what
is commonly thought of as segments today. 8086 segments could not be moved
in memory (the value in the 'segment' register directly determined the
address range), nor could a segment be not-present, write-protected or
changed in size. Further, some values of the segment registers resulted in
wrapping of the physical address back to 0.

That's right - they were *real-mode* segments, rather than
*protected-mode* segments. That doesn't mean they weren't *segments*.

Andrew Reilly · Jul 5, 2004

Mike said:
That's right - they were *real-mode* segments, rather than
*protected-mode* segments. That doesn't mean they weren't *segments*.

More like windows, but that's irrelevant: they are what they are.

What they are is irrelevant to a discussion of the merits or
otherwise of hardware-managed segments as a software protection
and object-holding mechanism, because they offered neither of
those abilities.

If you find that you can't use any of the range- and type-checked
languages, for whatever reason, then you probably wouldn't be
happy with a non-flat memory space in hardware, either. If you
can use those languages, then the segments that were being
discussed will be completely invisible to you, other than for the
fact that your software might possibly be a little faster, because
said range checking and object relocation will be getting some
hardware assistance.

As I understand it, the contention was whether or not it was
possible or useful to run C (or C++) on such hardware. I suspect
that quite large chunks of application-level C (and C++) would be
perfectly fine, since the restrictions involved are the same as
those needed to avoid most compiler warning messages. I believe
that it is possible to write to a subset of C++ that will compile
and run in "managed" mode in the .NET framework. Similarly, some
(all?) of the GNU Classpath Java library implementation is in C++,
using the subset that fits the Java object model.

K Williams · Jul 5, 2004

J said:
Abe Lincoln: How many legs does a dog have?
Friend: 4
Abe: And if you call a tail a leg?
Friend: 5
Abe: 4. Calling a tail a leg doesn't make it one.

Ahlstrom: Calling a base register a segment
register doesn't make it one. Calling an
addressable area of memory a segment doesn't make
it one.

Many moons ago (mid '80s) we were given a presentation on the '286 by a
crew of Intel Architects. Their position, at the time, was that the
8086 was the piece of the '286 architecture that could have been
implemented at the time. The '286 wasn't an extension of the '86, per
se, rather the '86 was a subset of the '286 architecture. ...thus
they're called "segments" and "segment registers".

Andi Kleen · Jul 6, 2004

Casper H.S. Dik said:
You could run just a 64 bit kernel (single 64 bit program) and then
you will have 4GB of user virtual memory.

Sure that's a fine solution, but it requires a 64bit CPU
(contrary to all the "we don't need 64bit yet" nay sayers)

My point was just that the usual "we only need 64bit when
everybody has more than 4GB of RAM" argument you often hear
is bogus when you look at the details.

It's also possible on some architectures to use separate page tables
or segments for the kernel and for user space, but at least on x86,
without ASNs (which let the TLB hold entries from multiple page tables),
it's extremely slow (basically every interrupt and syscall turns into a
full heavyweight context switch). Also, the kernel cannot directly access
user space memory anymore, so it has to implement its own page table walks
and TLBs for this. These tend to be a lot slower than what the CPU's native
facilities offer.

There is another issue: even when you look at physical
memory rather than virtual, the real boundary is more like 3.2-3.5GB.

The reason is that a machine without a real IOMMU (like a PC or many
other chipsets) needs to put some memory holes below the 4GB boundary for
the AGP aperture and for 32bit-only PCI IO mappings. As a result,
a PC with even less than 4GB of real memory can have physical memory
addresses beyond 0xffffffff. The firmware/chipset has to map the
memory "around" the memory holes, otherwise you would lose memory.

Usually they can only do that at DIMM granularity. There are machines
that, when you have two 2GB DIMMs, put the first one at 0 and the second
one at 4GB (and 2GB-4GB is PCI mappings and memory hole). The highest
physical address seen is 6GB, even though you only have 4GB of
memory. The same applies to a 4x1GB setup; there the highest
physical address is typically at 5GB. In some cases it even
happens with less than 4GB of memory, e.g. with 3.5GB when the holes
are bigger than 0.5GB.
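
The DIMM-granularity remapping can be sketched with a toy model. This is a simplification for illustration only, not how any particular chipset or BIOS does it: DIMMs are placed in order from address 0, and any DIMM that would overlap the hole below 4GB is moved, whole, to start at 4GB or above. The function name and the hole positions are assumptions.

```python
GB = 1 << 30
MB = 1 << 20
FOUR_GB = 4 * GB


def place_dimms(dimm_sizes, hole_start):
    """Toy model: assign a (start, end) physical range to each DIMM.

    The range [hole_start, 4GB) is reserved for PCI/AGP mappings. A DIMM
    that would reach into it is relocated above 4GB as a whole.
    """
    placements = []
    cursor = 0
    for size in dimm_sizes:
        if cursor < FOUR_GB and cursor + size > hole_start:
            cursor = FOUR_GB           # skip over the hole entirely
        placements.append((cursor, cursor + size))
        cursor += size
    return placements


def top_of_memory(placements):
    return max(end for _, end in placements)


# Two 2GB DIMMs with a 2GB-4GB hole: second DIMM lands at 4GB, top is 6GB.
print(top_of_memory(place_dimms([2 * GB, 2 * GB], hole_start=2 * GB)) // GB)

# Four 1GB DIMMs with a 3GB-4GB hole: the last DIMM lands at 4GB, top is 5GB.
print(top_of_memory(place_dimms([1 * GB] * 4, hole_start=3 * GB)) // GB)

# 3.5GB of RAM with a 1GB hole: the 512MB DIMM still ends up above 4GB.
small = place_dimms([2 * GB, 1 * GB, 512 * MB], hole_start=3 * GB)
print(top_of_memory(small) > FOUR_GB)
```

In all three cases a 32-bit physical address is not enough to reach the last DIMM, even though the amount of installed memory is 4GB or less.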

There are some PCI cards with extremely big IO mappings (e.g. a lot of
the high speed HPC interconnects that do direct memory-to-memory
networking) where the threshold is even lower. While these extreme cards
usually support 64bit BARs and could put their mappings beyond the 4GB
boundary, the PC BIOS cannot do that for compatibility reasons,
and it's difficult to do later in a driver because placing
IO mappings is deeply chipset specific (that's more an x86 specific
issue admittedly, but then this thread is cross posted to x86 specific
groups).

While a 32bit x86 OS can access that high memory using PAE, it already
becomes inconvenient and slow (requires unnecessary TLB flushes and waits in
the kernel to manage limited mapping space), and a 64bit OS is better at
this.

-Andi

Rob Warnock · Jul 6, 2004

+---------------
| >Most 64-bit addressing CPUs don't yet have full 64-bit virtual
| >address translation.
|
| They have as much as needed to cover the current demands. No current
| system with a 64-bit CPU can realistically store a file of 2 ** 63 bytes.
| So, no need to mmap such a file. For the time being...
+---------------

Uh... I beg your pardon... SGI's XFS filesystem allows single files
as large as 9.0e+18 bytes, which requires at least 63 bits to address:

<URL:http://www.sgi.com/software/xfs/>
<URL:http://www.sgi.com/pdfs/2668.pdf>

-Rob

Robin KAY · Jul 6, 2004

Rob said:
Uh... I beg your pardon... SGI's XFS filesystem allows single files
as large as 9.0e+18 bytes, which requires at least 63 bits to address:

Unless it's very sparse, where do the 8185452tb of storage come from?
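
Both figures in this exchange can be checked with a few lines of integer arithmetic. The sketch assumes "tb" here means 2**40 bytes, which is what makes the number come out.

```python
max_file_size = 9 * 10**18                 # the 9.0e+18 bytes quoted above

# Number of bits needed to address every byte offset in such a file.
bits_needed = (max_file_size - 1).bit_length()

# The same size expressed in units of 2**40 bytes.
size_in_tb = max_file_size // 2**40

print(bits_needed)    # 63
print(size_in_tb)     # 8185452
```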

Joe Pfeiffer · Jul 6, 2004

daytripper said:
Yes, it did.

And I'm not yet quite old enough to forget setting CS, DS, SS and ES segment
registers in my yute...

It did not have segments according to *any* normal definition of
segments (though it did have CS, DS, SS, and ES registers as you
remember). reg*16 + address is not segmentation. Real segments were
introduced with the 286.

Pete Zaitcev (OTID1) · Jul 6, 2004

Hush! He was sounding very convincing, let's not burst his bubble.

Greg, a number of PCI masters in the field do not support DAC.
This is a huge problem with EM64T right now. Now it might not
be such a big problem for an HPC weenie who only has to deal
with a very limited set of hardware (most of which is high end
anyway). But our own comp.arch Andi Kleen is told to push
GFP_HIGHDMA right about this time. There is much grumbling
about problems with it, mostly related to the zone balancing.
Surely you're aware?

I have no idea how Windows handles it. Perhaps Linux
simply makes this problem bigger than needed with zoning.
One might imagine a zoneless Virtual Memory which moves a low
page away (to swap or elsewhere) at the moment of the allocation
request, with DMA mask being a parameter. Such a system would not
suffer from a balancing problem (although it continues to suffer
from a lowmem pressure, so buy those DAC cards :-)

In any case, EM64T platforms are deficient in this respect
and your derision is misplaced.

-- Pete

Pete Zaitcev (OTID1) · Jul 6, 2004

Alexander Grigoriev said:

In Windows, you can do it in different ways:

1. A driver can allocate a contiguous non-pageable buffer and do DMA
transfers to/from it. The buffer can be requested to be in lower 4 GB,
or even in lower 16 MB, if you need to deal with legacy DMA.
2. A buffer may originate from an application. [...]

Linux has the same two cases. I think the bounce buffers are handled in
generic code but the drivers have to explicitly support them; [...]

It's an oversimplification. Block drivers do not have to worry
about any such issues. OK, it's a simplification again. They
have to notify the block layer about their DMA mask size.
This is a little tricky in case of stacked drivers (RAID).
Most other drivers have to keep the reach of their hardware
in mind, and some do have to make bounce buffers themselves.
One area where Linux is weaker is its inability to allocate
more than two contiguous pages. So case 1 gets hideously
difficult; in fact, I haven't seen a single driver doing it right.
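
The decision a DMA mask drives can be sketched as follows. This is a simplified illustration, not the Linux block layer or any real driver API; the function and constant names are invented. A buffer needs bouncing when any of its bytes sits above the highest address the device can generate.

```python
PCI_32BIT_MASK = (1 << 32) - 1    # device without DAC: reaches the low 4GB only
ISA_24BIT_MASK = (1 << 24) - 1    # legacy ISA DMA: reaches the low 16MB only


def needs_bounce(phys_addr, length, dma_mask):
    """True if any byte of the buffer lies beyond the device's reach."""
    last_byte = phys_addr + length - 1
    return last_byte > dma_mask


GB = 1 << 30
MB = 1 << 20

print(needs_bounce(1 * GB, 4096, PCI_32BIT_MASK))          # False: fully below 4GB
print(needs_bounce(5 * GB, 4096, PCI_32BIT_MASK))          # True: above 4GB
print(needs_bounce(4 * GB - 2048, 4096, PCI_32BIT_MASK))   # True: straddles 4GB
print(needs_bounce(16 * MB, 512, ISA_24BIT_MASK))          # True: above 16MB
```

When the answer is true, the data has to be copied through a buffer in low memory, which is where the pressure on low pages comes from.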

I still note that the same mess is present with 24-bit ISA DMA, and
there's even cases where some DMA buffers have to live in the first 1MB
for obscure backwards-compatibility reasons. These hacks will remain in
amd64, so it's not much more of a mess to do the same for 32-bit PCI.

The AMD AMD64 does have IOMMU (or at least something which works
like one). Intel AMD64 does not have IOMMU, this is where all
the difficulty originates.

-- Pete

Greg Lindahl · Jul 6, 2004

Greg, a number of PCI masters in the field do not support DAC.
This is a huge problem with EM64T right now. Now it might not
be such a big problem for an HPC weenie who only has to deal
with a very limited set of hardware (most of which is high end
anyway).

Pete, this sub-sub-thread is about the fact that 32-bit PCI _can_ have
64-bit addressing. I was not asserting that there is no problem, I was
laughing at absolute statements on comp.arch that happen to be
absolutely wrong, a fairly common issue. I *am* an HPC weenie, but
that fact has nothing to do with 32-bit PCI with or without 64-bit
addressing.

In fact, I lived through the first implementation of this stuff on
Linux for Sparc64 & Alpha, and I was damn glad that almost all of
my cluster machines didn't have that much real memory.

To sum up: don't jump so fast or far. Sometimes a little joke is a
little joke, and not a claim that an entire thread is irrelevant.

-- greg

Nick Maclaren · Jul 6, 2004

|> > >
|> > >8086/88 did NOT have SEGMENTS ! ! !
|> >
|> > Yes, it did.
|> >
|> > And I'm not yet quite old enough to forget setting CS, DS, SS and ES segment
|> > registers in my yute...
|>
|> It did not have segments according to *any* normal definition of
|> segments (though it did have CS, DS, SS, and ES registers as you
|> remember). reg*16 + address is not segmentation. Real segments were
|> introduced with the 286.

Hmm. I remember when pages were sometimes called segments, and
there were LOTS of other 'normal' definitions, too.

Regards,
Nick Maclaren.

Jan Vorbrüggen · Jul 6, 2004

But, as Schiller said:

Against stupidity, the Gods themselves contend in vain

One of my favourite quotes. Asimov liked it so much he turned it into
the title of a novel.

Jan

Alexis Cousein · Jul 6, 2004

Yousuf said:
Whatever they are considered today, back then they were considered segments.
End of story.

They were named segments because of a vague resemblance to other things called
segments at the time, but it was already a gross misnomer back then.

Just because Intel named them segments doesn't mean they were to be *considered*
as segments.

George Macdonald · Jul 6, 2004

We are clearly having a failure to miscommunicate.

There are two concepts here that some people are
using the same single word, "segments", to refer to.
There are segments like Multics had and the
so-called segments of the 8086. We cannot talk
usefully about either if we do not distinguish
which we mean when we are talking about them. It's
like talking about payload and range and cost and
speed of vehicles without saying if we are talking
about VWs or 16-wheeler trucks. To say that they
are all vehicles "end of story" doesn't help the
understanding of the issues.

Trouble is if you look back in computing history you'll find the same words
being used for different things all over the place. I dunno about the
Burroughs thing but Data General had "segments" in their MV Series which
defined "rings" of protection within the 32-bit address space of a process.

CDC 6600 had an instruction "stack" which had nothing to do with what we
now call a stack.

Then again, we had M$ come along and use "process", "task" and "thread" for
things which were used by predecessors to mean something different. M$,
when they employed people who knew better, even changed things so that what
used to be a task was now a process and poor task was orphaned but there
are still diehards who refuse to give it up. :-)

Now if we could just get some agreement on what "firmware" means.... oops!

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
