Is Itanium the first 64-bit casualty?


George Macdonald

In article <[email protected]>, George Macdonald wrote:


It is strange that Intel put 64-bit in Prescott, but forgot about the
chipset. FWIW, Apple's G5 chipset has a GART lookalike for HyperTransport.
They call it DART, for DMA Address Relocation Table.
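The function of such a table is easy to sketch: the device emits 32-bit
addresses, and the table remaps them page by page onto physical memory
that may sit above 4GB. A toy model in C (all names and sizes here are
illustrative, not Apple's actual DART layout):

    #include <stdint.h>

    #define PAGE_SHIFT 12          /* 4KB pages */
    #define DART_ENTRIES 4096      /* toy size: a 16MB device window */

    /* Each entry holds the physical page frame backing one
       device-visible page. */
    static uint64_t dart_table[DART_ENTRIES];

    /* Translate a device-visible DMA address into a physical address.
       Assumes dev_addr falls inside the mapped window. */
    static uint64_t dart_translate(uint32_t dev_addr)
    {
        uint32_t page   = dev_addr >> PAGE_SHIFT;
        uint32_t offset = dev_addr & ((1u << PAGE_SHIFT) - 1);
        return (dart_table[page] << PAGE_SHIFT) | offset;
    }

    int main(void)
    {
        dart_table[0] = 0x123456;    /* map device page 0 high */
        return dart_translate(0x0ABC) != 0x123456ABCULL;
    }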

Hmm, yeah... Tumwater?? Is that like emm, bladder fluid?

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 

Dan Pop

In a previous article it was said:
Yes, it did.

It's a matter of point of view. From the hardware point of view, the
8086 had a linear, 20-bit address space. The segments were a software
artefact to allow addressing that linear 20-bit space using
exclusively 16-bit registers.
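For concreteness, the real-mode rule (physical = segment * 16 + offset)
in a few lines of C; note how distinct segment:offset pairs alias the
same 20-bit physical address:

    #include <stdint.h>
    #include <stdio.h>

    /* 8086 real-mode address formation: two 16-bit values combine
       into one 20-bit physical address. */
    static uint32_t phys(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF;  /* 20-bit wrap */
    }

    int main(void)
    {
        printf("%05X\n", phys(0x1234, 0x0005));  /* 12345 */
        printf("%05X\n", phys(0x1000, 0x2345));  /* 12345 - same byte */
        return 0;
    }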

Dan
 

The little lost angel

Now if we could just get some agreement on what "firmware" means.... oops!

It means hardware since firmware means it isn't soft ;PpPpPPp

--
L.Angel: I'm looking for web design work.
If you need basic to med complexity webpages at affordable rates, email me :)
Standard HTML, SHTML, MySQL + PHP or ASP, Javascript.
If you really want, FrontPage & DreamWeaver too.
But keep in mind you pay extra bandwidth for their bloated code
 

Stephen Sprunk

Greg Lindahl said:
Pete, this sub-sub-thread is about the fact that 32-bit PCI _can_ have
64-bit addressing. I was not asserting that there is no problem,
I was laughing at absolute statements on comp.arch that happen
to be absolutely wrong, a fairly common issue.

I am guilty of not knowing about DAC (PCI's Dual Address Cycle, which
carries a 64-bit address in two cycles on a 32-bit bus), but that doesn't
change my answer, which was based on the actual behavior of a popular x86 OS.

As long as a non-trivial number of PCI cards or bridges don't support DAC,
OSes will have to deal with the case where it's not available. Windows and
Linux both do a very sensible thing when this occurs, though obviously
buying all DAC-capable hardware is the best solution.
I *am* an HPC weenie, but that fact has nothing to do with 32-bit PCI with
or without 64-bit addressing.

No, but it means you probably have a limited view of range of hardware
capabilities that a modern OS (and IT dept) has to deal with. Mandating
that all systems have DAC-capable hardware may work in the HPC world, but
the very concept is laughable to an IT manager or OS developer.
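For what it's worth, the "very sensible thing" is, in essence, bounce
buffering: steer the DMA through memory the card can address and copy.
A rough sketch of the idea in C -- every name here is hypothetical, not
Windows' or Linux's actual API:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical device descriptor: how many address bits its DMA
       engine can drive (32 for a non-DAC card, 64 with DAC). */
    struct device { int dma_bits; };

    /* Stand-in for an allocator that guarantees memory below 4GB
       (a real kernel would carve this from a low-memory zone). */
    static void *alloc_below_4gb(size_t len) { return malloc(len); }

    /* Map a buffer the device will read from. If the card cannot
       reach the buffer's physical address, bounce through low memory. */
    static void *dma_map_to_device(struct device *dev, void *buf,
                                   uint64_t buf_phys, size_t len)
    {
        if (dev->dma_bits >= 64 || buf_phys + len <= (1ULL << 32))
            return buf;                /* card can reach it directly */

        void *bounce = alloc_below_4gb(len);
        if (bounce)
            memcpy(bounce, buf, len);  /* the extra copy is the cost */
        return bounce;
    }

    int main(void)
    {
        struct device nodac = { 32 };
        char data[64] = "payload";
        /* Pretend the buffer's physical address is above 4GB. */
        void *p = dma_map_to_device(&nodac, data, 0x180000000ULL,
                                    sizeof data);
        return p == data;   /* 0 here: a bounce buffer was used */
    }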

S
 

Bob Niland

... a number of PCI masters in the field do not support DAC.

Does PCI Express fix this problem by mandating
64-bit compliance?

Sometimes the fix for old I/O problems is just
to junk the old standard. It was my impression
that PCI itself was in part a way to "solve"
the lack of shared-IRQ support on ISA.

I'm sure that PCI-E also fixes the voltage problem
(5V-tolerant 3.3V universal PCI cards are common,
but universal slots are uneconomical, with the result
that 66MHz and faster PCI slots are rare in retail PCs,
even though some of us could use the speed). And, without
having seen the spec, I'll bet PCI-E fixes the clocking
problem too (the max speed of shared slots is limited
to the slowest installed card).
This is a huge problem with EM64T right now.

Since we haven't seen 64-bit benchmarks yet, there
could be "huger" problems. But in any case, not many
EM64T systems will be run in 64-bit mode this year,
and few of those with over 4GB. By year end, Intel
will likely have fixed this oversight (along with
some others they missed when they cloned AMD64).
 

Andi Kleen

Does PCI Express fix this problem by mandating
64-bit compliance?

It does, but vendors just ignore it (case in point: popular
GPUs)

I fully expect there will even be devices with support for less than
32 bits, as is common with many "PCI sound chips". Vendors will
just add a PCI-Express bridge, but not fix the core chip.

Since we haven't seen 64-bit benchmarks yet, there
could be "huger" problems. But in any case, not many
EM64T systems will be run in 64-bit mode this year,

What makes you think so? A significant portion
of the AMD Opterons seem to run 64-bit kernels; why
should it be different with Intel?
and few of those with over 4GB. By year end, Intel
will likely have fixed this oversight (along with
some others they missed when they cloned AMD64).

It's 3.2+GB, not 4GB, see my other messages in this thread.

-Andi
 

Bob Niland

The article to which this is a response
never showed up on Google.

AK: > I fully expect there will even be devices with support
for less than 32 bits, as is common with many "PCI
sound chips". Vendors will just add a PCI-Express
bridge, but not fix the core chip.

I'd like to think that the bridges would be fully
compliant, and mask the legacy junk behind them,
but industries do have a way of defeating the goals
of their own standards initiatives.
What makes you think so? A significant portion
of the AMD Opterons seem to run 64bit kernels, why
should it be different with Intel?

Any one of these could significantly impair EM64T
adoption (in 64-bit mode):
- CPUs late or not available in quantity
- chipset problems that cause further slips
- system price uneconomic (even for 32-bit)
- desired clock speeds have major thermal issues
- CPUs run no faster in 64-bit mode
- incomplete AMD64 cloning delays software
- CPUs actually run slower in 64-bit mode (e.g. due to the IOMMU lapse)

It's been over a week since Nocona intro, and we're
still waiting for useful 64b test reports. I don't know
how many of the above speculations will turn out true,
but I just have a hunch that for end users needing to
run 64-bit this year, AMD64 chips will be more attractive
than the first generation of EM64T chips.
It's 3.2+GB, not 4GB, ...

So a 4GB config would get tagged by the IOMMU lapse?
... see my other messages in this thread.

Not found on Google in the xpost groups of this header.
I did find some of your DMA remarks in Linux groups though.
 

Stefan Monnier

If you find that you can't use any of the range- and type-checked languages,
for whatever reason, then you probably wouldn't be happy with a non-flat
memory space in hardware, either.
Agreed.

If you can use those languages, then the segments that were being
discussed will be completely invisible to you,
Agreed.

other than for the fact that your software might possibly be a little
faster, because said range checking and object relocation will be getting
some hardware assistance.

Here, tho, I have to disagree: I can't think of any type-safe language where
the compiler would be able to make good use of segments. You might be able
to keep most objects in a flat space and then map every array to a segment
(thus benefitting from the segment's bounds check), but it's probably going
to be too much trouble considering the risk that it will suffer pathological
degradation on programs that use a large number of small arrays (because the
hardware would have a hard time managing efficiently thousands of actively
used segments).
As I understand it, the contention was whether or not it was possible or
useful to run C (or C++) on such hardware. I suspect that quite large
chunks of application-level C (and C++) would be perfectly fine, since the
restrictions involved are the same as those needed to avoid most compiler
warning messages.

In theory, yes. In practice, it's very difficult for the compiler to
figure out how to compile the thing (I gather that one of the difficulties is
to figure out whether "foo *bar" is a pointer to an object `foo' or
a pointer to an array of `foo's or a pointer to one of the elements of an
array of `foo's).
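The ambiguity is easy to demonstrate: all three pointers below have
the identical static type, yet a segment-per-object scheme would need
different bounds for each.

    #include <stdlib.h>

    struct foo { int x; };

    int main(void)
    {
        struct foo *p1 = malloc(sizeof *p1);       /* one object          */
        struct foo *p2 = malloc(10 * sizeof *p2);  /* an array of ten     */
        struct foo *p3 = p2 + 4;                   /* an interior element */
        /* Nothing in the type `struct foo *' tells the compiler which
           of the three situations it is looking at. */
        free(p1);
        free(p2);
        return 0;
    }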


Stefan
 

Greg Lindahl

Bob Niland said:
- CPUs run no faster in 64-bit mode

Given the additional registers and better calling sequence, there's
significant additional performance to be had. The IOMMU problem
doesn't affect apps that don't do very much I/O.
It's been over a week since Nocona intro, and we're
still waiting for useful 64b test reports.

Given the short timeframes and teething problems for the hardware
(does anyone have a PCI Express graphics card they can lend me?),
I'm not surprised at all...

Followups to a group that I read.

-- greg
 

Stephen Sprunk

Bob Niland said:
It's been over a week since Nocona intro, and we're
still waiting for useful 64b test reports. I don't know
how many of the above speculations will turn out true,
but I just have a hunch that for end users needing to
run 64-bit this year, AMD64 chips will be more attractive
than the first generation of EM64T chips.

The public trial of XP64 doesn't currently run on Intel's chips (though the
closed beta program's current build does -- and reporting performance is
banned):
http://www.infoworld.com/article/04/07/06/HNwindowsnocona_1.html

Very few AMD64 benchmarks have been run on Linux, despite Linux being the
majority of 64-bit x86 software currently in use. The XP64 trial version is
uniformly slower than XP32 in the few benchmarks that have been run (usually
by gaming sites), so there's not much reason to expect it to be adopted
before the final release in Q4, even among AMD owners.

S
 

Greg Lindahl

Stephen Sprunk said:
Very few AMD64 benchmarks have been run on Linux, despite that being the
majority of 64-bit x86 software currently in use.

You might want to be more specific as to what benchmarks you're
referring to, as I know of a lot of HPC benchmarks that have been run
on AMD64 Linux.

Examples: all Linux: http://www.pc.ibm.com/ww/eserver/opteron/benchmarks/,
mixed Solaris and Linux: http://www.sun.com/servers/entry/v20z/benchmarks.html

AMD and IBM make regular SPEC submissions on 64-bit Linux.

Followups to a group that I actually read.

-- greg
 

Kai Harrekilde-Petersen

Does PCI Express fix this problem by mandating
64-bit compliance?

"Legacy Endpoints" (which are basically PCI v2.3 compliant devices) are
not required to be able to generate addresses above 4GB, according to
the PCI-E spec.

PCI Express Endpoints _are_ required to support >4GB addressing.

(PCI-Express base standard v1.0a, sections 1.3.2.1 & 1.3.2.2, page 32).
I'm sure that PCI-E also fixes the voltage problem
(5v-tolerant 3.3v universal PCI cards are common,
but universal slots are uneconomical, with the result
that 66MHz and faster PCI slots are rare in retail PCs,
even though some of us could use the speed). And, without
having seen the spec, I'll bet PCI-E fixes the clocking
problem too (the max speed of shared slots is limited
to the slowest installed card).

PCI-Express does not use a shared bus; it uses point-to-point links
and a central switch. So there you go.

Regards,


Kai
 

Rupert Pigott

Stefan Monnier wrote:

[SNIP]
Here, tho, I have to disagree: I can't think of any type-safe language where
the compiler would be able to make good use of segments. You might be able
to keep most objects in a flat space and then map every array to a segment
(thus benefitting from the segment's bounds check), but it's probably going
to be too much trouble considering the risk that it will suffer pathological
degradation on programs that use a large number of small arrays (because the
hardware would have a hard time managing efficiently thousands of actively
used segments).

Perhaps no more than the risk posed by offloading the problem onto the
TLB/VM code. You have even less control over that as it's basically at
the mercy of the workload at run-time. :(

Cheers,
Rupert
 

Andrew Reilly

Stefan said:
Here, tho, I have to disagree: I can't think of any type-safe language where
the compiler would be able to make good use of segments. You might be able
to keep most objects in a flat space and then map every array to a segment
(thus benefitting from the segment's bounds check), but it's probably going
to be too much trouble considering the risk that it will suffer pathological
degradation on programs that use a large number of small arrays (because the
hardware would have a hard time managing efficiently thousands of actively
used segments).

I admit that the possibility of pathological behaviour exists, but
it does on every platform in one way or another. Who would have
thought that database code could have such long runs of loopless
code that it trashed the decoded instruction caches of some
processors, putting the decoder on the critical path?

To sort-of answer the question, I know of at least one
language/compiler combination (Inria's SmartEiffel) that manages
all allocations of small objects through typed pools, so that
system memory requests are always at least a whole page. This is
for a language with strict bounds checking, so I assume that some
of the same issues must hold. I dare say that other strongly
typed languages could do the same. It wouldn't be hard to do
something similar for C, either, just that the only "type"
information available to the allocator at run time is the object size.
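A minimal sketch of that style of allocator, under the stated assumption
that object size is the only run-time "type": cells of one size are
carved out of whole-page requests (names are illustrative, not
SmartEiffel's):

    #include <stddef.h>
    #include <stdlib.h>

    #define PAGE 4096

    /* One pool per object size; the pool is the "type". */
    struct pool {
        size_t cell;          /* size of the objects this pool serves */
        char  *next, *end;    /* bump region within the current page  */
    };

    /* Allocate one cell; grab a whole page from the system when the
       current one runs dry (error handling omitted for brevity). */
    static void *pool_alloc(struct pool *p)
    {
        if (p->next + p->cell > p->end) {
            p->next = malloc(PAGE);    /* system request: a full page */
            p->end  = p->next + PAGE;
        }
        void *obj = p->next;
        p->next += p->cell;
        return obj;
    }

    int main(void)
    {
        struct pool ints = { sizeof(int), 0, 0 };
        int *a = pool_alloc(&ints), *b = pool_alloc(&ints);
        return !(a && b && a != b);
    }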
In theory, yes. In practice, it's very difficult for the compiler to
figure out how to compile the thing (I gather that one of the difficulties is
to figure out whether "foo *bar" is a pointer to an object `foo' or
a pointer to an array of `foo's or a pointer to one of the elements of an
array of `foo's).

I think that the last of these is the only one that could be
tricky, and without too much thought that seems to fit the plan
too. There is no difference between a pointer to an object foo
and a pointer to an array of foos, just the first case has an
array length of one (which could be checked). If your pointers
are compound things containing base and index, then the pointer to
a specific element should still work too.
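That "compound thing" is what's usually called a fat pointer. A sketch
of one plausible layout (illustrative only; real checked implementations
differ):

    #include <stddef.h>

    struct foo { int x; };

    /* Base, element count, and index travel together, so a pointer
       to a lone object is just the len == 1 case. */
    struct foo_ptr {
        struct foo *base;   /* start of the object or array      */
        size_t      len;    /* number of elements (1 for scalar) */
        size_t      idx;    /* which element is designated       */
    };

    /* Bounds-checked dereference: NULL stands in for a trap. */
    static struct foo *foo_deref(struct foo_ptr p)
    {
        return (p.idx < p.len) ? &p.base[p.idx] : NULL;
    }

    int main(void)
    {
        struct foo arr[3] = { {1}, {2}, {3} };
        struct foo_ptr p = { arr, 3, 2 };
        return foo_deref(p)->x != 3;   /* in bounds: &arr[2] */
    }

The cost Stefan mentions below is visible at a glance: the pointer is
three words instead of one, and every dereference gains a compare.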

Cheers,
 

Evgenij Barsukov

Stephen said:
Very few AMD64 benchmarks have been run on Linux, despite that being the
majority of 64-bit x86 software currently in use. The XP64 trial version is
uniformly slower than XP32 in the few benchmarks that have been run (usually
by gaming sites), so there's not much reason to expect it to be adopted
before the final release in Q4, even among AMD owners.

Either they improved the beta version, or there are some programs that can
benefit already (AMD reports a 57% improvement with the Win64 beta):
http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~87018,00.html

Regards,
Evgenij

--

__________________________________________________
*science&fiction*free programs*fine art*phylosophy:
http://sudy_zhenja.tripod.com
----------remove hate_spam to answer--------------
 

Stefan Monnier

I admit that the possibility of pathological behaviour exists, but it does
on every platform in one way or another.

But it's yet another source of yet another pathological behavior.
It had better give some substantial benefits. AFAIK, the only benefit is
array-bounds checking "for free". Whether that's substantial or not
depends on the circumstance: in many cases ABC can be optimized away.
To sort-of answer the question, I know of at least one language/compiler
combination (Inria's SmartEiffel) that manages all allocations of small
objects through typed pools, so that system memory requests are always at
least a whole page.

Sure, that's pretty common, but it has nothing to do with segments.
Allocating non-array objects in segments (grouped or not) is useless since
the bounds-checking is unnecessary: you might as well allocate it in
a flat space and save the cost of managing segment descriptors.
I think that the last of these is the only one that could be tricky, and
without too much thought that seems to fit the plan too. There is no
difference between a pointer to an object foo and a pointer to an array of
foos, just the first case has an array length of one (which could be
checked).

Most type-safe implementations of arrays need to keep the array size
somewhere at run time, so a single element and an array of size 1 are not
represented the same way.
If your pointers are compound things containing base and index,
then the pointer to a specific element should still work too.

But such a representation of pointers is unnecessarily costly for the usual
case of a pointer to a single object. Some C compilers use such tricks to
get a "safe" implementation, but the runtime cost is very significant
(we're talking more than a factor-of-2 slowdown).


Stefan
 

Stefan Monnier

Here, tho, I have to disagree: I can't think of any type-safe language where
Perhaps no more than the risk posed by offloading the problem onto the
TLB/VM code. You have even less control over that as it's basically at
the mercy of the workload at run-time. :(

Maybe. I guess it could be pretty comparable (but I don't believe in the
"more control" because the user code would only control segment "pointers"
while the base-address, size and access rights would most likely not be
loaded/unloaded explicitly).

But would segments save you from using paging, really?


Stefan
 
