Nvidia Said to Take On Intel in Tablet Computer Chips

Yousuf Khan · Aug 15, 2010

Nvidia Said to Take On Intel in Tablet Computer Chips - Bloomberg
"Trying to replicate the performance of Intel chips using software -- an
approach Transmeta tried about a decade ago -- hasn’t worked before
because it’s hard to deliver enough performance to run computer programs
like Microsoft Corp.’s Windows fast enough, according to In-Stat’s
McGregor. Intel’s X86 technology has taken over the PC and server
industries, displacing companies such as Motorola Inc., whose chips once
ran Apple computers. "
http://www.bloomberg.com/news/2010-...-challenge-on-intel-in-tablet-processors.html

daytripper · Aug 16, 2010

[...]"Intel’s X86 technology has taken over the PC and server
industries, displacing companies such as Motorola Inc., whose chips once
ran Apple computers. "

Wow. "Breaking news", eh?

Did somebody set the Way Back Machine to the 1990s - when "Motorola" and
"computer" were still used in the same sentence?

Sheesh...

obcsiphc: Good luck to emulators everywhere. As always, they're gonna need
it...

/daytripper

Yousuf Khan · Aug 16, 2010

[...]"Intel’s X86 technology has taken over the PC and server
industries, displacing companies such as Motorola Inc., whose chips once
ran Apple computers. "

Wow. "Breaking news", eh?

Did somebody set the Way Back Machine to the 1990s - when "Motorola" and
"computer" were still used in the same sentence?

Sheesh...

obcsiphc: Good luck to emulators everywhere. As always, they're gonna need
it...

/daytripper

Nvidia really needs to take over VIA and get its core. That core is modern.

Yousuf Khan

Robert Myers · Aug 16, 2010

On Aug 15 said:
obcsiphc: Good luck to emulators everywhere. As always, they're gonna need
it...

As I understand it, x86 now emulates x86. There must be patents
related to instruction decode that make it hard for others to play, as
the ISA and the physical operations of the microprocessor are now
separated by microcode for [almost?] all processors outside the
embedded space. I don't really know of exceptions.

Robert.

Yousuf Khan · Aug 17, 2010

On Aug 15 said:
On Aug 15 said:

obcsiphc: Good luck to emulators everywhere. As always, they're gonna need
it...

As I understand it, x86 now emulates x86. There must be patents
related to instruction decode that make it hard for others to play, as
the ISA and the physical operations of the microprocessor are now
separated by microcode for [almost?] all processors outside the
embedded space. I don't really know of exceptions.

Robert.

In most modern implementations of x86, certain common instructions are
considered hard-coded, while others are emulated through microcode. Most
floating point instructions are a series of more basic instructions.

Yousuf Khan

Robert Myers · Aug 17, 2010

In most modern implementations of x86, certain common instructions are
considered hard-coded, while others are emulated through microcode. Most
floating point instructions are a series of more basic instructions.

I'll take the word of real computer architects on this one, Yousuf.
Past the decode stage, the ISA doesn't matter. Programmers and others
like to talk about ISA's because that's all they understand. ISA is
irrelevant now. Whatever obstacles there are to "emulating" x86 have
nothing to do with the ISA.

Robert.

Yousuf Khan · Aug 18, 2010

I'll take the word of real computer architects on this one, Yousuf.
Past the decode stage, the ISA doesn't matter. Programmers and others
like to talk about ISA's because that's all they understand. ISA is
irrelevant now. Whatever obstacles there are to "emulating" x86 have
nothing to do with the ISA.

Yeah, who's that?

Yousuf Khan

Robert Myers · Aug 18, 2010

Yeah, who's that?

It's been said in dozens of different ways by architects (formerly of)
both AMD and Intel on comp.arch. It's become almost an obsession with
Mitch Alsup, formerly the chief architect of AMD.

Robert.

daytripper · Aug 18, 2010

On Aug 15 said:
On Aug 15 said:

obcsiphc: Good luck to emulators everywhere. As always, they're gonna need
it...

As I understand it, x86 now emulates x86. There must be patents
related to instruction decode that make it hard for others to play, as
the ISA and the physical operations of the microprocessor are now
separated by microcode for [almost?] all processors outside the
embedded space. I don't really know of exceptions.

Robert.

In most modern implementations of x86, certain common instructions are
considered hard-coded, while others are emulated through microcode. Most
floating point instructions are a series of more basic instructions.

Yousuf Khan

ooooh..."emulated through microcode" seems rather pejorative, considering the
obvious alternative design styles might never yield a functional device in
your lifetime ;-)

/daytripper

Robert Myers · Aug 18, 2010

On Aug 18 said:
Here's a more pessimistic take on the Nvidia x86 story:

SemiAccurate ::

That said, lets assume that Tegra 5, code named T50, will be on time.
How is an ARM core related to x86? That is easy, Nvidia is going to use
Transmeta-esque code morphing firmware to make the CPU run x86 code.
Firmware x86 has two associated problems, one technical, one legal.

It would be a mistake to translate Transmeta's troubles into automatic
trouble for anyone else. The idea of putting a scheduling front-end
in front of an in-order core is a loser. Everyone knows that now.
Whatever mistakes there are to be made, they will be different
mistakes.

On the technical side, the problem is simple, speed. ARM A9 CPUs are
great for phone level applications, and can reach into the current
tablet space, but hit a glass ceiling there. If Eagle doubles the
performance per MHz and doubles performance per watt, it will basically
be on par with the low end of the Atom-class CPUs, and woefully behind
the Nano/Bobcat level of performance."
http://www.semiaccurate.com/2010/08/17/details-emerge-about-nvidias-x...

A lot of money has been dumped into x86 design. It's hard to see how
IBM stays competitive as a manufacturer of processors and makes money,
but I suspect Power is a loss-leader for them--and cooperation with
AMD essentially makes x86 money available to them for advances in
process technology. One has to admire VIA for even staying on the
field if not really in the game.

The bottom line here is that you need bucketloads of money to play the
x86 game even at an acceptable level. If you don't have a path to the
bottomless well of Wintel money, good luck to you.

Robert.

Sebastian Kaliszewski · Aug 19, 2010

Robert said:
I'll take the word of real computer architects on this one, Yousuf.
Past the decode stage, the ISA doesn't matter.

It doesn't matter except in cases where it does

What's behind the decode stage is still strongly tied to the ISA. Stuff like
retirement logic, memory ordering, or even such basic stuff as
execution pipeline arrangement is dictated by the ISA. And the rest is
optimised for a particular ISA (or performance will be lousy). Even the
Transmeta chip was optimised for x86 ISA emulation -- without that
performance would be even lousier.

What doesn't matter behind decode is the layout of the various instructions (i.e.
where the opcode is, where the registers are described, etc.). That's
important stuff, as that is what distinguishes RISC from CISC. But
that's not all of it -- things like the number of registers, memory
ordering requirements, etc. *do* matter a lot (and are still part of an ISA).

rgds
\SK

Robert Myers · Aug 19, 2010

On Aug 19, 4:45 am, Sebastian Kaliszewski wrote:

It doesn't matter except in cases where it does

The names of the people who have actually been responsible for the
entire design (or enough of it to speak with authority) of a modern,
full-featured processor could probably be written on one sheet of a
yellow legal pad. The rest of us are observers, no matter what
classes you have taken at university.

What's behind the decode stage is still strongly tied to the ISA. Stuff like
retirement logic, memory ordering, or even such basic stuff as
execution pipeline arrangement is dictated by the ISA.

Dictated is such a strong word that I suspect this statement to be
false. From the discussions that take place among those who know much,
much more than I do or ever could, I conclude that details of memory
ordering are somewhat arbitrary and aren't even fully specified by the
ISA. As you would say, it doesn't matter until it does, and, when it
does, as in concurrency, I don't think anyone completely understands
what is guaranteed to work and what isn't.
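
To make "memory ordering" concrete, here is a minimal Python sketch of the classic store-buffering litmus test. It is a toy enumeration, not a model of any real processor's full memory model; the function names and the `allow_store_load_reorder` flag are made up for illustration. The one documented x86 behaviour it leans on is that a load may be reordered ahead of an older store to a different address.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one global schedule against a shared memory, return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, name, target in schedule:
        if op == "store":
            mem[name] = 1
        else:  # "load": copy the current memory value into a register
            regs[target] = mem[name]
    return regs["r1"], regs["r2"]

def outcomes(allow_store_load_reorder):
    """All (r1, r2) results for: T0: x=1; r1=y   T1: y=1; r2=x."""
    t0 = [("store", "x", None), ("load", "y", "r1")]
    t1 = [("store", "y", None), ("load", "x", "r2")]
    # Hypothetical simplification: a delayed store is modelled by letting
    # the load run before the store within the same thread.
    orders0 = [t0] + ([t0[::-1]] if allow_store_load_reorder else [])
    orders1 = [t1] + ([t1[::-1]] if allow_store_load_reorder else [])
    return {run(s) for a in orders0 for b in orders1
            for s in interleavings(a, b)}

sc = outcomes(False)        # strict program order in each thread
tso_like = outcomes(True)   # loads may pass older stores
print(sorted(sc))
print(sorted(tso_like))
```

With strict program order, both registers can never read 0; once loads may pass older stores, that outcome appears. Which of the two behaviours software is allowed to observe is exactly the kind of detail being argued about here.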

I conclude those things from listening to the architects who do know
clarify what the situation actually is (as opposed to what the
programmers and others who don't necessarily know imagine it to be)
and following any number of long and baffling discussions among
programmers (in which there is never agreement) as to how to safely
handle concurrency. In the end, people seem to do whatever they have
to do and to use whatever works. To say that the ISA dictates in a
situation where so many details seem arbitrary seems not even
plausible to me.

Now, if you are trying to "emulate" an x86, you have to emulate what
processors now in the market actually do, and that is one hell of a
lot of things to keep track of and get right, and *would* include
mimicking the arbitrary details that probably only Intel is powerful
enough to dictate. That is to say, you are stuck duplicating
arbitrary things no matter whether you have an Intel license or not.

What having a real license will do for you is get you access to all
the NDA stuff that Intel tells AMD and Via and won't tell you or
anyone else without specific legal rights to the information. If you
want to say that such things are part of the ISA, then your statement
could be taken as in some sense correct, but those details appear to
me to be more dictated by how a particular manufacturer has chosen to
cope with cache coherency than by the ISA.

I can't even imagine how an ISA would dictate retirement logic. What
a computer must do, no matter if it's System/360 or x86, is to make it
appear that instructions have been executed in order. Life gets very
complicated, I'm sure, when the issue is not what one processor does,
but what two processors do when possibly in competition for the same
memory addresses. However your knock-off behaves, it has to emulate
enough of the right behavior in enough of the important situations
that very little software breaks. In that sense, I'm sure that
daytripper's advice is sound: good luck.

Neither Intel nor IBM, implementing different ISA's, can dictate all
details, regardless of the ISA. Important C code has to work
across different processor lines. Then, for the dicey stuff, you are
into whether the behavior of C is well-defined, and the answer appears
to be that it isn't. In the end you do whatever you have to do to
get an acceptably low level of weird errors. That seems to mean lots
and lots of testing, as opposed to any kind of formal logic related to
the ISA. You have to do all that testing, no matter what ISA you are
implementing.

All of these details are somewhat of a problem for ARM in general, as
I gather, because the details of processor behavior are less well-
defined, as different manufacturers have made different arbitrary
decisions working with nominally the same ISA. Without even thinking
about emulating x86, there are lots of arbitrary things to get wrong
or to be in conflict about, but they aren't dictated by the ISA.

And the rest is
optimised for a particular ISA (or performance will be lousy). Even the
Transmeta chip was optimised for x86 ISA emulation -- without that
performance would be even lousier.

What doesn't matter behind decode is the layout of the various instructions (i.e.
where the opcode is, where the registers are described, etc.). That's
important stuff, as that is what distinguishes RISC from CISC. But
that's not all of it -- things like the number of registers, memory
ordering requirements, etc. *do* matter a lot (and are still part of an ISA).

The disconnect between what the processor appears to be on the outside
and what it is actually doing is so profound that I don't even know
how to discuss this kind of thing. An architect has to make sure that
the architectural registers in the ISA appear actually to be there.
Behind every architectural register, there are now many, many
invisible registers that are dictated by the internal workings of the
processor and not by the ISA. Once again, even x86 is emulating
itself.
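
As a rough illustration of architectural registers backed by invisible ones, here is a minimal Python sketch of register renaming. The `rename` function, the `pN` physical register names and the three-instruction program are invented for the example; real rename hardware also has to recycle physical registers and recover from mispredicted branches, which this ignores.

```python
from itertools import count

def rename(instrs, arch_regs):
    """Rename (dst, src1, src2) triples onto fresh physical registers."""
    fresh = count()
    # Each architectural register starts out mapped to its own physical one.
    table = {r: f"p{next(fresh)}" for r in arch_regs}
    renamed = []
    for dst, src1, src2 in instrs:
        # Sources read whichever physical register currently holds the value.
        s1, s2 = table[src1], table[src2]
        # Every write gets a brand-new physical register, so reusing an
        # architectural name no longer forces instructions to serialise.
        table[dst] = f"p{next(fresh)}"
        renamed.append((table[dst], s1, s2))
    return renamed, table

program = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("edx", "eax", "ebx"),  # edx = eax op ebx (true dependency on eax)
    ("eax", "ecx", "ecx"),  # eax = ecx op ecx (reuses the name eax)
]
renamed, final_map = rename(program, ["eax", "ebx", "ecx", "edx"])
for line in renamed:
    print(line)
```

The two writes to `eax` land in different physical registers, so the third instruction does not have to wait for the second to finish reading the old value. The program only ever names four registers; the core uses seven in this run.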

Summary: there are plenty enough details to get wrong that
daytripper's advice seems sound to me, but most of them really aren't
dictated by the ISA but by an installed code base.

Robert.

Robert Myers · Aug 22, 2010

You are wrong.

I give you the example of Apple's AltiVec instruction set.
AltiVec at introduction gave the PowerPC chips a 10x speed advantage
on a bunch of important graphical benchmarks, and makes the vector
processor useful in a wide variety of other tasks that are not
normally thought of as vector code. (Filesystem block allocation, etc.)

Ultimately this one innovation alone was not enough for PowerPC to
overcome all the disadvantages of competing against Intel, but it
did level the playing field for a decade.

AltiVec came from a software firm, those "real computer architects"
idea of innovation was Thumb1 and MIPS16, bunch of (CENSORED).

Sure.

I can bolt a GPU onto the CPU, declare its instructions and features
to be part of the ISA, and claim that ISA, in the sense that people
usually mean it, can make a huge difference. That makes ia32 with
MMX, SSE, etc. a different ISA from the 386. You can change the way
that ISA is used to make your statement true and mine false, but I
decline all arguments about terminology. I know what I meant, even if
you didn't.

You can bolt a specialized capability onto anything, so the ISA, in
the sense that people usually mean it, *doesn't* make a difference, at
least not from the evidence you have presented.

Robert.

nmm1 · Aug 22, 2010

I give you the example of Apple's AltiVec instruction set.
AltiVec at introduction gave the PowerPC chips a 10x speed advantage
on a bunch of important graphical benchmarks, and makes the vector
processor useful in a wide variety of other tasks that are not
normally thought of as vector code. (Filesystem block allocation, etc.)

Marginally. The differences were not exciting, outside benchmarketing
and a few specialised uses.

Ultimately this one innovation alone was not enough for PowerPC to
overcome all the disadvantages of competing against Intel, but it
did level the playing field for a decade.

Not really. Witness how many other companies showed an interest;
it wasn't even up to the level of SPARC or MIPS, though I accept
that there were other reasons than performance that dominated.

Regards,
Nick Maclaren.

Robert Myers · Aug 22, 2010

You are wrong.

I give you the example of Apple's AltiVec instruction set.

AltiVec is a pretty nice, clean [single precision] SIMD instruction set.

Robert.

MitchAlsup · Aug 22, 2010

You are wrong.

No, Robert is correct. After the decode stage the ISA is irrelevant
{caveat: the rest of the pipeline was not horribly screwed up.}

But I will go one step further. In the light of the modern 16-19 stage
x86 pipelines with OoO execution, reservation stations, hit-under-miss
caches, reorder buffers, exotic branch prediction, store-to-load
forwarding, ... the cost of x86 (with all of its atrocities) versus a
perfectly designed RISC ISA is on the order of 2% in architectural
figure of merit, and maybe one gate delay of pipeline cycle time.
Certainly less than 7% overall.

I give you the example of Apple's AltiVec instruction set.

An advantage so great it has been removed from the (re)merger of Power
and Power-PC.

Mitch

Paul Gotch · Aug 22, 2010

In comp.arch MitchAlsup said:
An advantage so great it has been removed from the (re)merger of Power
and Power-PC.

VMX (IBM call it something different as FreeScale own the AltiVec
trademark) is part of the Power ISA v2.03 and is implemented in POWER
6 and beyond, although before this IBM consistently left it out.

-p

krw · Aug 23, 2010

The "merger" was mostly marketeering. Power processors, since the Power2, I
think, used the PowerPC architecture.

VMX (IBM call it something different as FreeScale own the AltiVec
trademark) is part of the Power ISA v2.03 and is implemented in POWER
6 and beyond, although before this IBM consistently left it out.

VMX was in the '970 (Apple's G5), which was based on the Power-4.

HT-Lab · Aug 23, 2010

Brett Davis said:
You are wrong.

I give you the example of Apple's AltiVec instruction set.
AltiVec at introduction gave the PowerPC chips a 10x speed advantage
on a bunch of important graphical benchmarks, and makes the vector
processor useful in a wide variety of other tasks that are not
normally thought of as vector code. (Filesystem block allocation, etc.)

Ultimately this one innovation alone was not enough for PowerPC to
overcome all the disadvantages of competing against Intel, but it
did level the playing field for a decade.

AltiVec came from a software firm, those "real computer architects"
idea of innovation was Thumb1 and MIPS16, bunch of (CENSORED).

Brett

Not sure if this is relevant to the discussion, but the ISA makes a huge
difference to code density (discussed in comp.arch.embedded).

See http://www.csl.cornell.edu/~vince/papers/iccd09/iccd09.pdf

Hans
www.ht-lab.com

nmm1 · Aug 23, 2010

I call "bullshit" on you.
SPARC and MIPS do not have the spare opcode space to implement the
AltiVec permute instructions, and then there is the little issue of
Apple owning the patents.

I was referring to the number of other companies that were interested
in licensing PowerPC, let alone PowerPC+Altivec. Far more pursued
SPARC and MIPS.

Regards,
Nick Maclaren.
