George Macdonald
Hmmm, call it 1988 for 1st product?
Rgds, George Macdonald
"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Grumble said: Intel did write their own compiler for IPF, and, unsurprisingly, it is
the best compiler available for that platform, as far as I know.
As mentioned previously, I keep hearing about this "feedback loop" and,
while its importance to VLIW/EPIC seems obvious, I have trouble seeing how it
fits into the model for delivery of commercial software. Is every client
supplied with a compiler "free" or does the price have to be included in
the software?... or is Intel going to give compilers away to sell CPU
chips?... or any other of the various permutations for supplying the
capability? BTW I am looking at a future world where Open Source will not
displace paid-for software, especially in the domain of "difficult
problems".
From a practical standpoint, are we to believe that a re-train has to be
done for every variation on the "dataset"?
No.
How many (near) repetitions on
a given "dataset" make it worthwhile to do the re-train? Can that even be
defined?
Are you familiar with the term "perfect future technology"?
Have you considered that as the complexity of the solution exceeds that of
the problem we have an, umm, enigma?
Tony Hill said: Quick memory refresh needed here.. Was it the 386 or the 486 that
first introduced L1 caches into Intel's line of processors?
They were for desktop micros. ;-)
Stuff like caches were invented first for mainframes, then for
minicomputers, and recently (historically speaking) for desktop
microprocessors.
Question: when will the first smoke-detector micro to use an L1 cache
be introduced?
Mainframes had the same memory-wall issues (their memory was in a
different frame, meters away), only a decade or two earlier.
...same problems, same solutions. Smaller doesn't make them any
more clever.
I've done an awful lot of work coming up with evidence that casual
claims that you and others have made are wrong. What you have, and
not even in direct response, are more casual claims. All in the same
vein: seen it all, done it all, forget it, everybody knew everything
long ago.
Don't know how it worked out for mainframes, but OoO doesn't help with
the transaction processing problem for microprocessors. I've already
posted the citation several times, and I'm not going to go dig it up
again. It has Patterson's name on it, published about 1996. Yes,
that Patterson.
Unless I'm mistaken, mainframes always have been designed for
transaction processing. In any case, maybe you should dig up the
Patterson citation and send him an e-mail telling *him* that he's just
publishing stuff that you knew before you even started your education.
(e-mail address removed) says...
As soon as one does pipelining OoO makes sense. Mainframes have
been pipelined since forever (I'm not sure about OoO since it is
hardware expensive).
BTW, I've lost track. Are we talking about the memory wall, OoO,
or transaction processing?
In comp.sys.ibm.pc.hardware.chips Tony Hill said: Quick memory refresh needed here.. Was it the 386 or the 486 that
first introduced L1 caches into Intel's line of processors?
Nate said: 486s were the first Intel x86 chips with an on-chip cache. (And I don't
think any of the earlier non-Intel 386s had one.)
Many 386s had an off-chip cache by the time the 486 became reasonably
common.
In comp.sys.intel Rob Stow said: My vague recollection of some ASM programming I did umpteen million
years ago tells me that making some things work in protected mode
required me to fiddle with the "segment descriptor cache". IIRC,
that was introduced with the 80286 but it might not have been until
the 80386. This wasn't exactly a RAM cache - it had more to do with
RAM management than with caching the actual contents of the RAM.
The 80386 had no on-chip L1 cache. Cheap motherboards had no off-chip
cache, but many had anywhere from 32 KB to 256 KB.
I also recall one system where the manual said I could install
L1 cache *or* use a Weitek/80387 math coprocessor, but not both :-D
The 80486 had an 8 KB on-chip L1 cache. I /think/ it was unified.
Because of the much lower latency, the 8 KB on-chip L1 in an 80486 was
supposed to have been as good as 64 KB of off-chip L1 with an 80386.
I /think/ one of the things done to cripple a 486DX to get a 486SX
was to permanently disable the L1 cache.
80486 motherboards commonly had up to 256 KB of L2 - but there were a
few enthusiast boards with up to 1 MB and I vaguely recall reading
about a motherboard with 2 MB. Real performance nuts occasionally
also used an expensive type of SIMM that put 4 KB or 8 KB of cache
on each SIMM - wish I could remember what the heck that kind of SIMM
was called.
Keith R. Williams said: Gee, even the much maligned Cyrix 6x86 was an OoO processor, sold
in what, 1996? Evidently Cyrix thought it was a winner, and they
weren't wrong.
Some made it by mistake:
[quote from Linux sources, i386/io.h]
* Cache management
*
* This needed for two cases
* 1. Out of order aware processors
* 2. Accidentally out of order processors (PPro errata #51)
Accidentally? That's nice.
I'm not sure what you mean by "difficult problems." One of the most
active participants and a reasonably prolific publisher in this
particular area of research is Microsoft. I'm not certain, but I
have a feeling that they aren't looking at a world in which Open
Source will replace paid-for software, either.
There are two problems that I know of with distribution of commercial
software. One problem is that the available cache may be nowhere near
the amount of cache for which the binary was optimized, and the
other is that programs are no longer in general statically linked.
The first of the two problems (available cache size is not
predictable) is why Intel has very little choice but to take actual
run-time conditions into account in one way or another. The
easiest way for them to do that, and the one I expect Intel to
implement first (I have no inside information whatsoever), is to use
speculative threading as a prefetch mechanism. More generally, it is
no secret that Intel is working on making Itanium out-of-order, albeit
in a limited way.
The most obvious way to get around the dll problem is static linking
of binaries. You can provide some degree of installation choice in
the same way that auto-install programs work now.
I'm not sure, but I think you are imagining that the binary you
produce would vary tremendously with input conditions, so that one
customer allowed to continue to train and recompile the code would end
up with a very different binary from another customer that was also
allowed to continue to train and recompile the code. Were that so,
the whole concept would make no sense at all, and the evidence is
abundant that such is not the case. Even the most unlikely of
software, like Microsoft Word, shows an incredible level of
predictability.
I haven't seen anyone talk about it in the literature, but there is no
reason I can see why a program cannot allow a certain amount of tuning
on-site with a limited optimization space. The current assumption, as
far as I know, is that commercial software will be delivered
"fully-trained."
I don't know about everyone working on this problem, but I know that
at least some people are looking beyond the single-processor problem.
If we can't manage to cope with the dataflow problem for a single
processor, how in heaven's name are we going to get decent performance
out of thousands or hundreds of thousands of processors?
We already have supercomputers that occupy a large amount of physical
space. Speed of light limitations will mean that autonomous entities
will have to be able to figure out what to do next without waiting for
guidance from a central scheduler. It is very easy to come up with
foreseeable problems that warrant the effort that is being expended.
I don't mean basic Web browsing, Word etc., but a humongous spreadsheet
with a Solve might qualify - basically anything where there is some compute
complexity and which can benefit significantly in performance from
feedback/retrain. Academic and semi-academic institutions are often good
at algorithm theory and expression; where it's complex/difficult to turn
into code, very often it needs a commercial implementation to get the best
out of it.
I've certainly seen software that, even on x86, adapts to cache sizes, so
where it helps, e.g. stuff with matrix block diagonal decomposition, it has
been done without special compiler aid. My gut feel is that trying to get
a compiler to handle such stuff automatically can never yield optimal
results.
What I'm thinking of is a general purpose package like a Mathematical
Programming system (Extended LP if you like), where a client might have
several different problems to solve which have completely different
characteristics in terms of matrix sparsity and compute complexity. I see
a different dataset giving completely different feedback to a retrain here
and therefore different binary versions for each dataset.
This is where I disagree... based on my personal experience. IOW this is
the flaw from my POV - can't be done for everything and certainly not for
the stuff I'm familiar with.
This seems to go way beyond any notion of statically trained commercial
software.
George said: Not sure what you mean but the Weitek was not 80387 compatible - it was a
different coprocessor with different instruction set. In that timeframe,
the 80x87 was Cyrix's first product.
Patterson's paper showed that a P6 core was stalled 60% of the time in
on-line transaction processing. I actually exaggerated that OoO
didn't help (actually, I just repeated the exact wording of the
paper). An in-order Alpha was stalled 80% of the time.
"same problems, same solutions. Smaller doesn't make them any more
clever."
Problem: memory wall
Solution: OoO
Old evidence that should exist if your logic (same problems, same
solutions) has any substance: OoO wouldn't have been much help in
transaction processing on mainframes, either. Patterson is so
desperate to publish that he attaches his name to old news?
You implied it had all been learned long ago on mainframes. Since
mainframes are used for transaction processing, the fact that OoO
wouldn't help for transaction processing should have been old news by
the time Patterson and his student got around to the 1996 paper.
The fact that OoO isn't all that big a help for transaction processing
may be one reason why Intel stuck with an in-order Itanium. Maybe
transaction processing was one of the applications George had in mind
when he referred to Itanium applications that are "embarrassingly
appropriate."