In comp.sys.ibm.pc.hardware.chips The little lost angel said:
I had the thought once that if we could use a lot of 386/486
chips connected together, we might be able to make a very, very
powerful computer with a much faster interconnect than, say,
clustering individual PCs. But it never quite stuck in my head,
despite probably having been explained to me, that individual &
interdependent instruction latency would really kill performance.
You are not wrong, but the ways in which you are right might
surprise you. 386s & 486s aren't very good for multiprocessing due
to a lack of cache and cache-coherency (MOESI) hardware. So it's
tough for them to share RAM. But they can of course be clustered.
These can make extremely powerful machines, but they are equally
hard to program. The problem has to be parallelizable, like brute
force searching a crypto keyspace. Shared memory (first really
possible on the Pentium) improves things somewhat because a single
OS can manage the processes. But not always, since memory bandwidth
must be shared. Some problems solve faster on clusters.
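To make "parallelizable" concrete, here's a toy C sketch (my own
illustration, nothing to do with any real cracker): each worker
scans its own disjoint slice of a pretend keyspace and never waits
on the others, which is exactly why this kind of job farms out to a
cluster so well.

/* Toy sketch only: brute-force "key" search split across workers.
 * No worker's result feeds any other worker.
 * Build with something like: cc -std=c99 -pthread */
#include <stdio.h>
#include <pthread.h>

#define NWORKERS 4
#define KEYSPACE 1000000UL      /* pretend keyspace size */
#define TARGET    314159UL      /* the "key" we pretend to look for */

static unsigned long ids[NWORKERS];

static void *search(void *arg)
{
    unsigned long id = *(unsigned long *)arg;
    unsigned long lo = id * (KEYSPACE / NWORKERS);
    unsigned long hi = lo + (KEYSPACE / NWORKERS);

    for (unsigned long k = lo; k < hi; k++)
        if (k == TARGET)        /* stand-in for a real trial decryption */
            printf("worker %lu found key %lu\n", id, k);
    return NULL;
}

int main(void)
{
    pthread_t t[NWORKERS];

    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, search, &ids[i]);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    return 0;
}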
Latency really becomes an important issue in essentially
single-threaded problems, which a surprising amount of computing
traditionally has been: do this, then do that. When decision
points are lacking, it can instead become: one CPU does this
while another does that. Some graphics work fits that pattern.
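In code terms (another toy example of mine, not anything from the
thread): in the first loop below every iteration needs the previous
one's result, so a second CPU can't take half the work no matter how
fast the interconnect; the second loop has no such chain and splits
into slices with partial sums added at the end.

#include <stdio.h>

#define N 1000000

static double a[N];             /* imagine this holds real data */

int main(void)
{
    /* Serial dependency chain: iteration i needs x from iteration
       i-1, so the work cannot be handed out to other CPUs. */
    double x = 1.0;
    for (int i = 0; i < N; i++)
        x = x * 1.0000001 + 0.5;

    /* No chain between elements: each CPU could take a slice of a[]
       and the partial sums get combined at the end. */
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("%g %g\n", x, sum);
    return 0;
}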
Radium's limit-testing case of billions of 1 Hz CPUs made it very,
very clear why it's not going to be good for general-purpose
computing, due to the sheer extremeness of the latency. 1 sec is
rather significant even on a human scale, whereas 4.7 us is such a
tiny number that I dismissed it as insignificant.
For a simpler human scale: Can nine women make [gestate]
a baby in one month?
Even with zero communications/memory latency and perfect
interleaving, the giga-wide 1 Hz machine still takes time
to do instructions. The result of one instruction isn't available
to the others until its cycle is up. Many real programs are
millions or billions of instructions long in a single thread.
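To put my own numbers on it (rough arithmetic, assuming one
instruction per clock): a single-threaded chain of 10^9 dependent
instructions needs 10^9 cycles in a row. At 1 Hz that is 10^9
seconds, call it 31.7 years, while the other billions of 1 Hz CPUs
sit waiting. At 1 GHz the same chain takes about one second.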
This is why we have clock-speed competition. Dual processors can
help personal computing (though not usually as much as doubling the clock!), but
it's a story of diminishing returns. People just aren't doing
that much simultaneously in parallel. Unlike servers.
I've been using dualies as my primary machine for 8 years. They seem
smoother, perhaps because one of the CPUs is usually idle and can
handle interrupts or background tasks. But I have to do special
things (make -j 2) to get a speed increase on compute-intensive
tasks like a Linux kernel compile, and at best it reaches about 98%
of the speed possible on a single double-clock CPU. Still, when CPUs
run out of clock (heat/power budget), parallel is the only way left.
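A back-of-the-envelope reading of that 98% figure (my arithmetic,
not a measurement, ignoring memory bandwidth): by Amdahl's law, if
a fraction s of the build is stuck serial (dependency scanning, the
final link), two CPUs give a speedup of 1/(s + (1-s)/2). Even a
mere s = 2% caps that at about 1.96x, i.e. roughly 98% of what a
single CPU at double the clock would manage on the same job.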
-- Robert