Socket 940 FPGA

  • Thread starter: Spoon

I don't see why not, though you're going to have a heck of a time
getting any meaningful floating point performance out of a 200MHz
FPGA. Also the results would then reflect the co-processor rather
than the Opteron chip itself, unless you could somehow manage to get a
single-threaded task to run between the two chips at the same time.

Really though, this sort of thing has absolutely ZERO use for desktop
computers, workstations, servers, notebooks, or anything else that we
generally associate with the term "computer". What this WILL be
useful in is VERY specific custom computing setups embedded within
some larger device. One example that jumps to mind is an industrial
robot (i.e. the kind that car manufacturers use in their assembly
plants, not what you might see in The Jetsons). The Opteron chip
could then run the OS, the GUI and all the applications while the FPGA
could be used for interfacing with the mechanical parts. Of course,
even here this is probably overkill; a simple microcontroller could
usually accomplish the same thing for $10 rather than this $4,500
setup. Still, the example should suffice to give you an idea of where
they're headed with this design.
 
Tony Hill said:
I don't see why not, though you're going to have a heck of a time
getting any meaningful floating point performance out of a 200MHz
FPGA. Also the results would then reflect the co-processor rather
than the Opteron chip itself, unless you could somehow manage to get a
single-threaded task to run between the two chips at the same time.

Really though, this sort of thing has absolutely ZERO use for desktop
computers, workstations, servers, notebooks, or anything else that we
generally associate with the term "computer". What this WILL be

The performance potential of FPGAs has been demonstrated for several
applications, one noteworthy algorithm being the Smith-Waterman sequence
similarity search, which is applicable in bioinformatics. Order-of-magnitude
speedups have been reported (for SW and other bioinformatics algorithms) by
many, in academia and commercially (www.timelogic.com). That's just one
example that I know of reasonably well. Off the top of my head, others
include cryptography and pattern detection (which breaks up into everything
from DNA sequence analysis to image recognition to network intrusion
detection). I think someone even put an MD-GRAPE on an FPGA recently, but I
can't seem to find the ref now.

You're generally right about floating-point performance (for now), but there
are other demanding tasks that don't exercise the good ol' x87. Some of
those tasks would benefit tremendously from a custom architecture, and will
run quickly on an FPGA in spite of the reconfigurability overhead.
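
For anyone who hasn't met Smith-Waterman, here is a minimal Python sketch of its scoring recurrence. The function name, the scoring values (match=2, mismatch=-1, gap=-1) and the simple linear gap penalty are illustrative assumptions, not taken from any of the implementations mentioned above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Each cell depends only on its left, upper and upper-left neighbours, so every cell on one anti-diagonal can be computed at the same time. That is the kind of parallelism a custom datapath can exploit and a sequential CPU loop cannot.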
 
Really though, this sort of thing has absolutely ZERO use for desktop
computers, workstations, servers, notebooks, or anything else that we
generally associate with the term "computer".

Not really. If it were cheaper, for example, it would let chess-playing
computer programs search ahead more ply in the same amount of time.

John Savard
http://www.quadibloc.com/index.html
 
I don't see why not, though you're going to have a heck of a time
getting any meaningful floating point performance out of a 200MHz
FPGA.

That depends completely on the algorithm. FPGAs can do some rather
impressive arithmetic. Think parallel and pipelined.
Also the results would then reflect the co-processor rather
than the Opteron chip itself, unless you could somehow manage to get a
single-threaded task to run between the two chips at the same time.

No different than any other asymmetric co-processor.
Really though, this sort of thing has absolutely ZERO use for desktop
computers, workstations, servers, notebooks, or anything else that we
generally associate with the term "computer".

I disagree. I can see a use in oil exploration, for instance.
Maybe finance/brokerages. As was mentioned above, FPGAs are very
good at pattern matching, Fourier analysis, filtering, all that sort
of thing.
What this WILL be
useful in is VERY specific custom computing setups embedded within
some larger device. One example that jumps to mind is an industrial
robot (i.e. the kind that car manufacturers use in their assembly
plants, not what you might see in The Jetsons). The Opteron chip
could then run the OS, the GUI and all the applications while the FPGA
could be used for interfacing with the mechanical parts.

The I/Os on the FPGA aren't customer-accessible, are they? I thought
it fit in a bog-standard Opteron socket on a standard board. No
user I/O there.
Of course,
even here this is probably overkill; a simple microcontroller could
usually accomplish the same thing for $10 rather than this $4,500
setup. Still, the example should suffice to give you an idea of where
they're headed with this design.

A "simple microcontroller" isn't going to keep up with an FPGA.
 
That depends completely on the algorithm. FPGAs can do some rather
impressive arithmetic. Think parallel and pipelined.

Perhaps the original poster could somehow solder a Socket 940
onto a PlayStation 3 and get 200+ gigaflops, as long as the
task would neatly fit into the local memory of each processor.

I was interested in getting a bit of my code to run on one of
those but thus far have not found any information on a way to
do that.
 
On the floating point side, FPGAs are pretty weak and use up
disproportionate resources to achieve modest FP capacity, but then
again if you have funny math, or FP that is designed around unusual
widths outside IEEE, it gets relatively better since you would have to
emulate that in SW. The biggest gains in performance for FPGAs over CPUs
are in those areas where FPGAs are naturally parallel vs. software
emulating the same hardware. When FPGAs emulate optimal software on CPUs,
they generally lose.
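
To give a feel for what emulating "funny math" in SW looks like, here is a small Python sketch of signed fixed-point arithmetic at an arbitrary width. Fixed point is used here only as one example of a non-IEEE format, and the helper names and the 12-bit, 8-fractional-bit layout are assumptions picked for illustration.

```python
def saturate(value, total_bits):
    """Clamp value to the range of a signed integer that is total_bits wide."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, value))


def to_fixed(x, frac_bits, total_bits):
    """Quantize a float to signed fixed point with frac_bits fractional bits."""
    return saturate(round(x * (1 << frac_bits)), total_bits)


def fixed_mul(a, b, frac_bits, total_bits):
    """Multiply two fixed-point values, rescale by shifting, then saturate."""
    return saturate((a * b) >> frac_bits, total_bits)


def to_float(a, frac_bits):
    """Convert a fixed-point integer back to a float for display."""
    return a / (1 << frac_bits)


# 12-bit values with 8 fractional bits: compute 1.5 * 1.5
x = to_fixed(1.5, frac_bits=8, total_bits=12)
print(to_float(fixed_mul(x, x, frac_bits=8, total_bits=12), frac_bits=8))
```

On a CPU every operation pays for the extra shift and the two clamping comparisons. In an FPGA the odd width is just a narrower multiplier and the saturation is a little extra logic in the same pipeline stage.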

A trip to comp.arch.fpga tells you pretty well what FPGAs get used for
now: DSP and embedded are most of it. The market for these 940 modules
and the market for Cray and SRC reconfigurable computing with Opteron
systems are not well served by that group or this group, since they are
too focused on EE issues or on Pentium and Alpha homework issues.

There has been some discussion about setting up comp.arch.reconfig
precisely for the RC algorithm and computing side use of FPGAs in
whatever form they take, the original charter of c.a.f. If this ever
took off, I would expect a lot of the parallel software people to be
interested too.

My own take is that however FPGAs & Opterons are packaged, there is a
computing impedance mismatch between the two systems over such a
relatively narrow HT pipe. I would suggest developing the RC algorithms
around a softcore processor with enhanced app specific instructions or
co processor designed as a pair and then array that through the FPGA.
Such a pair can use up good amounts of logic & BlockRam to form a
custom PE with more limited but parallel PE to PE interconnect. I.e.
follow the hierarchy that FPGAs already use. It isn't even necessary to
build the CPU since the Xilinx V2 Pro & V4 include up to 4 PPC cores
that can be used as local controllers or be customized by adding on a
local coprocessor. These only run at 300MHz or so but that's in line
with the possible pipeline speeds and next door to the actual logic.

The other issue is tools and languages. We have been stuck in the C/C++
sequential model for SW development, and on the HW side we have Verilog/VHDL,
and never the twain shall meet, either the languages or the engineers
that use them (some exceptions). When one thinks only in terms of
communicating processes, it is a lot easier to see that a common
language could be used for both HW & SW design over the RC space and
that such languages will be useful for describing parallel algorithms
that can run as code on a regular CPU or be synthesizable to HW. Occam
has been there, done that; now it looks like Handel-C. We also see a lot of C
based HDLs around trying to achieve the same thing but those don't have
access to HW synthesis. My preference would be to take a subset of
Verilog modified back to C syntax that supports parallel processes as
modules with ports and map that over a class declaration.

struct + methods -> class {},
class + ports & verilog subset (always, assign,,) -> process.(..) {..}.

Such a language will be very familiar to both C and Verilog users and
would be synthesizable with minor syntax changes back to Verilog. The
runtime for such a language would also borrow much from HDL simulators.
The scheduler might even be part of a cpu in its own right.
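
A rough Python sketch of the "class + ports -> process" idea, with a tiny runtime borrowed from how HDL simulators work, might look like the following. It is not the proposed language itself: Wire, Counter, Doubler and run are hypothetical names made up for illustration, and the two-phase update only mimics Verilog non-blocking assignment.

```python
class Wire:
    """Hypothetical port connection holding a current and a pending value."""
    def __init__(self, value=0):
        self.value = value
        self.pending = value

    def drive(self, value):
        # Like a non-blocking assignment: visible only after commit().
        self.pending = value

    def commit(self):
        self.value = self.pending


class Counter:
    """Process with one output port; increments it on every tick."""
    def __init__(self, out):
        self.out = out

    def always(self):
        self.out.drive(self.out.value + 1)


class Doubler:
    """Process with one input port and one output port."""
    def __init__(self, inp, out):
        self.inp, self.out = inp, out

    def always(self):
        self.out.drive(self.inp.value * 2)


def run(ticks, reverse=False):
    """Simulate the two-process pipeline and return (count, doubled)."""
    count, doubled = Wire(), Wire()
    processes = [Counter(count), Doubler(count, doubled)]
    if reverse:
        processes.reverse()
    for _ in range(ticks):
        for p in processes:          # evaluate every process on the old values
            p.always()
        for w in (count, doubled):   # then update all wires at once
            w.commit()
    return count.value, doubled.value


print(run(3), run(3, reverse=True))
```

Because every process reads the old wire values and all updates land together, the order in which the scheduler visits the processes does not change the result. The same description can therefore be read either as sequential code or as parallel hardware.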

my 2 bits

John Jakson
transputer guy
 
Don said:
Perhaps the original poster could somehow solder a Socket 940
onto a PlayStation 3 and get 200+ gigaflops, as long as the
task would neatly fit into the local memory of each processor.

I was interested in getting a bit of my code to run on one of
those but thus far have not found any information on a way to
do that.
On a playstation 3? That is a little hard to come by. On a cell
processor, http://www.alphaworks.ibm.com/topics/cell might get you started.
 
Del Cecchi said:
Don Taylor wrote: ....
On a playstation 3? That is a little hard to come by. On a cell
processor, http://www.alphaworks.ibm.com/topics/cell might get you started.

I believe I understand.

I have an assortment of information on the Cell processor.
But whether it will be possible for ordinary folks to be
able to run a little code on the PS3, without being a game
company or otherwise privileged with access to development
systems, seems more difficult to determine.

With a stack of PS3's some algorithms become interesting.
And it appears that they will have to sell these for less
than the cost of the parts, let alone manufacturing costs.
 
I believe I understand.

I have an assortment of information on the Cell processor.
But whether it will be possible for ordinary folks to be
able to run a little code on the PS3, without being a game
company or otherwise privileged with access to development
systems, seems more difficult to determine.

Why do you think "ordinary folks" could program an FPGA effectively?
"Ordinary folks" are hardly the point here. This is an interesting
product for any number of reasons. ...and no, I'm not buying one, though
I have done FPGA designs.
With a stack of PS3's some algorithms become interesting. And it appears
that they will have to sell these for less than the cost of the parts,
let alone manufacturing costs.

Have they actually sold one yet?
 
That depends completely on the algorithm. FPGAs can do some rather
impressive arithmetic. Think parallel and pipelined.

Sure, there are definitely situations where you can get rather
impressive performance, but you'll have a heck of a time getting that
sort of performance most of the time. Optimizing for one fairly
specific set of calculations is one thing, but integrating it as a
fairly general co-processor on a PC for something like SPECfp? That's
another story.
No different than any other asymmetric co-processor.

Yup. Possible but almost never trivial at high levels of
optimization.
I disagree. I can see a use in oil exploration, for instance.
Maybe finance/brokerages. As was mentioned above, FPGAs are very
good at pattern matching, Fourier analysis, filtering, all that sort
of thing.

Computing farm work, maybe. Though even there I think most people
will find that there are better and/or cheaper solutions available,
especially if they have to hire people to program those FPGAs.
A "simple microcontroller" isn't going to keep up with an FPGA.

That's kind of my point, the FPGA is a much higher performance but
also MUCH more expensive solution, hence the fact that it's overkill.
 
Sure, there are definitely situations where you can get rather
impressive performance, but you'll have a heck of a time getting that
sort of performance most of the time. Optimizing for one fairly
specific set of calculations is one thing, but integrating it as a
fairly general co-processor on a PC for something like SPECfp? That's
another story.

The whole purpose of this widget is for very specific applications.
An FPGA will never beat custom logic, but how many people can
afford to cast their problem into custom logic?
Yup. Possible but almost never trivial at high levels of
optimization.

Trivial? I don't believe anyone said anything about triviality.
FPGAs aren't trivial beasts to program.
Computing farm work, maybe. Though even there I think most people
will find that there are better and/or cheaper solutions available,
especially if they have to hire people to program those FPGAs.

Exactly compute farm work. Show me a cheap "coprocessor" of any
stripe. The point here is that one *can* design a coprocessor for
specific tasks. Yes, FPGA designers are expensive (wanna hire
one?;) but so are some of these problems.
That's kind of my point, the FPGA is a much higher performance but
also MUCH more expensive solution, hence the fact that it's overkill.

How can you say that it's overkill? You haven't even defined your
problem. I can see a ton of applications for these things.
 
Keith said:
Why do you think "ordinary folks" could program an FPGA effectively?
"Ordinary folks" are hardly the point here. This is an interesting
product for any number of reasons. ...and no, I'm not buying one, though
I have done FPGA designs.

Have they actually sold one yet?
PS3 is not in GA yet.
 
The whole purpose of this widget is for very specific applications.
An FPGA will never beat custom logic, but how many people can
afford to cast their problem into custom logic?

Hehe, I think we're arguing the same point again here Keith!
Trivial? I don't believe anyone said anything about triviality.
FPGAs aren't trivial beasts to program.

Exactly, and that's why I really don't see someone dropping such a
beast into a desktop/workstation type PC to run SPECfp on it.
Exactly compute farm work. Show me a cheap "coprocessor" of any
stripe. The point here is that one *can* design a coprocessor for
specific tasks. Yes, FPGA designers are expensive (wanna hire
one?;) but so are some of these problems.

There definitely are some situations where such a setup is going to be
VERY useful, but they're definitely going to be rather niche projects
and they aren't likely to look much of anything like a desktop PC.
That was all I was trying to say all along.
 
There definitely are some situations where such a setup is going to be
VERY useful, but they're definitely going to be rather niche projects
and they aren't likely to look much of anything like a desktop PC.
That was all I was trying to say all along.

Question :P

If they can make a whatever-it-is that drops into any Socket 940
system, wouldn't it be similarly easy to drop a
non-programmable-whatever-mass-produced-fpu/sse-thingy in the same way
that would make a lot more sense for a PC market?
 
Question :P

If they can make a whatever-it-is that drops into any Socket 940
system, wouldn't it be similarly easy to drop a
non-programmable-whatever-mass-produced-fpu/sse-thingy in the same way
that would make a lot more sense for a PC market?

Yes... err, sorta. They COULD make some sort of ASIC/co-processor
thing that would drop into Socket 940 and work alongside an Opteron
chip; HyperTransport was designed to be rather flexible in that
regard.

However, the real tricky part to this is the second part of your
question, i.e. making a lot more sense for a PC market. In order for
such a chip to make sense for the PC market it would have to run PC
software sufficiently faster to justify the (relatively high) cost
involved with doing this. Such a chip is likely to be more expensive
than simply adding in another Opteron processor, and beating a second
Opteron is no easy feat even if you are just looking at specific bits
of code.

FWIW there have been a few suggestions that the people at
ClearSpeed (www.clearspeed.com) might do something very similar to
what you're proposing, with a HyperTransport/Socket 940 (or Socket
AM2, or whatever the next-generation Opteron socket is) version of
their parallel processor. There are definitely some high performance
computing situations where this setup would be VERY attractive, maybe
even some workstation setups if they can get enough software support.
I doubt that it will make it down to the desktop PC level, but I
suppose anything is possible.
 
In comp.sys.ibm.pc.hardware.chips The little lost angel said:
If they can make a whatever-it-is that drops into any
Socket 940 system, wouldn't it be similarly easy to drop a
non-programmable-whatever-mass-produced-fpu/sse-thingy in the
same way that would make a lot more sense for a PC market?

Maybe a GPU in the second s940?

But the bandwidth demands of vram are kinda high, and why
congest the Hypertransport? Separate card is probably better.

-- Robert
 
Maybe a GPU in the second s940?

A GPU only makes sense if it's connected to the display unit. Since there
is already a channel for the GPU...
But the bandwidth demands of vram are kinda high, and why congest the
Hypertransport? Separate card is probably better.

....as it is.
 