Zak said:
Linux does it in software, calling it RAID 6.
NetApp calls it RAID DP - the D being Dual or Diagonal. NetApp's is
devilishly clever and simple - it involves making a normal RAID 4, with a
single parity disk, and then computing plain diagonal parity across those
disks: block 0 from disk 0, block 1 from disk 1, block 2 from disk 2, and
block 3 from the parity disk would end up on the Dparity disk - except that
every DP diagonal skips one disk.
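Here is roughly how I picture that layout, sketched in Python. The geometry
(a prime p, p-1 data disks, one row-parity disk, one diagonal-parity disk,
p-1 rows per stripe) and all the names here are my own reading of the
paper, not NetApp's actual on-disk format:

    def xor_blocks(blocks, size):
        """XOR a list of equal-sized blocks together."""
        out = bytearray(size)
        for b in blocks:
            for i in range(size):
                out[i] ^= b[i]
        return bytes(out)

    def build_stripe(data, p, block_size):
        """data[disk][row] -> (row_parity[row], diag_parity[diag])."""
        rows = p - 1
        # Row parity across the p-1 data disks: plain RAID 4.
        row_parity = [xor_blocks([data[d][r] for d in range(p - 1)], block_size)
                      for r in range(rows)]

        # Block (row r, disk d) belongs to diagonal (r + d) % p, where the
        # row-parity disk counts as disk p-1.  Each diagonal misses exactly
        # one disk, and diagonal p-1 is never stored.
        def block(r, d):
            return data[d][r] if d < p - 1 else row_parity[r]

        diag_parity = []
        for q in range(p - 1):                       # stored diagonals only
            members = [block((q - d) % p, d) for d in range(p)
                       if (q - d) % p < rows]
            diag_parity.append(xor_blocks(members, block_size))
        return row_parity, diag_parity

With p = 5 this is the 4-data-disk example in the text: each stored
diagonal XORs together four blocks drawn from the data and row-parity
columns, and skips one disk entirely.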
The interesting thing is that reconstruction takes only XOR operations -
they just have to be done in the right order.
http://www.netapp.com/tech_library/ftp/3298.pdf
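Here is a similarly rough sketch of the recovery order when two data disks
die at once, reusing the layout above (again just my interpretation of the
paper, assuming both failed disks are data disks):

    def xor_blocks(blocks, size):            # same XOR helper as above
        out = bytearray(size)
        for b in blocks:
            for i in range(size):
                out[i] ^= b[i]
        return bytes(out)

    def recover_two_data_disks(data, row_parity, diag_parity, failed, p,
                               block_size):
        """Rebuild two failed data disks in place.

        data[d][r] holds the surviving blocks; the entries for the two
        disks listed in `failed` are placeholders that get overwritten.
        """
        rows = p - 1

        def block(r, d):
            return data[d][r] if d < p - 1 else row_parity[r]

        def rebuild_from_diagonal(q, r, d):
            # Diagonal parity q XORed with every other member of diagonal
            # q gives back the single missing block (r, d).
            others = [block((q - k) % p, k) for k in range(p)
                      if (q - k) % p < rows and k != d]
            data[d][r] = xor_blocks([diag_parity[q]] + others, block_size)

        def rebuild_from_row(r, d):
            # Ordinary RAID 4 style row reconstruction.
            data[d][r] = xor_blocks([block(r, k) for k in range(p) if k != d],
                                    block_size)

        # One chain per failed disk: start at the diagonal that skips disk
        # x, which is therefore missing only its block on disk y.  Recover
        # that block from diagonal parity, recover its row partner on disk
        # x from row parity, and repeat until the chain reaches the
        # diagonal that is never stored.
        for x, y in ((failed[0], failed[1]), (failed[1], failed[0])):
            q = (x - 1) % p
            while q != p - 1:
                r = (q - y) % p
                rebuild_from_diagonal(q, r, y)
                rebuild_from_row(r, x)
                q = (r + x) % p

Every step is just an XOR of blocks that are either on surviving disks or
already rebuilt, which is the "right order" part.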
I read that document, and parts of it seem a little misleading to me -
specifically, some of the statements justifying the benefits of being able
to repair a double failure.
But those don't affect the details of the implementation. In that area, it
is a straightforward "horizontal and vertical" parity scheme as used in old
tape drives (except that they make the "vertical" parity diagonal, which
helps even out the workload in the same way that RAID 5 does over RAID 4).
This works, of course, but it does have some drawbacks.
One is the requirement to double the number of parity disks, which makes it
more expensive (this is true of most such schemes, of course), but with the
small RAID group sizes typically used in such servers it can become
significant. For example, a 4+1 RAID group carries a 25% parity overhead,
but 4+2 jumps to 50%. For this reason, NetApp suggests in the paper going
to larger RAID group sizes.
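Just to spell out the arithmetic (the 14+2 group below is only a size I
picked for illustration, not a NetApp recommendation):

    # Parity overhead = parity disks as a fraction of data disks.
    for data_disks, parity_disks in ((4, 1), (4, 2), (14, 2)):
        print(f"{data_disks}+{parity_disks}: "
              f"{parity_disks / data_disks:.0%} overhead")
    # 4+1: 25% overhead
    # 4+2: 50% overhead
    # 14+2: 14% overhead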
The other issue is performance. The paper claims a 2-3% performance cost
due to the extra writes to the second parity disk. NetApp uses a
proprietary file system to eliminate the small-write penalty traditionally
associated with RAID 4 or 5, which allows them to do "full stripe" writes
all the time. I don't understand how that extends to the second parity
disk. That disk holds parity over a different set of blocks, so it cannot
be covered by the parity calculation already done for the first parity
set. So unless they have extended the system to write the whole set of
participating drives, i.e. all the horizontal parity groups at the same
time, I don't see how they avoid extra reads for the second parity drive.
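To make my confusion concrete: in the layout as I understand it, each
stored diagonal touches one block in every row of the stripe, so the
diagonal parity can only be computed purely from data in memory if all of
the rows are written together. A quick enumeration with an illustrative
p = 5 geometry (disks 0-3 data, disk 4 row parity):

    p = 5
    rows = p - 1
    for q in range(p - 1):   # the stored diagonals
        members = [((q - d) % p, d) for d in range(p) if (q - d) % p < rows]
        print(f"diagonal parity block {q} covers (row, disk): {members}")
    # Each stored diagonal hits every one of the p-1 rows exactly once,
    # which is why writing a single full row doesn't cover it.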
Can anyone shed some light on this?