photonic x86 CPU design

CJT · Oct 28, 2005

Casper said:
There's no current substantial technical information available; there's a
question and answer session at:

http://www.sun.com/emrkt/campaign_docs/icee_0703/transcript-SEE-091504.pdf

In it, we're directed to more information at:

http://blogs.sun.com/ahrens

which seems to run out over a year ago.

Casper H.S. Dik · Oct 28, 2005

CJT said:
ShuChih (Q): When will ZFS be included in Solaris 10? We were told first
in late summer 2004, then early 2005,
then May 2005....
Brian Ellefritz (A): ZFS will be in Solaris 10 when we ship the product.
The current projection is the end of
calendar 2004.

Interesting; well, better do it right than early :-)

Real soon now; and as usual this will coincide with the
release of the source code on OpenSolaris.org.

Casper

CJT · Oct 28, 2005

Casper said:
Interesting; well, better do it right than early
True.

Real soon now; and as usual this will coincide with the
release of the source code on OpenSolaris.org.

How about the timing in here? ;-)

http://www.sun.com/nc/05q3/chat/transcript_NC05Q3_091505.pdf

"Network Computing 05Q3 – Online Chat
Thursday, September 15, 2005"

"Q: When will ZFS be available?
Chris Ratcliffe (A): November/December for public early access - right
now we're in private beta with customers across the world,
tuning and enhancing for the public release."

Casper H.S. Dik · Oct 29, 2005

CJT said:
How about the timing in here? ;-)

"Network Computing 05Q3 – Online Chat
Thursday, September 15, 2005"

"Q: When will ZFS be available?
Chris Ratcliffe (A): November/December for public early access - right
now we're in private beta with customers across the world,
tuning and enhancing for the public release."

Not too far off, I'd say

(Had to read carefully to check it referred to this November and not
last year's :-)

Casper

Bill Todd · Oct 29, 2005

Casper said:
There's no current substantial technical information available; there's a
question and answer session at:

http://www.sun.com/emrkt/campaign_docs/icee_0703/transcript-SEE-091504.pdf

which covers some of this in more detail.

Thanks - I'll check it out.

but you must understand we're all paid to sprinkle ZFS teasers in
news groups and blogs

There's enough interesting stuff hinted at that those who care about
storage should already be interested (though I'm still not convinced
that 128-bit pointers are worthwhile).

The basic model for checksumming is fairly simple: all data is interconnected
through pointers and with each pointer a checksum of the data at the end
of the pointer is stored.

That's certainly what I came up with, as a by-product of already having
settled on a no-update-in-place (though not conventionally
log-structured) approach (so every change updates the parent and writing
the checksum is therefore free; reading it is free as well, since you
have to go through the parent to get to its child).
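
To make the checksum-in-the-parent idea concrete, here is a minimal sketch. It is not ZFS's on-disk format: the `BlockPtr` type, the dict standing in for a disk, the 8-byte address encoding and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPtr:
    """Hypothetical pointer: where the child lives plus a checksum of its contents."""
    addr: int
    checksum: bytes

disk = {}  # addr -> bytes; a stand-in for a block device

def write_block(data: bytes) -> BlockPtr:
    """No update in place: every write lands at a fresh address."""
    addr = len(disk)
    disk[addr] = data
    return BlockPtr(addr, hashlib.sha256(data).digest())

def read_block(ptr: BlockPtr) -> bytes:
    """Verify the child against the checksum carried by the pointer to it."""
    data = disk[ptr.addr]
    if hashlib.sha256(data).digest() != ptr.checksum:
        raise IOError(f"checksum mismatch at block {ptr.addr}")
    return data

def child_ptr(parent: BlockPtr) -> BlockPtr:
    """Decode the child pointer stored in the parent's (verified) payload."""
    raw = read_block(parent)
    return BlockPtr(int.from_bytes(raw[:8], "big"), raw[8:40])

leaf = write_block(b"file contents")
# The parent is rewritten on every change anyway, so storing the child's
# checksum in it costs no extra write, and reading it costs no extra read.
parent = write_block(leaf.addr.to_bytes(8, "big") + leaf.checksum)
```

Reading `read_block(child_ptr(parent))` returns the leaf data only if both the parent and the leaf match the checksums held one level above them.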

So I've been looking for more details about the rest of ZFS to try to
decide whether I've got enough original innovation left to be worth
pursuing.

- bill

Casper H.S. Dik · Oct 29, 2005

Bill Todd said:
So I've been looking for more details about the rest of ZFS to try to
decide whether I've got enough original innovation left to be worth
pursuing.

The main other innovation which makes it different is the merging of
the volume management with the filesystems.

With ZFS, the filesystem has combined knowledge about the RAID groups
and the filesystem and knows when all bits of a RAID have been written,
so it doesn't need to suffer from certain "standard" RAID 5 problems.

Also, from the system management point of view it's so much easier to
use than LVM + fs.

Casper

Wes Felter · Oct 29, 2005

Casper said:
With ZFS, the filesystem has combined knowledge about the RAID groups
and the filesystem and knows when all bits of a RAID have been written,
so it doesn't need to suffer from certain "standard" RAID 5 problems.

Does ZFS support parity? So far I've only seen references to mirroring
and striping. It would seem easy to avoid the problems of RAID 5 if you
don't use parity.

Bill Todd · Oct 30, 2005

Casper said:
The main other innovation which makes it different is the merging of
the volume management with the filesystems.

With ZFS, the filesystem has combined knowledge about the RAID groups
and the filesystem and knows when all bits of a RAID have been written,
so it doesn't need to suffer from certain "standard" RAID 5 problems.

Also, from the system management point of view it's so much easier to
use than LVM + fs.

That's fine as long as you don't wish to combine file-level with
block-level storage on the same disks, but even NetApp has moved to do
so lately (and I say this as a confirmed file-system bigot who thinks
that block-level storage is so rarely preferable - given reasonable
file-level facilities such as direct I/O that bypasses any system
caching - as to be almost irrelevant these days).

Furthermore, there's still a fairly natural division between the file
layer and the block layer (which in no way limits the file system's
ability to use knowledge of the block layer to its advantage nor
requires that users be concerned about the block layer unless they want
to use it directly). And finally time-efficient user-initiated
reorganization (e.g., adding/subtracting disks or moving redundancy from
mirroring to a parity basis) and recovery from disk (or entire server)
failures dictate that the redundancy restoration on recovery proceed in
at least multi-megabyte chunks (whereas file system activity requires
much finer-grained allocation: I did find a reference to batching
updates into a single large write - another feature I've been working on
- but even that doesn't address the problem adequately a lot of the time).

Otherwise, ZFS sounds truly impressive (e.g., I'm not used to seeing
things like prioritized access outside real-time environments, let alone
deadline-based scheduling - though that's coming into more general vogue
with the increasing interest in isochronous data, assuming that's the
level of 'deadline' you're talking about).

- bill

Casper H.S. Dik · Oct 31, 2005

Wes Felter said:
On 2005-10-29 17:08:27 -0500, Casper H.S. Dik said:

Does ZFS support parity? So far I've only seen references to mirroring
and striping. It would seem easy to avoid the problems of RAID 5 if you
don't use parity.

It has a form of RAID 5 (dubbed RAIDZ) which uses parity. If there's
no outright disk failure, ZFS can reconstruct which disk is returning the
bad data. Also, by not committing the data in ZFS until all parts of the
RAID have been written, there's no chance of the "RAID5 hole" occurring.
(The RAID5 hole is where you have, e.g., a power failure before you've
written all data + parity; the data is corrupt with no way to recover.)
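
The point about working out which disk is returning bad data can be sketched as follows. This is only an illustration of combining parity with a checksum held elsewhere, not RAIDZ's actual layout or algorithm; the function names, the single XOR parity block and SHA-256 are assumptions.

```python
import hashlib
from functools import reduce

def xor(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def digest(data_blocks):
    return hashlib.sha256(b"".join(data_blocks)).digest()

def make_stripe(data_blocks):
    """Data blocks plus one XOR parity block, and a checksum of the data."""
    return list(data_blocks) + [xor(data_blocks)], digest(data_blocks)

def find_bad_disk(stripe, checksum):
    """Return (index, repaired stripe) for one silently bad disk, or (None, stripe)."""
    if digest(stripe[:-1]) == checksum and xor(stripe[:-1]) == stripe[-1]:
        return None, stripe
    for suspect in range(len(stripe)):
        # Pretend this disk is the liar and rebuild it from all the others.
        others = stripe[:suspect] + stripe[suspect + 1:]
        candidate = stripe[:suspect] + [xor(others)] + stripe[suspect + 1:]
        if digest(candidate[:-1]) == checksum:
            return suspect, candidate
    raise IOError("more than one disk is returning bad data")

stripe, checksum = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
stripe[1] = b"BxBB"                      # disk 1 returns bad data, no error reported
bad, repaired = find_bad_disk(stripe, checksum)
```

Plain RAID 5 has only the parity, so it can tell that a stripe is inconsistent but not which member is wrong; the independent checksum is what breaks the tie here.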

Casper

Bill Davidsen · Oct 31, 2005

Thomas said:
Unfortunately not -- you need five, encode A A B B A+B, and then you
can recover from any two losses. That pattern extends: two blocks on
six drives survives three failures by A A B B A+B A+B, on eight
survives four by A A A B B B A+B A+B ...

There's probably a beautiful proof, based on some intuitively obvious
property of some geometric object that's clearly the right thing to
imagine, that 4-on-2-surviving-2 doesn't work; but I ran through all
2^16 possible encodings, since getting C to count to 1<<16 is quicker
than acquiring mathematical intuition.

Seven discs can encode 4 discs of content and survive two losses; use
the Hamming code, whose minimum distance is 3 so *removing* (rather
than corrupting) two whole columns still leaves distance between the
codewords. But seven discs are getting rather loud and heavy.
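
The counting argument above can be re-run in a few lines. This is a sketch under the assumption that each block is modelled as a single bit and each drive stores an arbitrary one-bit function of (A, B), which is what gives 16^4 = 2^16 four-drive encodings; the helper name `survives` is made up for the example.

```python
from itertools import combinations, product

def survives(drives, losses):
    """True if every content value stays distinguishable after any `losses` drives are lost.

    drives[i][k] is the bit drive i stores for the k-th possible content value.
    """
    n, cases = len(drives), len(drives[0])
    for kept in combinations(range(n), n - losses):
        seen = {tuple(drives[i][k] for i in kept) for k in range(cases)}
        if len(seen) < cases:
            return False
    return True

# Two one-bit blocks A and B: four possible contents (a, b).
contents = list(product((0, 1), repeat=2))
A = tuple(a for a, b in contents)
B = tuple(b for a, b in contents)
AxB = tuple(a ^ b for a, b in contents)

five_ok = survives([A, A, B, B, AxB], losses=2)

# Every way four drives can each store some one-bit function of (a, b).
tables = list(product((0, 1), repeat=4))
four_drive_winners = sum(survives(enc, losses=2) for enc in product(tables, repeat=4))

# Systematic Hamming(7,4): four data bits plus three parity bits.
words = list(product((0, 1), repeat=4))
hamming = [tuple(col(d) for d in words) for col in (
    lambda d: d[0], lambda d: d[1], lambda d: d[2], lambda d: d[3],
    lambda d: d[0] ^ d[1] ^ d[3],
    lambda d: d[0] ^ d[2] ^ d[3],
    lambda d: d[1] ^ d[2] ^ d[3],
)]
seven_ok = survives(hamming, losses=2)
```

Under this model `five_ok` and `seven_ok` come out true and `four_drive_winners` is zero, matching the claims in the quoted post.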

And expensive. People have done backup and verify to tape for decades, a
RAID-5 array is far more reliable, and the most useful storage you can
get and still have failure tolerance (N-1 of N drives for data). If
backups are expensive or inconvenient people don't do them.

Bill Todd · Oct 31, 2005

Terje said:
Which is why I'd like to use spare desktop disk space as a distributed
backup medium: Zero incremental cost (since most of our users only use a
small part of their standard hard disk). With nothing but the network
cable needed, backups can happen automatically as long as you're connected.

Well, if you've got the network cable *anyway* (GBE isn't at all
unreasonable these days) then you're well on the way to being able to
run the desktops diskless, centralize the storage on a server that
stripes all the data across all the disks to provide near-optimal
performance (both random and streaming) plus far better space
utilization, and use spare space on the server for the backup or
investigate Sun's new ZFS to see whether a) its snapshot mechanisms will
provide reasonable backups and b) you trust its integrity enough for that.

Among other things, this significantly reduces the trust requirements
for the system (if my backup data is spread across all my co-workers'
disks then I'm trusting all of them if I ever need it, not to mention
being exposed to significantly increased possibility of losing it due to
multiple disk failures, whereas if it's on the central server I'm only
trusting the server, its authorization mechanism, and its ability to
limit exposure to multi-disk failures). Not to mention centralizing
management, providing an environment where individuals can roam among
workstations without losing access to their personal context, etc.

Or if diskless makes you uneasy just throw in a small, inexpensive disk
to boot from and cache recently-used data so that it can operate
stand-alone if necessary (works well for notebooks, too).

This doesn't constitute a return to the days of time-sharing, since the
compute power remains distributed and local. But centralizing the data
makes a lot of sense.

- bill

Terje Mathisen · Oct 31, 2005

Bill said:
And expensive. People have done backup and verify to tape for decades, a
RAID-5 array is far more reliable, and the most useful storage you can
get and still have failure tolerance (N-1 of N drives for data). If
backups are expensive or inconvenient people don't do them.

Which is why I'd like to use spare desktop disk space as a distributed
backup medium: Zero incremental cost (since most of our users only use a
small part of their standard hard disk). With nothing but the network
cable needed, backups can happen automatically as long as you're connected.

Terje

Terje Mathisen · Nov 1, 2005

Bill said:
Well, if you've got the network cable *anyway* (GBE isn't at all
unreasonable these days) then you're well on the way to being able to
run the desktops diskless, centralize the storage on a server that

I'd like the backup to work for laptops as well, they have more need of
it, and they move around a lot.

stripes all the data across all the disks to provide near-optimal
performance (both random and streaming) plus far better space
utilization, and use spare space on the server for the backup or
investigate Sun's new ZFS to see whether a) its snapshot mechanisms will
provide reasonable backups and b) you trust its integrity enough for that.

Among other things, this significantly reduces the trust requirements
for the system (if my backup data is spread across all my co-workers'
disks then I'm trusting all of them if I ever need it, not to mention

Redundancy (N out of N+M), with monitoring and migration of units that
don't check in regularly, is how I want to handle this.
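
A back-of-the-envelope way to size such an N-out-of-N+M scheme is sketched below. It assumes every peer is reachable independently with the same probability, which is optimistic for machines in one building; the function name and the example figures are made up.

```python
from math import comb

def restore_probability(n, m, p):
    """Chance that at least n of n+m peers are reachable, each independently with probability p."""
    total = n + m
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(n, total + 1))

# Hypothetical example: data split so that any 4 of 6 peers suffice,
# with each peer reachable 90% of the time.
example = restore_probability(4, 2, 0.9)   # about 0.984
```

Raising M improves the odds quickly, which is why monitoring and migrating away from peers that stop checking in matters more than the exact starting numbers.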

being exposed to significantly increased possibility of losing it due to
multiple disk failures, whereas if it's on the central server I'm only
trusting the server, its authorization mechanism, and its ability to
limit exposure to multi-disk failures). Not to mention centralizing
management, providing an environment where individuals can roam among
workstations without losing access to their personal context, etc.

Some good stuff there, but I've never seen it at a price point that
makes economical sense.

Or if diskless makes you uneasy just throw in a small, inexpensive disk
to boot from and cache recently-used data so that it can operate
stand-alone if necessary (works well for notebooks, too).

That small, inexpensive disk is already there, it is at least 80 GB on a
desktop and 20-40 GB on a laptop.

This doesn't constitute a return to the days of time-sharing, since the
compute power remains distributed and local. But centralizing the data
makes a lot of sense.

The same amount of disk space on a cheap SAN server costs 2+ to 10 times
what it costs in the form of embedded IDE drives distributed across a
bunch of personal computers.

The key idea though is that the PC disk space is effectively free, so
why not use it?

Terje

Stephen Fuld · Nov 1, 2005

Terje Mathisen said:
I'd like the backup to work for laptops as well, they have more need of
it, and they move around a lot.

Redundancy (N out of N+M), with monitoring and migration of units that
don't check in regularly, is how I want to handle this.

Some good stuff there, but I've never seen it at a price point that
makes economical sense.

That small, inexpensive disk is already there, it is at least 80 GB on a
desktop and 20-40 GB on a laptop.

The same amount of disk space on a cheap SAN server costs 2+ to 10 times
what it costs in the form of embedded IDE drives distributed across a
bunch of personal computers.

The key idea though is that the PC disk space is effectively free, so
why not use it?

That is an interesting idea! One potential caveat. You are essentially
making copies of one person's data on another person's computer. If the
second person can access the data on his computer, he could presumably
access the first person's data (at least a read only copy of it). This may
not be desirable. The boss may not want his memo describing his plans for
next year's salary adjustments read by someone in his department - or
perhaps even worse, someone in another department.

Stephen Fuld · Nov 1, 2005

Bill Todd said:
Well, if you've got the network cable *anyway* (GBE isn't at all
unreasonable these days) then you're well on the way to being able to run
the desktops diskless, centralize the storage on a server that stripes all
the data across all the disks to provide near-optimal performance (both
random and streaming) plus far better space utilization, and use spare
space on the server for the backup or investigate Sun's new ZFS to see
whether a) its snapshot mechanisms will provide reasonable backups and b)
you trust its integrity enough for that.

Not to mention the disk savings from eliminating the many duplicate copies
of all the software that would otherwise be resident on each computer, and
the ease of making sure the software was current with patches, etc.

But I suspect that the latency of the network when added to the disk latency
might be a problem. You address that with the idea of using a disk on each
"workstation" as a transparent cache for data from the server. Does any
current operating system provide for such a function? It doesn't seem to be
hard to do, and it may have been done in the past, but individual storage is
so much "in" that I don't know of any current implementations.

Anton Ertl · Nov 1, 2005

Stephen Fuld said:
....
But I suspect that the latency of the network when added to the disk latency
might be a problem.

Why do you suspect that?

I am seeing ping times to our NFS server of 0.1ms-0.5ms; disk latency
is about 10ms, so network latency is negligible.

Network bandwidth is more likely to be a problem. Even though all our
servers have Gigabit Ethernet, the networking department does not want
to invest in GE switches (they don't do cheap switches), so all our
servers are still connected by 100Mb/s, i.e., raw network bandwidth of
12MB/s, whereas modern disks provide a raw bandwidth of 50MB/s or so.
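
The arithmetic behind those figures, as a small sketch. The latency and bandwidth numbers are the ones quoted above; the 100 MB file size is an assumption chosen only to make the comparison visible, and protocol overhead is ignored.

```python
PING_MS = (0.1, 0.5)       # round-trip times to the NFS server
DISK_LATENCY_MS = 10.0     # typical disk access latency
LINK_MBIT_S = 100          # Fast Ethernet
DISK_MB_S = 50.0           # raw disk streaming bandwidth

# Network latency as a fraction of disk latency: 1% to 5%.
latency_overhead = [p / DISK_LATENCY_MS for p in PING_MS]

# 100 Mb/s is 12.5 MB/s before any protocol overhead.
link_mb_s = LINK_MBIT_S / 8

def transfer_seconds(size_mb, mb_per_s):
    """Time to stream size_mb megabytes at the given rate."""
    return size_mb / mb_per_s

over_network = transfer_seconds(100, link_mb_s)   # 8.0 s
from_disk = transfer_seconds(100, DISK_MB_S)      # 2.0 s
```

So latency adds a few percent per access, while the 100 Mb/s link makes streaming roughly four times slower than the local disk.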

The real problem, however, when I last looked, was NFS's caching. For
local disks, the OS can cache hundreds of MBs in memory, making
warm-starting of a binary a thing that is usually not disk-bound.
With NFS and its statelessness dogma, the cache expires after a short
while (30s or so), and executing a binary again is often just as slow
as executing it the first time. There are better remote file systems
than NFS, but NFS was "good enough" and won out. Maybe NFS has been
improved in that respect in the meantime.

I worked on a diskless workstation for a year in 1991. I used it
mainly as an X-Terminal to work on the server that had the disks local
(the server also had more RAM and possibly a faster CPU, but the local
disks were probably the main reason why it seemed to be so much
faster).

You address that with the idea of using a disk on each
"workstation" as a transparent cache for data from the server. Does any
current operating system provide for such a function?

It is certainly written about a lot in the distributed file system
literature. I guess that stuff like AFS comes with such a feature,
but AFAIK AFS does not come with the OS and you have to buy it
separately. The Linux kernel supports the Coda file system client; I
don't know if any distribution supports Coda (Debian has experimental
packages for Coda).

It doesn't seem to be
hard to do,

Well, the typical problems with client-side caching in
distributed file systems are: how do you keep the caches consistent in
the presence of write traffic? What do you do if the connection fails
or if the server or client is down? There are solutions to these
problems, with various tradeoffs, but I would not call this an easy
problem.

Followups set to comp.arch

- anton

Terje Mathisen · Nov 1, 2005

Stephen said:
That is an interesting idea! One potential caveat. You are essentially
making copies of one person's data on another person's computer. If the
second person can access the data on his computer, he could presumably
access the first person's data (at least a read only copy of it). This may
not be desirable. The boss may not want his memo describing his plans for
next year's salary adjustments read by someone in his department - or
perhaps even worse, someone in another department.

That one is obvious, all backups will of course be encrypted on the
source machine, before being distributed to the individual backup areas.

Terje

David Schwartz · Nov 1, 2005

Terje said:
That one is obvious, all backups will of course be encrypted on the
source machine, before being distributed to the individual backup areas.

Terje

And they need to be signed and versioned as well.

DS
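
One way the "signed and versioned" requirement might look is sketched below. Assumptions: the chunk has already been encrypted on the source machine and is treated as opaque bytes, an HMAC under a key held by the owner stands in for a real public-key signature, and the function and field names are invented for the example.

```python
import hashlib
import hmac
import json

def _tag(header: bytes, ciphertext: bytes, signing_key: bytes) -> str:
    # Length-prefix the header so header and ciphertext cannot be confused.
    message = len(header).to_bytes(4, "big") + header + ciphertext
    return hmac.new(signing_key, message, hashlib.sha256).hexdigest()

def seal(ciphertext: bytes, owner: str, version: int, signing_key: bytes) -> dict:
    """Wrap an already-encrypted chunk with an owner/version header and a MAC."""
    header = json.dumps({"owner": owner, "version": version}, sort_keys=True).encode()
    return {"header": header, "ciphertext": ciphertext,
            "tag": _tag(header, ciphertext, signing_key)}

def verify(chunk: dict, signing_key: bytes, min_version: int) -> bytes:
    """Reject chunks that were altered or that are older than expected."""
    expected = _tag(chunk["header"], chunk["ciphertext"], signing_key)
    if not hmac.compare_digest(expected, chunk["tag"]):
        raise ValueError("chunk was modified or sealed with a different key")
    if json.loads(chunk["header"])["version"] < min_version:
        raise ValueError("stale chunk: an older version was returned")
    return chunk["ciphertext"]

key = b"owner-held secret"                      # hypothetical key
chunk = seal(b"\x8f\x02 opaque ciphertext", "boss", 7, key)
```

The version check is what stops a peer from handing back an old but validly sealed copy; the MAC alone would not catch that.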

Hugh Fisher · Nov 2, 2005

Terje said:
That one is obvious, all backups will of course be encrypted on the
source machine, before being distributed to the individual backup areas.

This can make it difficult to restore from if the owner of
the encryption key has died in an accident or leaves the
organization on less than happy terms.

There's also the problem that these PCs with disks are
probably in the same building, so you aren't protected
against earthquakes or other large scale disasters.

cheers,
Hugh Fisher

David Schwartz · Nov 2, 2005

Hugh Fisher said:

This can make it difficult to restore from if the owner of
the encryption key has died in an accident or leaves the
organization on less than happy terms.

Trivial to solve. The software simply enforces the policy configured by
the data's owner. If this means that there must be a "master key", then
there will be one.

DS
