Stephen Fuld said:
....
But I suspect that the latency of the network when added to the disk latency
might be a problem.
Why do you suspect that?
I am seeing ping times to our NFS server of 0.1ms-0.5ms; disk latency
is about 10ms, so network latency is negligible.
Network bandwidth is more likely to be a problem. Even though all our
servers have Gigabit Ethernet, the networking department does not want
to invest in GE switches (they don't do cheap switches), so all our
servers are still connected by 100Mb/s links, i.e., a raw network
bandwidth of about 12MB/s, whereas modern disks provide a raw bandwidth
of 50MB/s or so.
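
To put rough numbers on the latency and bandwidth argument, here is a
back-of-the-envelope sketch. It uses the figures from this post (10ms
disk latency, the 0.5ms worst-case ping, 50MB/s disk, 100Mb/s network);
the simple "latency plus size over bandwidth" model and the function
names are my own assumptions, and real transfers have protocol overhead
that this ignores.

```python
# Toy model: time = latency + size / bandwidth. Not a benchmark.
DISK_LATENCY_MS = 10.0        # disk latency quoted in the post
NET_LATENCY_MS = 0.5          # worst-case ping time quoted in the post
DISK_MB_S = 50.0              # raw disk bandwidth quoted in the post
FAST_ETHERNET_MB_S = 100 / 8  # 100Mb/s is 12.5MB/s before any overhead


def fetch_time_ms(size_mb, latency_ms, bandwidth_mb_s):
    """Estimated time in ms to fetch size_mb megabytes."""
    return latency_ms + size_mb / bandwidth_mb_s * 1000.0


def local_ms(size_mb):
    """Read from a local disk."""
    return fetch_time_ms(size_mb, DISK_LATENCY_MS, DISK_MB_S)


def remote_ms(size_mb):
    """Read from the server's disk over the network.

    Latencies add up; the slower of disk and network limits bandwidth.
    """
    return fetch_time_ms(
        size_mb,
        DISK_LATENCY_MS + NET_LATENCY_MS,
        min(DISK_MB_S, FAST_ETHERNET_MB_S),
    )


if __name__ == "__main__":
    for size_mb in (0.004, 1.0, 10.0):
        print(size_mb, round(local_ms(size_mb), 2), round(remote_ms(size_mb), 2))
```

In this model the added network latency costs at most 0.5ms per
request, while a 10MB read takes roughly four times as long over the
100Mb/s link as from the local disk.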
The real problem, however, when I last looked, was NFS's caching. For
local disks, the OS can cache hundreds of MBs in memory, so
warm-starting a binary is usually not disk-bound.
With NFS and its statelessness dogma, the cache expires after a short
while (30s or so), and executing a binary again is often just as slow
as executing it the first time. There are better remote file systems
than NFS, but NFS was "good enough" and won out. Maybe NFS has been
improved in that respect in the meantime.
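
The effect of a short cache lifetime can be illustrated with a toy
expiring cache. This is only a sketch of the general idea, not of how
any NFS client is actually implemented; the class name, the fetch
callback and the injectable clock are hypothetical, and the 30s default
is just the rough figure mentioned above.

```python
import time


class ExpiringCache:
    """Toy cache whose entries are trusted only for ttl_s seconds."""

    def __init__(self, fetch, ttl_s=30.0, clock=time.monotonic):
        self._fetch = fetch    # hypothetical callback that asks the server
        self._ttl_s = ttl_s
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # key -> (value, time the value was stored)
        self.server_reads = 0  # how often we had to go to the server

    def read(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl_s:
            return entry[0]    # still fresh: served from memory
        # Missing or expired: go back to the server, even if the data
        # has not changed there.
        value = self._fetch(key)
        self.server_reads += 1
        self._entries[key] = (value, now)
        return value
```

Running the same "binary" twice within the lifetime costs one server
read; running it again after the lifetime has passed costs another one,
which is the "just as slow as the first time" behaviour.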
I worked on a diskless workstation for a year in 1991. I used it
mainly as an X-Terminal to work on the server that had the disks local
(the server also had more RAM and possibly a faster CPU, but the local
disks were probably the main reason why it seemed to be so much
faster).
You address that with the idea of using a disk on each
"workstation" as a transparent cache for data from the server. Does any
current operating system provide for such a function?
It is certainly written about a lot in the distributed file system
literature. I guess that stuff like AFS comes with such a feature,
but AFAIK AFS does not come with the OS and you have to buy it
separately. The Linux kernel supports the Coda file system client; I
don't know if any distribution supports Coda (Debian has experimental
packages for Coda).
It doesn't seem to be
hard to do,
Well, the typical problems with client-side caching in distributed
file systems are: How do you keep the caches consistent in the
presence of write traffic? What do you do if the connection fails or
if the server or client is down? There are solutions to these
problems, with various tradeoffs, but I would not call this an easy
problem.
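
To make the two questions concrete, here is a minimal sketch of one
possible answer: the client checks a version number with the server
before trusting its cached copy, and a flag decides whether it serves
possibly stale data when the server is unreachable. All names here are
hypothetical, and this is not how AFS, Coda or NFS do it; it only
shows where the tradeoffs appear.

```python
class ServerDown(Exception):
    """Raised when the (simulated) server cannot be reached."""


class Server:
    def __init__(self):
        self.files = {}  # name -> (version, data)
        self.up = True

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def version(self, name):
        if not self.up:
            raise ServerDown(name)
        return self.files[name][0]

    def read(self, name):
        if not self.up:
            raise ServerDown(name)
        return self.files[name]


class CachingClient:
    def __init__(self, server, serve_stale_when_down=False):
        self.server = server
        self.serve_stale_when_down = serve_stale_when_down
        self.cache = {}  # name -> (version, data)

    def read(self, name):
        try:
            current = self.server.version(name)  # cheap validation request
        except ServerDown:
            # Tradeoff: availability (maybe stale data) versus failing.
            if self.serve_stale_when_down and name in self.cache:
                return self.cache[name][1]
            raise
        cached = self.cache.get(name)
        if cached is None or cached[0] != current:
            cached = self.server.read(name)      # cache missing or outdated
            self.cache[name] = cached
        return cached[1]
```

Even this toy pays a round trip per read for validation, and it says
nothing about writes made on a client while it is disconnected.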
Followups set to comp.arch
- anton