Cleversafe

  • Thread starter: Wendell III
Previously Wendell III said:
Has anyone looked at/used Cleversafe for anything serious?

Thanks,
-Wendell
--

It has the usual problems of distributed, unmanaged storage:

- You need to contribute twice as much as you get, both in storage
space and in bandwidth.
- You need to have enough upstream bandwidth.
- 6 out of 11 is not too reliable, especially for longer-term
storage.
- Not clear how long this will stay operational.

I used to do a bit of research in the area, but concluded that the
idea, while seemingly attractive, does not work well and does
not make economic sense. This is one of these stupid things
that results when people take several concepts, here Internet and
storage, and try to merge them at all cost.

Arno
 
Arno said:
Previously Wendell III (e-mail address removed) wrote:
Has anyone looked at/used Cleversafe for anything serious?

http://www.cleversafe.org/

Thanks,
-Wendell
--

It has the usual problems of distributed, unmanaged storage:

- You need to contribute twice as much as you get, both in storage
space and in bandwidth.
- You need to have enough upstream bandwidth.
- 6 out of 11 is not too reliable, especially for longer-term
storage.
- Not clear how long this will stay operational.

I used to do a bit of research in the area, but concluded that the
idea, while seemingly attractive, does not work well and does
not make economic sense. This is one of these stupid things
that results when people take several concepts, here Internet and
storage, and try to merge them at all cost.

Arno


A couple of comments: The 11 lose 5 scenario means that you'd need to
sustain 5 simultaneous node failures to lose any data. The odds of
that happening are slim - in fact, it creates a "twelve 9" availability
situation -- far more reliable than any other storage solution today.

The blowup - i.e., the amount of total storage needed relative to the
original data set - is 2.1x the original data set size in the 11 lose 5
scenario. While this might seem high, it's significantly less than the
number of copies of data that companies typically make to ensure that
their data is available when they want to access it. It's generally
accepted that high availability environments create 4-10x the original
data set size.

And, we actually are working on a yet-to-be-released version which will
reduce the blowup to ~1.3x.

Finally, with people and companies wanting to keep data around for a
long time on secure, cost-effective storage solutions that are
accessible and don't degrade (like tape), information dispersal is far
and away the best solution.
 
Previously PlanetRudy said:
A couple of comments: The 11 lose 5 scenario means that you'd need to
sustain 5 simultaneous node failures to lose any data. The odds of
that happening are slim - in fact, it creates a "twelve 9" availability
situation -- far more reliable than any other storage solution today.

You are overlooking the time factor. Due to bandwidth and storage
space limitations, re-replication of data can take significant time.
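
To put a rough number on the time factor, here is a minimal sketch of how long it takes just to push one lost slice back out over a limited uplink. The slice size and link speeds are hypothetical values chosen for illustration, not figures from Cleversafe, and the estimate ignores protocol overhead and the reads from surviving nodes that a rebuild also needs.

```python
def repair_hours(slice_gb: float, upstream_mbit_s: float) -> float:
    """Hours needed to transfer one slice over an uplink of the given speed.

    Assumes decimal units (1 GB = 8e9 bits, 1 Mbit/s = 1e6 bit/s) and a
    link that is fully dedicated to the repair.
    """
    bits = slice_gb * 8e9
    seconds = bits / (upstream_mbit_s * 1e6)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical scenario: a 100 GB slice and several uplink speeds.
    for mbit in (1, 10, 100):
        print(f"{mbit:>4} Mbit/s uplink: {repair_hours(100, mbit):8.1f} hours")
```

During that window the data sits on fewer nodes than intended, so any reliability figure has to account for failures that arrive while the repair is still running.
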

And unlike traditional storage media, you have absolutely no
hard numbers on reliability; instead you need to make wild guesses about
your user population's behaviour.

The blowup - i.e., the amount of total storage needed relative to the
original data set - is 2.1x the original data set size in the 11 lose 5
scenario. While this might seem high, it's significantly less than the
number of copies of data that companies typically make to ensure that
their data is available when they want to access it. It's generally
accepted that high availability environments create 4-10x the original
data set size.

This solution is not high-availability. You can have lots of
temporary failures from PCs that are not running, laptops
that do not have Internet connectivity, etc.

Also you get the same blowup (more if re-replication is a frequent event)
in network bandwidth usage.

And, we actually are working on a yet-to-be-released version which will
reduce the blowup to ~1.3x.

This sounds too good to be true without some major drawback hidden
in it.

Finally, with people and companies wanting to keep data around for a
long time on secure, cost-effective storage solutions that are
accessible and don't degrade (like tape), information dispersal is far
and away the best solution.

I strongly disagree. True, this is usually quoted as an advantage of
this type of system. But it is bogus: Instead of needing to
monitor tape or MOD degradation (which is well understood and has
basically no risks), the user now needs to monitor the state
of your network. If your users leave in significant numbers, the
remaining users will be in trouble. This is an entirely unquantifiable
risk compared to the well understood risk of tape, MOD or other
traditional archival media solutions.

As I said, intuitively this is intriguing. But if you look at the
numbers it turns out that traditional in-house or external archival
storage has risks that are well understood and quantifiable. This
system is a wild card with not well understood risks and it can have
risks that are entirely non-obvious. You might get lucky or you might
not. And then, traditional archival storage is not that expensive. If
done in-house it also does not have the bandwidth problem.

Personally I think this is nice to play around with, but only a fool
would depend on it. Also it is unusable for larger amounts of data. If
you store larger amounts of data, you get completely unrealistic
numbers of users that need to participate in this long-term.

Arno
 
This solution is not high-availability. You can have lots of
temporary failures from PCs that are not running, laptops
that do not have Internet connectivity, etc.

Arno, your comments seem to be assuming that we are designing
Dispersed Storage to be hosted on devices like laptops that come and
go. This is not the focus of the initial release. Laptops and home
PCs can be clients for a Dispersed Storage grid, but they are not the
focus type of server.

The initial focus for servers for Cleversafe Dispersed Storage grids
are hosted servers whose availability would typically be around
99.9%. Hosting a Dispersed Storage grid on this class of servers
results in extremely available and reliable storage.
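
As a rough illustration of where figures like this come from, the sketch below computes the chance that a k-of-n dispersed object is unreadable at a given moment. It assumes a 6-of-11 threshold (the thread speaks of "6 out of 11" and "11 lose 5") and, more importantly, that nodes fail independently of each other, which is exactly the assumption being disputed in this thread. The function name and the sample availabilities other than 99.9% are made up for the example.

```python
from math import comb


def unavailability(n: int, k: int, node_up: float) -> float:
    """Probability that fewer than k of n independent nodes are up.

    node_up is the availability of a single node (e.g. 0.999).
    The object is unreadable when more than n - k nodes are down at once.
    """
    down = 1.0 - node_up
    return sum(
        comb(n, failed) * down**failed * node_up ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


if __name__ == "__main__":
    # 0.999 is the server availability quoted above; the lower values are
    # hypothetical stand-ins for less reliable hosts.
    for node_up in (0.999, 0.99, 0.9):
        p = unavailability(11, 6, node_up)
        print(f"node availability {node_up}: P(unreadable) = {p:.3e}")
```

The result is very sensitive to the per-node availability and collapses if failures are correlated (shared hosting provider, shared software bug), so it is a best-case bound rather than a measured reliability.
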
This sounds too good to be true without some major drawback hidden
in it.

In order to realize a blowup of 1.3 (i.e. a storage overhead of 30%),
we are using methods like Reed-Solomon coding to get that level of
overhead at extremely high levels of reliability. These methods have
been around for decades and are widely used in communications.
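
For readers unfamiliar with the arithmetic: with a Reed-Solomon style code that splits data into k slices and stores n, any k of the n slices are enough to rebuild the data, so the blowup is n/k and up to n - k slices can be lost. The sketch below tabulates that trade-off. The (13, 10) and (26, 20) parameter pairs are hypothetical examples that happen to give 1.3x; the thread does not say which parameters the unreleased version uses.

```python
def blowup(n: int, k: int) -> float:
    """Raw storage consumed per unit of user data for a k-of-n code."""
    return n / k


def tolerated_losses(n: int, k: int) -> int:
    """Number of slices that can be lost while the data stays recoverable."""
    return n - k


if __name__ == "__main__":
    # (n, k) pairs; only 11/6 comes from the thread, the rest are assumed.
    schemes = {
        "6-of-11 (from the thread)": (11, 6),
        "10-of-13 (hypothetical)": (13, 10),
        "20-of-26 (hypothetical)": (26, 20),
    }
    for name, (n, k) in schemes.items():
        print(f"{name:28s} blowup {blowup(n, k):.2f}x, "
              f"tolerates {tolerated_losses(n, k)} lost slices")
```

Note that plain n/k for 6-of-11 comes out near 1.83x rather than the 2.1x quoted earlier in the thread; the thread does not explain the difference. The table also shows the catch with a 1.3x blowup: it either tolerates fewer losses or needs more nodes per object.
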
Personally I think this is nice to play around with, but only a fool
would depend on it. Also it is unusable for larger amounts of data. If
you store larger amounts of data, you get completely unrealistic
numbers of users that need to participate in this long-term.

Dispersed Storage is NOT being designed to be hosted on a
federation of low availability devices, like laptops and home PCs.
Dispersed Storage IS designed for a hosting model like that of the
Internet. The Internet uses an open protocol -- TCP/IP, but is
typically provided as a commercial service by ISPs who use highly
available devices -- hosted routers -- to provide an inter-networking
service. Some larger organizations also host their own routers to
provide internal networking services.

Cleversafe Dispersed Storage is designed for a model like the Internet
where a variety of companies like ISPs and hosting companies will
offer storage as a service using highly available devices -- storage
servers. In addition, some larger organizations will also use
Dispersed Storage to host their own storage services. You can also
use the open Dispersed Storage protocol to create a non-commercial
Dispersed Storage grid. When building a Dispersed Storage grid, we'd
recommend you use servers to build that grid, just like you'd want to
use highly available routers if you were building your own Internet.

Perhaps one day, you'll see mesh communications networks and mesh
storage networks built on very low reliability devices like laptops
that come and go a lot, but that is not the initial focus of
Cleversafe Dispersed Storage.


Regards,
Chris Gladwin
 
Arno, your comments seem to be assuming that we are designing
Dispersed Storage to be hosted on devices like laptops that come and
go. This is not the focus of the initial release. Laptops and home
PCs can be clients for a Dispersed Storage grid, but they are not the
focus type of server.

The initial focus for servers for Cleversafe Dispersed Storage grids
are hosted servers whose availability would typically be around
99.9%. Hosting a Dispersed Storage grid on this class of servers
results in extremely available and reliable storage.

Aha, so initially you assure reliability by paying money for it.
When that runs out you then move to the chancy laptop-and-so-on
model?

In order to realize a blowup of 1.3 (i.e. a storage overhead of 30%),
we are using methods like Reed-Solomon coding to get that level of
overhead at extremely high levels of reliability. These methods have
been around for decades and are widely used in communications.

But not for the types of failure you need to be able to deal with
in this application. I think you are being far too optimistic here.

Dispersed Storage is NOT being designed to be hosted on a
federation of low availability devices, like laptops and home PCs.
Dispersed Storage IS designed for a hosting model like that of the
Internet. The Internet uses an open protocol -- TCP/IP, but is
typically provided as a commercial service by ISPs who use highly
available devices -- hosted routers -- to provide an inter-networking
service. Some larger organizations also host their own routers to
provide internal networking services.

That is BS. The Internet has a very low storage capacity, basically
the packets that are in flight. That is because it is an interconnect
system. The Internet consequently has no hosting model at all,
although you can read things like this in nonsensical press articles
these days.

Cleversafe Dispersed Storage is designed for a model like the Internet
where a variety of companies like ISPs and hosting companies will
offer storage as a service using highly available devices -- storage
servers. In addition, some larger organizations will also use
Dispersed Storage to host their own storage services. You can also
use the open Dispersed Storage protocol to create a non-commercial
Dispersed Storage grid. When building a Dispersed Storage grid, we'd
recommend you use servers to build that grid, just like you'd want to
use highly available routers if you were building your own Internet.

Perhaps one day, you'll see mesh communications networks and mesh
storage networks built on very low reliability devices like laptops
that come and go a lot, but that is not the initial focus of
Cleversafe Dispersed Storage.

Fine. So if the servers are static and all are high-reliability,
what is new with this approach? You could just use ECC-file
and distribute the chunks over the different servers. Or even
simpler: Do RAID6 with 7 chunks, but each on its own,
high-reliability server, and you get an overhead of 1.4 and
a fault-tolerance of 2 on 7. There, all already solved.

So tell me, why are you talking to private and small customers,
like you do here? And what is so new or exciting about your
system?

Arno
 