Yousuf said: Interesting findings.
Indeed. But not very surprising. One of the reasons I tell
people not to trust SSDs too much. The larger sectors come at
a price.
I wouldn't go that far; some of these problems could easily affect HDDs
too. When the power fails, unsaved content is always in an unknown state
no matter what the medium, whether it be the newest-fangled SSDs or the
oldest-fashioned cassette tape drives. Much of it is still fixable
through standard filesystem error recovery tools, such as chkdsk. SSDs
are mostly used as boot disks these days, which are relatively static:
90% of the disk is never written to after the first installation, and
the 10% that might be written to is usually things like log files or
application data. Nothing there would severely prevent the OS from
booting.
In a few years, when SSDs are much larger and taking over some of the
large-scale roles currently handled by HDDs, such as Oracle RDBMS
databases and the like, that's when we should worry about the reliability
of data on SSDs during power failures. I think once SSDs hit 1TB in
size, some enterprises are going to seriously start looking at them to
store full RDBMS data. The performance advantages are hard to ignore.
These are the only devices capable of fully saturating the bandwidth of
SATA and/or SAS links by themselves, and beyond. There are even SSDs
attached directly to PCI-E, because standard disk connector ports
aren't fast enough.
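As a rough illustration of that link-saturation point (ballpark throughput figures for drives of that era, not measurements of any particular device), here is the arithmetic in Python:

def usable_mb_per_s(line_rate_gbps, encoding_overhead=0.8):
    # SATA and SAS links use 8b/10b encoding, so roughly 80% of the raw
    # line rate is available for data.
    return line_rate_gbps * encoding_overhead * 1000 / 8  # Gbit/s -> MB/s

links = {"SATA 3Gb/s": 3.0, "SATA 6Gb/s": 6.0, "SAS 6Gb/s": 6.0}
drives = {"7200rpm HDD, sequential": 140, "consumer SSD, sequential": 500}

for link_name, gbps in links.items():
    ceiling = usable_mb_per_s(gbps)
    for drive_name, mb_s in drives.items():
        share = mb_s / ceiling
        print(f"{drive_name:24s} uses {share:4.0%} of {link_name} (~{ceiling:.0f} MB/s)")

A single HDD sits well below even a 3Gb/s link, while one SSD already pushes past it and approaches the 6Gb/s ceiling, which is why PCI-E attached flash exists at all.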
Arno said: Hmm. Come to think of it, while I have 4 SSDs, none of them is
used in that role. Do you have any numbers on this? Or is it an
intuition?
One of the main points of the article was that data written earlier can be
ruined if power fails when another page is written into the same multi-level
cells. So there goes your OS.
I don't understand why the 2 bits are in different pages.
Doesn't the erase phase have to erase the whole cell? Then
re-writing both bits is necessary, and that would be more
efficient if they were in the same page.
I also don't understand why there is no non-volatile buffer (which could
also act as a cache). Most drives have no more than 1 GB modified in a day,
so 1 GB of RAM and a double-layer capacitor could write that data to the
flash when power is lost. The erase phase takes longer than a write, IIRC,
so the block could be erased while power is on and the data committed
later, by a flush command or when the power goes off.
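A back-of-envelope check of that idea (every figure below is an assumption chosen for illustration, not a spec from any real drive):

buffer_bytes  = 1 * 1024**3    # 1 GiB of buffered writes to commit
program_rate  = 100e6          # assume ~100 MB/s sustained write rate to flash
drive_power_w = 3.0            # assume ~3 W drawn while flushing

flush_time_s  = buffer_bytes / program_rate      # ~10.7 s
energy_needed = flush_time_s * drive_power_w     # ~32 J

# Energy stored in a capacitor: E = 1/2 * C * V^2
cap_farads, cap_volts = 10.0, 5.0                # a modest supercap
energy_stored = 0.5 * cap_farads * cap_volts**2  # 125 J

print(f"flush takes ~{flush_time_s:.1f} s and needs ~{energy_needed:.0f} J")
print(f"a {cap_farads:.0f} F / {cap_volts:.0f} V supercap stores ~{energy_stored:.0f} J")

So the energy budget itself looks workable; as comes up further down in the thread, the cost of the capacitor bank and the extra charging and monitoring circuitry is the real obstacle, which is why drives that do have power-loss protection only cover their much smaller write buffer.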
Yousuf Khan said: I wouldn't go that far, some of these problems could
easily affect HDDs too.
Arno said: Nope.
Yousuf Khan said: The performance advantages are hard to ignore.
Arno said: Speed is not everything.
Yousuf said: Well, hard to say. Even if the two layers are on different
pages, they may still be adjacent pages, and they may still be part of the
same erase block. So even though damage can be done to a previously written
page, that page may have been part of the same group of data; and since
that data stream has already taken a loss, a bit more damage to it may not
matter much. It's already damaged goods.
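To make the paired-page point concrete, here is a toy sketch (purely illustrative, not how any real controller lays out its pages): in MLC flash the two bits of a cell belong to two different logical pages, so programming the later "upper" page moves charge in cells that already hold committed lower-page data, and an interrupted program can leave those older bits unreadable.

import random

def program_upper_page(lower_page, upper_page, power_fails=False):
    # Returns what the previously written lower page reads back as after the
    # upper page sharing the same cells has been programmed. (What the upper
    # page itself reads back as is a separate question; the point here is the
    # damage to the older lower page.)
    if not power_fails:
        return list(lower_page)   # charge reaches its target levels, old data intact
    # Power failed mid-program: cells are left between charge levels, so some
    # bits of the *older* lower page read back flipped, even though that page
    # was written long before the power cut.
    return [bit ^ (random.random() < 0.3) for bit in lower_page]

random.seed(0)
old_data  = [1, 0, 1, 1, 0, 0, 1, 0]   # lower page, committed well before the failure
new_write = [0] * 8                    # upper page being programmed when power dies

print("clean program      :", program_upper_page(old_data, new_write))
print("interrupted program:", program_upper_page(old_data, new_write, power_fails=True))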
A non-volatile buffer? You mean like flash memory? Why would you want
flash memory backing up flash memory?
Arno said: I agree with you. This design sucks. But you can get similar
effects from the very large sectors even with SLC. My guess
is a mixture of faulty design, incompetence and a
"consumer grade" mindset, i.e. who cares about good
data integrity in these devices anyways.
It could have something to do with easier data correction
(by spreading the error across two pages), but modern ECC
codes should compensate for that.
What surprises me, though, is the energy aspect. That would
clearly favour a design where the bits in a cell stay in the same
sector.
Too expensive, and it complicates the design. I do understand
that one. Side note: enterprise-grade SLC drives have
supercaps to give the drive a few seconds for
cleanup after the power is gone.
It is even better than that: only what is in the write buffer and
has already been reported to the OS as flushed to disk needs to
get to flash. That is quite enough. The supercaps
are expensive, though, and need additional circuitry.
Not really. This fails when you overwrite data. SSDs
keep a pool of erased pages, but that can exhaust
quickly.
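A toy illustration of that exhaustion (not a real FTL; the block and pool sizes are made up): under sustained overwrites the pool of pre-erased pages drains, and the drive ends up having to erase blocks in the foreground, so "erase while powered, commit later" cannot carry the load on its own.

PAGES_PER_BLOCK = 64

class ToySSD:
    def __init__(self, spare_blocks=4):
        # Pool of pages that were erased ahead of time, while power was on.
        self.erased_pages = spare_blocks * PAGES_PER_BLOCK
        self.foreground_erases = 0

    def overwrite_page(self):
        if self.erased_pages == 0:
            # Pool exhausted: a block must be erased *now*, stalling this write.
            self.foreground_erases += 1
            self.erased_pages += PAGES_PER_BLOCK
        self.erased_pages -= 1
        # The old copy of the page is only marked stale; it cannot be reused
        # until its whole block is erased.

ssd = ToySSD()
for _ in range(10_000):
    ssd.overwrite_page()
print("writes that had to wait for an erase:", ssd.foreground_erases)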
So, to sum up, this is all implemented and works reliably
in enterprise-grade SSDs. Consumer-grade trash has
gotten better, but since consumers are not willing to pay
for good engineering, consumer storage will always
be unreliable. Personally, I work around that with
RAID and backup.
You're kidding, you're not using your SSDs as boot drives? That was the
only reason I could think of for using an SSD.
Anyway, I do have some anecdotal numbers, so it's not just intuition,
more like a guesstimate. My own boot drive has about 120GB of data on it,
but it's made up of over 1.5 million files! Each of my 5 internal data
drives and 2 remaining externals holds anywhere from over 400GB to over
1.3TB, but none of those drives holds more than 10,000 files. So you can
see the usage pattern: the boot drive has a lot of little files. Much of
it is OS files, which might be read quite often but not written to much,
while much of the rest is automatically generated application data that
gets written quite often, without alerting the user at all.
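The quick arithmetic on those figures (taking the rough numbers above at face value):

# Average file sizes implied by the drive contents described above.
boot_avg     = 120e9 / 1.5e6     # 120 GB over 1.5 million files
data_avg_min = 400e9 / 10_000    # 400 GB over at most 10,000 files

print(f"boot drive : ~{boot_avg / 1e3:.0f} KB per file")        # ~80 KB
print(f"data drives: ~{data_avg_min / 1e6:.0f}+ MB per file")   # ~40 MB or more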
Some businesses might require it. These are the ones paying for large
RAID arrays right now. SSDs would be faster than the fastest RAIDs.
The most common reason for failure (90%) in flash drives (and probably
SSDs) appears to be translator corruption (damaged lookup tables),
especially if the power fails while the translator is being updated.
So it's more than just unsaved content that is at risk.
What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html
Unlike HDDs, SSDs and flash drives perform wear levelling, so I would
think this makes them especially vulnerable to power interruptions.
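A sketch of why a damaged translator is so fatal (a toy model, not any vendor's actual FTL): wear levelling means a logical block can live anywhere in flash, so the drive is only readable for as long as the logical-to-physical lookup table survives.

flash = {}       # physical page number -> stored data
translator = {}  # logical block -> physical page (the lookup table / "translator")

def write(logical_block, data, physical_page):
    # Wear levelling picks a fresh physical page on (almost) every write,
    # so the translator has to be updated constantly.
    flash[physical_page] = data
    translator[logical_block] = physical_page

def read(logical_block):
    return flash[translator[logical_block]]

write(0, b"partition table",       physical_page=917)
write(1, b"filesystem superblock", physical_page=42)

translator.clear()   # power cut while the table was being rewritten

# The user data is still physically present in `flash`, but with the mapping
# gone there is no way to tell which logical block any page belongs to:
try:
    read(0)
except KeyError:
    print("translator lost: data still in flash, but the drive looks empty/bricked")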
I agree. And in addition, the engineering of the other components
is not as mature as for HDDs. The only SSD I have with critical
data is therefore in a RAID1 with two HDDs. (Reads come from the
SSD, writes go to all three. Basically as fast as the bare SSD
for my application.)
My guess would be that at this time, total loss of data
without warning is more likely for a consumer-grade SSD than
for a consumer-grade HDD.
That is not to say SSDs are trash; it is just important to
understand their failure modes.
Arno
cjt said: On 15/05/2012 7:07 PM, Arno wrote:
On 13/05/2012 4:36 PM, Tom Del Rosso wrote:
http://www.eetimes.com/design/memor...-effects-of-power-failure-on-flash-based-SSDs
Important data doesn't belong solely on portable devices, and the
storage where it DOES belong should be UPS protected. I think the
reliability of SSDs is a non-issue.
I can see the words, "RAM and a double-layer capacitor" right there in the
quote. The term non-volatile has been used for decades in reference to more
than just EEPROM and Flash. In this context NV doesn't mean forever, but
until the charge runs out.
Just because some terminology is popularized does not mean other terminology
should go out of use.
Not generally true. And the RAID has other benefits. SSDs are
nice for a lot of small read accesses, and for home users,
since a RAID with comparable speed may be pretty loud. But SSDs are
not the best solution even in a lot of scenarios where speed
matters.
And I quite disagree on reliability. SSDs are worse in some regards,
and that is no surprise. I have, for example, one SSD in a RAID1
with two HDDs, and so far the HDDs have been far more reliable. The
SSD drops out of the RAID every few months. Easy to fix in
my scenario, but if the SSD were the only thing storing the data,
that would be a real problem. This is very likely due to lower
controller maturity on the SSD than on the HDDs.