Understanding the effects of power failure on flash-based SSDs

  • Thread starter: Tom Del Rosso
Yousuf said:
Interesting findings.

I also thought it was interesting that the 3 authors at the bottom of the
article are barely out of college. EE Times byline pictures used to be old
curmudgeon engineers.
 
Indeed. But not very surprising. One of the reasons I tell
people not to trust SSDs too much. The larger sectors come at
a price.

I wouldn't go that far, some of these problems could easily affect HDD's
too. When the power fails, unsaved content is always in an unknown state
no matter what the medium, whether it be the newest-fangled SSD's or the
oldest-fashioned cassette tape drives. Much of it is still fixable
through standard filesystem error recovery tools, such as chkdsk. SSD's
are mostly used as boot disks these days which are relatively static:
90% of the disk is never written to after the first installation, and
the 10% that might be written to are usually things like log files or
application data. Nothing that might severely prevent the boot up of the
OS.

In a few years when SSD's are much larger and taking over some of the
large scale roles that are currently done on HDD's, such as Oracle RDBMS
databases and stuff, that's when we should worry about the reliability
of data on SSD's during power failures. I think once SSD's hit 1TB in
size, that's when some enterprises are going to seriously start looking
at them to store full RDBMS data. The performance advantages are hard to
ignore. These are the only devices capable of fully saturating the
bandwidth of SATA and/or SAS links by themselves, and beyond. They even
have SSD's that are directly attached to the PCI-E ports, because
standard disk connector ports aren't fast enough.

Yousuf Khan
 
I wouldn't go that far, some of these problems could easily affect HDD's
too.

You should not trust HDDs too much either ;-)
But this problem is an SSD speciality today.
When the power fails, unsaved content is always in an unknown state
no matter what the medium, whether it be the newest-fangled SSD's or the
oldest-fashioned cassette tape drives. Much of it is still fixable
through standard filesystem error recovery tools, such as chkdsk. SSD's
are mostly used as boot disks these days which are relatively static:
90% of the disk is never written to after the first installation, and
the 10% that might be written to are usually things like log files or
application data. Nothing that might severely prevent the boot up of the
OS.

Hmm. Come to think of it, while I have 4 SSDs, none of them is
used in that role. Do you have any numbers on this? Or is it an
intuition?
In a few years when SSD's are much larger and taking over some of the
large scale roles that are currently done on HDD's, such as Oracle RDBMS
databases and stuff, that's when we should worry about the reliability
of data on SSD's during power failures. I think once SSD's hit 1TB in
size, that's when some enterprises are going to seriously start looking
at them to store full RDBMS data. The performance advantages are hard to
ignore. These are the only devices capable of fully saturating the
bandwidth of SATA and/or SAS links by themselves, and beyond. They even
have SSD's that are directly attached to the PCI-E ports, because
standard disk connector ports aren't fast enough.

Speed is not everything.

Arno
 
Arno said:
Hmm. Come to think of it, while I have 4 SSDs, none of them is
used in that role. Do you have any numbers on this? Or is it an
intuition?

One of the main points of the article was that data written earlier can be
ruined if power fails when another page is written into the same multi-level
cells. So there goes your OS.

I don't understand why the 2 bits are in different pages. Doesn't the erase
phase have to erase the whole cell? Then re-writing both bits is necessary,
and that would be more efficient if they were in the same page.

I also don't understand why there is no non-volatile buffer (which could
also act as a cache). Most drives have no more than 1 GB modified in a day,
so 1GB of RAM and a double-layer capacitor could write that data to the
flash when power is lost. The erase phase takes longer than write IIRC, so
the block can be erased when power is on, and the data committed later by a
flush command or when the power goes off.
 
One of the main points of the article was that data written earlier can be
ruined if power fails when another page is written into the same multi-level
cells. So there goes your OS.
I don't understand why the 2 bits are in different pages.
Doesn't the erase phase have to erase the whole cell? Then
re-writing both bits is necessary, and that would be more
efficient if they were in the same page.

I agree with you. This design sucks. But you can get similar
effects from the very large sectors even with SLC. My guess
is a mixture of faulty design, incompetence and a
"consumer grade" mindset, i.e. who cares about good
data integrity in these devices anyways.

It could have something to do with easier data correction
(by spreading the error to two pages), but modern ECC
codes should compensate for that.

What surprises me, though, is the energy aspect. That would
clearly favour a design where the bits in a cell stay in the same
sector.
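
To make that concrete, here is a toy Python sketch of the paired-page
problem (an illustration only, with made-up levels, not any vendor's
actual cell encoding): the two bits of each MLC cell belong to two
different pages that are programmed at different times, so interrupting
the second program can corrupt data the first program had already
committed.

# Gray-coded mapping of cell voltage level -> (lower-page bit, upper-page bit).
LEVEL_TO_BITS = {0: (1, 1), 1: (1, 0), 2: (0, 0), 3: (0, 1)}

def program_lower(cells, lower_bits):
    # First pass: each cell goes to level 0 (bit = 1) or level 2 (bit = 0).
    for i, b in enumerate(lower_bits):
        cells[i] = 0 if b == 1 else 2

def program_upper(cells, upper_bits, fail_at=None):
    # Second pass: shift each cell to its final four-level state.
    # fail_at simulates power being cut partway through programming.
    for i, b in enumerate(upper_bits):
        if fail_at is not None and i == fail_at:
            cells[i] = 2   # interrupted program: cell reads back as the wrong level
            return
        lower = LEVEL_TO_BITS[cells[i]][0]
        cells[i] = next(l for l, bits in LEVEL_TO_BITS.items() if bits == (lower, b))

cells = [0] * 4
program_lower(cells, [1, 0, 1, 0])             # old data, already committed on the lower page
program_upper(cells, [0, 0, 1, 1], fail_at=2)  # power fails while the upper page is written
print([LEVEL_TO_BITS[c][0] for c in cells])    # lower page now reads [1, 0, 0, 0], not [1, 0, 1, 0]
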
I also don't understand why there is no non-volatile buffer (which could
also act as a cache).

Too expensive, complicates the design. I do understand
that one. Side note: On enterprise-grade SLC drives,
they have supercaps to give the drive a few seconds for
cleanup after the power is gone.
Most drives have no more than 1 GB modified in a day,
so 1GB of RAM and a double-layer capacitor could write that data to the
flash when power is lost.

It is even better. Only what is in the write buffer and has
been declared to the OS as flushed to disk needs to
get to flash. That is quite enough. The supercaps
are expensive though and need additional circuitry.
The erase phase takes longer than write IIRC, so
the block can be erased when power is on, and the
data committed later by a
flush command or when the power goes off.

Not really. This fails when you overwrite data. SSDs
keep a pool of erased pages, but that can exhaust
quickly.

So, to sum up, this is all implemented and works reliably
in enterprise-grade SSDs. Consumer-grade trash has
gotten better, but since consumers are not willing to pay
for good engineering, consumer storage will always
be unreliable. Personally, I work around that with
RAID and backup.

Arno
 
Yousuf Khan said:
Arno wrote
I wouldn't go that far, some of these problems could easily affect HDD's
too.
Nope.

When the power fails,

With a hard drive, the worst that can happen is that on a badly
designed drive the sector being written at the time the power
fails ends up corrupted and has to be reallocated.

And even that doesn't happen with a well designed hard drive,
which can see the power failing and has enough time to finish
writing that particular sector, and to not start another, before the
cap on the drive runs down so far that it can't write anymore.

The drive itself will still be spinning fine in the tiny amount of
time needed to complete the writing of that particular sector.
unsaved content is always in an unknown state no matter what the medium,
whether it be the newest-fangled SSD's or the oldest-fashioned cassette
tape drives.

Yes, but the worst that can happen with a well designed
drive is that it doesn't get to write what was pending to be
written at the time that the power failed.
Much of it is still fixable through standard filesystem error recovery
tools, such as chkdsk.

Not necessarily with SSDs as that article proves.
SSD's are mostly used as boot disks these days which are relatively
static:

They are in fact extensively used as the ONLY drive in the system now.
90% of the disk is never written to after the first installation, and the
10% that might be written to are usually things like log files or
application data. Nothing that might severely prevent the boot up of the
OS.

That's clearly not the situation Arno is discussing, trust-wise.
In a few years when SSD's are much larger and taking over some of the
large scale roles that are currently done on HDD's, such as Oracle RDBMS
databases and stuff, that's when we should worry about the reliability of
data on SSD's during power failures.

Nope, we should also worry when it's the only
drive in the system and that's true right now.

It isn't even possible to have more than one drive in
most laptops, and hordes have nothing but a laptop now.
I think once SSD's hit 1TB in size, that's when some enterprises are going
to seriously start looking at them to store full RDBMS data.

I doubt it, essentially because a well designed
database doesn't need super fast disk access much.
The performance advantages are hard to ignore.

But easily obtained much more safely in other ways with databases.

And those are academic anyway because such databases will normally
be fully backed up, so the vulnerability of SSDs to
power failure isn't relevant to them, and few of the high
performance databases wouldn't be on a UPS anyway.
These are the only devices capable of fully saturating the bandwidth of
SATA and/or SAS links by themselves, and beyond.

And there are very few apps that need to do anything like that.
Really just very high speed data capture in very exotic situations.

And even then, power failure is trivially avoidable with a UPS anyway.
They even have SSD's that are directly attached to the PCI-E ports,
because standard disk connector ports aren't fast enough.

But if you are doing that, you can trivially add a UPS
to completely avoid the power failure situation.
 
I wouldn't go that far, some of these problems could easily affect HDD's
too. When the power fails, unsaved content is always in an unknown state
no matter what the medium, whether it be the newest-fangled SSD's or the
oldest-fashioned cassette tape drives. Much of it is still fixable
through standard filesystem error recovery tools, such as chkdsk.

The most common reason for failure (90%) in flash drives (and probably
SSDs) appears to be translator corruption (damaged lookup tables),
especially if the power fails while the translator is being updated.
So it's more than just unsaved content that is at risk.

What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html

Unlike HDDs, SSDs and flash drives perform wear levelling, so I would
think this makes them especially vulnerable to power interruptions.
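
As a rough illustration of why a damaged translator is so much worse
than a little unsaved data, here is a toy Python model (not a real FTL,
just the general idea): wear levelling means a logical sector can live
anywhere in the flash, so if the mapping table is lost mid-update, the
data can be perfectly intact and still unreachable.

class ToyFTL:
    def __init__(self, npages):
        self.pages = [None] * npages   # physical NAND pages
        self.map = {}                  # logical sector -> physical page (the "translator")
        self.next_free = 0

    def write(self, lba, data):
        # Wear levelling: always program a fresh physical page and
        # update the map, instead of rewriting the old page in place.
        self.pages[self.next_free] = data
        self.map[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        return self.pages[self.map[lba]]

ftl = ToyFTL(16)
ftl.write(0, b"boot sector")
ftl.write(0, b"boot sector v2")   # old copy still sits in NAND; map points at the new one
ftl.map.clear()                    # power fails while the translator is being rewritten
# Every byte of user data is still physically present in ftl.pages, but
# without the map there is no way to tell which copy belongs to which sector.
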

- Franc Zabkar
 
Hmm. Come to think of it, while I have 4 SSDs, none of them is
used in that role. Do you have any numbers on this? Or is it an
intuition?

You're kidding, you're not using your SSD's as boot drives? That was the
only reason I could think of for using an SSD.

Anyways, I do have some anecdotal numbers, so it's not just intuition,
more like a guesstimate. My own boot drive has about 120GB of data in it,
but it's made up of over 1.5 million files! Each of the rest of my 5
internal data drives and 2 remaining externals holds anywhere from over
400GB to over 1.3TB, but none of those drives holds more than 10,000
files. So you can see that the usage pattern is that the boot drive
has a lot of little files. Much of it is OS files which might be read
quite often but not written to much, while much of it is also automatically
generated application data that would be written to quite often and
without alerting the user at all.
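(For scale, that works out to roughly 120GB / 1.5 million files, or about
80KB per file on the boot drive, versus somewhere around 40MB to 130MB
per file on the data drives.)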
Speed is not everything.

Some businesses might require it. These are the ones paying for large
raid arrays right now. SSD's would be faster than the fastest RAIDs.

Yousuf Khan
 
One of the main points of the article was that data written earlier can be
ruined if power fails when another page is written into the same multi-level
cells. So there goes your OS.

Well, hard to say, even if the two layers are on different pages, they
may still be adjacent pages, and they may still be part of the same
erase block. So even though damage can be done to a previously written
page, it may have been part of the same group of data; and since the
loss is to one piece of data a bit more damage to the same data stream
may not matter much, since it's already damaged goods.
I also don't understand why there is no non-volatile buffer (which could
also act as a cache). Most drives have no more than 1 GB modified in a day,
so 1GB of RAM and a double-layer capacitor could write that data to the
flash when power is lost. The erase phase takes longer than write IIRC, so
the block can be erased when power is on, and the data committed later by a
flush command or when the power goes off.

A non-volatile buffer? You mean like flash memory? Why would you want
flash memory backing up flash memory?

Yousuf Khan
 
Yousuf said:
Well, hard to say, even if the two layers are on different pages, they
may still be adjacent pages, and they may still be part of the same
erase block. So even though damage can be done to a previously written
page, it may have been part of the same group of data; and since the
loss is to one piece of data a bit more damage to the same data stream
may not matter much, since it's already damaged goods.


A non-volatile buffer? You mean like flash memory? Why would you want
flash memory backing up flash memory?

I can see the words, "RAM and a double-layer capacitor" right there in the
quote. The term non-volatile has been used for decades in reference to more
than just EEPROM and Flash. In this context NV doesn't mean forever, but
until the charge runs out.

Just because some terminology is popularized does not mean other terminology
should go out of use.
 
Arno said:
I agree with you. This design sucks. But you can get similar
effects from the very large sectors even with SLC. My guess
is a mixture of faulty design, incompetence and a
"consumer grade" mindset, i.e. who cares about good
data integrity in these devices anyways.

It could have something to do with easier data correction
(by spreading the error to two pages), but modern ECC
codes should compensate for that.

What surprises me, though, is the energy aspect. That would
clearly favour a design where the bits in a cell stay in the same
sector.


Too expensive, complicates the design. I do understand
that one. Side note: On enterprise-grade SLC drives,
they have supercaps to give the drive a few seconds for
cleanup after the power is gone.

Well that's what I mean. If it's available as an option then that's good
enough. I would choose it.

It is even better. Only what is in the write buffer and has
been declared to the OS as flushed to disk needs to
get to flash. That is quite enough. The supercaps
are expensive though and need additional circuitry.

30 years ago my VCR had a 1 Farad capacitor to retain the recording schedule
in SRAM. The cap was half the size of a pack of cigarettes back then. And
that was before any of these chips were available:
http://www.maxim-ic.com/products/supervisors/battery_backup/
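(Back of the envelope: a 1 Farad cap charged to 5V and usable down to
3V stores 0.5 x 1 x (5^2 - 3^2) = 8 joules; assuming, purely as a guess,
a drive drawing about 3W while it empties its buffer, that is roughly
2 to 3 seconds of flush time.)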

Not really. This fails when you overwrite data. SSDs
keep a pool of erased pages, but that can exhaust
quickly.

If you overwrite the same sector twice in a day, with my proposed scheme you
would only have to erase it once. The DRAM would hold both writes and flush
the data to flash later. It can keep track of the erased pages as part of
the cache tags that are needed anyway.
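
A minimal sketch of that scheme in Python (purely illustrative, with
hypothetical names, not how any real controller firmware is built):

class BufferedFlash:
    def __init__(self):
        self.dram = {}          # lba -> data, the capacitor-backed write cache
        self.flash = {}         # lba -> data, the NAND itself
        self.erased_pool = 8    # blocks pre-erased while power is good

    def write(self, lba, data):
        # Overwriting the same LBA twice just updates DRAM; the flash
        # only needs to be erased/programmed once, later.
        self.dram[lba] = data

    def background_erase(self):
        self.erased_pool += 1   # done at leisure while power is on

    def flush(self):
        # Called on an OS flush command, or by the power-loss circuit.
        for lba, data in self.dram.items():
            if self.erased_pool == 0:
                raise RuntimeError("erased-block pool exhausted")   # Arno's caveat
            self.flash[lba] = data
            self.erased_pool -= 1
        self.dram.clear()

ssd = BufferedFlash()
ssd.write(100, b"v1")
ssd.write(100, b"v2")   # same sector twice in a day: still only one flash program
ssd.flush()             # e.g. the supercap-powered flush when power is lost
print(ssd.flash[100])   # b'v2'
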

So, to sum up, this is all implemented and works reliably
in enterprise-grade SSDs. Consumer-grade trash has
gotten better, but since consumers are not willing to pay
for good engineering, consumer storage will always
be unreliable. Personally, I work around that with
RAID and backup.

Who makes "enterprise-grade SSD's"? I have looked for this feature.

People make no sense. They will buy more expensive things sometimes, but
not when it really makes a difference.
 
You're kidding, you're not using your SSD's as boot drives? That was the
only reason I could think of for using an SSD.

Why would I use them to speed up something I do so rarely?
Booting is not a significant time factor. Applications and
data access benefit from an SSD all through a session,
but booting?
Anyways, I do have some anecdotal numbers, so it's not just intuition,
more like a guesstimate. My own boot drive has about 120GB of data in it,
but it's made up of over 1.5 million files! Each of the rest of my 5
internal data drives and 2 remaining externals holds anywhere from over
400GB to over 1.3TB, but none of those drives holds more than 10,000
files. So you can see that the usage pattern is that the boot drive
has a lot of little files. Much of it is OS files which might be read
quite often but not written to much, while much of it is also automatically
generated application data that would be written to quite often and
without alerting the user at all.

Maybe I should say that I do not put my applications and data
on the boot drive for Windows, and that for Linux the only
thing that comes from the boot drive is the kernel.
Some businesses might require it. These are the ones paying for large
raid arrays right now. SSD's would be faster than the fastest RAIDs.

Not generally true. And the RAID has other benefits. SSDs are
nice for a lot of small read accesses. And for home-users,
a RAID with comparable speed may be pretty loud. But SSDs are
not the best solution even in a lot of scenarios where speed
matters.

And I quite disagree on reliability. SSDs are worse in some regards
and that is no surprise. I have, for example, one SSD in a RAID1
with two HDDs and so far the HDDs are far more reliable. The
SSD drops out of the RAID every few months. Easy to fix in
my scenario, but if it were only the SSD storing the data,
that would be a real problem. This is very likely due to lower
controller maturity on the SSD than on the HDDs.

Arno
 
The most common reason for failure (90%) in flash drives (and probably
SSDs) appears to be translator corruption (damaged lookup tables),
especially if the power fails while the translator is being updated.
So it's more than just unsaved content that is at risk.
What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html
Unlike HDDs, SSDs and flash drives perform wear levelling, so I would
think this makes them especially vulnerable to power interruptions.

I agree. And in addition the engineering on other components
is not as mature as for HDDs. The only SSD I have with critical
data is therefore in a RAID1 with two HDDs. (Reads come from the
SSD, writes go to all three. Basically as fast as the bare SSD
for my application.)

My guess would be that at this time, total loss of data
without warning is more likely for a consumer-grade SSD than
for a consumer grade HDD.

That is not to say SSDs are trash, it is just important to
understand their failure-modes.

Arno
 
The most common reason for failure (90%) in flash drives (and probably
SSDs) appears to be translator corruption (damaged lookup tables),
especially if the power fails while the translator is being updated.
So it's more than just unsaved content that is at risk.
What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html
Unlike HDDs, SSDs and flash drives perform wear levelling, so I would
think this makes them especially vulnerable to power interruptions.

I agree. And in addition the engineering on other components
is not as mature as for HDDs. The only SSD I have with critical
data is therefore in a RAID1 with two HDDs. (Reads come from the
SSD, writes go to all three. Basically as fast as the bare SSD
for my application.)

My guess would be that at this time, total loss of data
without warning is more likely for a consumer-grade SSD than
for a consumer grade HDD.

That is not to say SSDs are trash, it is just important to
understand their failure-modes.

Arno

Important data doesn't belong solely on portable devices, and the
storage where it DOES belong should be UPS protected. I think the
reliability of SSDs is a non-issue.
 
cjt said:
On 15/05/2012 7:07 PM, Arno wrote:
On 13/05/2012 4:36 PM, Tom Del Rosso wrote:
http://www.eetimes.com/design/memor...-effects-of-power-failure-on-flash-based-SSDs

Interesting findings.

Indeed. But not very surprising. One of the reasons I tell
people not to trust SSDs too much. The larger sectors come at
a price.

I wouldn't go that far, some of these problems could easily affect HDD's
too. When the power fails, unsaved content is always in an unknown state
no matter what the medium, whether it be the newest-fangled SSD's or the
oldest-fashioned cassette tape drives. Much of it is still fixable
through standard filesystem error recovery tools, such as chkdsk.
The most common reason for failure (90%) in flash drives (and probably
SSDs) appears to be translator corruption (damaged lookup tables),
especially if the power fails while the translator is being updated.
So it's more than just unsaved content that is at risk.
What are the Flash drives' typical failures [Public Forum]:
http://www.salvationdata.com/forum/topic1873.html
Unlike HDDs, SSDs and flash drives perform wear levelling, so I would
think this makes them especially vulnerable to power interruptions.

I agree. And in addition the engineering on other components
is not as mature as for HDDs. The only SSD I have with critical
data is therefore in a RAID1 with two HDDs. (Reads come from the
SSD, writes go to all three. Basically as fast as the bare SSD
for my application.)

My guess would be that at this time, total loss of data
without warning is more likely for a consumer-grade SSD than
for a consumer grade HDD.

That is not to say SSDs are trash, it is just important to
understand their failure-modes.

Arno
Important data doesn't belong solely on portable devices, and the
storage where it DOES belong should be UPS protected. I think the
reliability of SSDs is a non-issue.

What are you talking about? Nobody is talking about portable
devices. UPS is completely unnecessary if your PSU reset-line
works correctly and your filesystems or databases have been
implemented correctly, i.e. with power-failure in mind.
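
For example, this is the kind of ordering a power-failure-aware design
relies on (a simplified Python sketch; real filesystems and databases
use journals or copy-on-write, but the write-then-barrier idea is the
same; the file name is just an example):

import os

def atomic_update(path, data):
    # Make the new data durable first, then publish it atomically, so a
    # power cut can lose the newest update but never leave a half-written file.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())        # data reaches stable storage before the rename
    os.replace(tmp, path)           # atomic rename publishes the new version
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)             # make the rename itself survive a power cut (POSIX)
    finally:
        os.close(dirfd)

atomic_update("config.dat", b"new settings")
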

Arno
 
I can see the words, "RAM and a double-layer capacitor" right there in the
quote. The term non-volatile has been used for decades in reference to more
than just EEPROM and Flash. In this context NV doesn't mean forever, but
until the charge runs out.

Just because some terminology is popularized does not mean other terminology
should go out of use.

No one is disputing that capacitor-backed RAM would also be considered
NV, but flash is also NV. Why make things extra convoluted by having one
type of NV memory back up another type? Isn't it better to just have that
capacitor keep enough charge so that the flash memory itself can complete
its writes?

There is another type of NV memory called MRAM which is apparently the
fastest writer of all, nearly as fast as SRAM for reading and writing
operations, but it has some problems in being miniaturized, so far.

Yousuf Khan
 
Not generally true. And the RAID has other benefits. SSDs are
nice for a lot of small read accesses. And for home-users,
a RAID with comparable speed may be pretty loud. But SSDs are
not the best solution even in a lot of scenarios where speed
matters.

And I quite disagree on reliability. SSDs are worse in some regards
and that is no surprise. I have, for example, one SSD in a RAID1
with two HDDs and so far the HDDs are far more reliable. The
SSD drops out of the RAID every few months. Easy to fix in
my scenario, but if it were only the SSD storing the data,
that would be a real problem. This is very likely due to lower
controller maturity on the SSD than on the HDDs.

It might be due to the fact that the other members of the RAID set are
on such a different plane of performance from the SSD. The SSD might be
dropping out because the RAID software/firmware finds it finishes much
faster than the HDD, and that would look like an error occurred on the
drive, according to the RAID software.

Yousuf Khan
 