SSD bad sectors

Thread starter: John Doe

I think the root of my recent problems has been intermittent bad
sectors on my solid-state drive (SSD). I have never seen so many
different types of errors, pointing in different directions.

I get the impression that Windows XP SP3 is unable to handle SSD
bad sectors. I do not know where the fault would lie; it just seems
that way.
 
The data is protected by a checksum on each sector, and the SSD
can do error correction itself. That means it reads a sector and
corrects it if necessary. Error correction works until the
polynomial has no more correcting power left, and then either you
get bad data delivered as good data, or you get bad data delivered
along with a CRC error for the sector. But the thing is, if the error
rate were high, you'd get 999 reports of CRC errors at that level
for every 1 sector that "leaked through" without being caught.

If your drive were full of errors, you'd be getting CRC errors reported
by the drive whenever there were too many errors for it to correct.
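
To make the detection half of that concrete, here is a minimal Python sketch. It is not what the drive firmware actually does (a real SSD uses a stronger error-correcting code that can also repair bits). It only assumes a 512-byte sector guarded by a CRC-32 from Python's zlib module, and the helper names store_sector, read_sector and flip_bit are made up for illustration.

```python
import zlib

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def store_sector(data):
    """Return the sector payload together with its CRC-32 checksum."""
    return data, zlib.crc32(data)


def read_sector(data, stored_crc):
    """Return the payload, or raise if the checksum no longer matches."""
    if zlib.crc32(data) != stored_crc:
        raise IOError("CRC error: sector data does not match its checksum")
    return data


def flip_bit(data, bit_index):
    """Simulate media damage by flipping one bit of the payload."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    payload, crc = store_sector(bytes(SECTOR_SIZE))
    read_sector(payload, crc)  # an intact sector reads back fine
    try:
        read_sector(flip_bit(payload, 1000), crc)
    except IOError as err:
        print(err)  # the damaged sector is reported, not silently returned
```

A single flipped bit is always caught by CRC-32. The "leaks through" case only shows up when the damage happens to produce another pattern with the same checksum, which is why it should be rare next to the reported errors.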

The SATA cable is protected by a checksum as well. So if there is a
transmission error on the SATA cable, I believe it can retry, if the
error is detected. And the same thing happens with respect to the
error check. Most of the time, errors will be caught. Some of the
time, a corrupted SATA packet leaks through without being caught and
retried. Thus, a bad SATA cable could be an issue.

And if problems are detected with the flash, a block can be substituted
by a spare 128K block. When you run out of spares, then that location
should always report an error when you read it.

Based on the above, I don't feel your drive problems should be
"invisible".

What I don't know is, how checksum errors on the SATA cable are
handled. Are they logged somewhere ? Can I check to see how many
errors I've got of that type, since boot ? I don't know the
answer to that.

CRC is mentioned on page 15 of this SATA standards document. But
this doesn't address the chipset end, and what the driver or
OS do with such information. That's the "weak link".

http://web.archive.org/web/20030213....org/collateral/zipdownloads/serialata10a.ZIP

Looking in an ICH9 datasheet, there is a bit in the SATA interface for it.

"21 CRC Error (C): Indicates that one or more CRC errors
occurred with the Link Layer."

And I also don't know, what you're supposed to use to test for
that. Same problem - no observability. What does the OS do with
that info ?

As you can see on this page, Linux has some notion of reporting
the errors, if you can figure out where this log would be
found.

https://ata.wiki.kernel.org/index.php/Libata_error_messages

SError: { BadCRC }
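
If you do get hold of the log text, counting those flags is easy. The Python sketch below is only an illustration: it assumes the messages keep the "SError: { ... }" shape shown above, the function name count_serror_flags is made up, and where the text comes from (for example saved output of the dmesg command) is left up to you.

```python
import re

# Matches the SError line format quoted above, e.g. "SError: { BadCRC }".
SERROR_RE = re.compile(r"SError:\s*\{([^}]*)\}")


def count_serror_flags(log_text):
    """Count each flag that appears inside SError: { ... } entries."""
    counts = {}
    for match in SERROR_RE.finditer(log_text):
        for flag in match.group(1).split():
            counts[flag] = counts.get(flag, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up sample text, not taken from a real machine.
    sample = "ata1: SError: { BadCRC }\nsome unrelated line\n"
    print(count_serror_flags(sample))
```

A BadCRC count that keeps climbing while the drive is in use would point at the link (cable, connector, port) rather than at the flash.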

HTH,
Paul
 
John said:
I think the root of my recent problems has been intermittent bad
sectors on my Solid State Disk (SSD). I have never seen so many
different types of errors, pointing in different directions.

I don't remember you mentioning it, but have you run memtest and
prime95 lately? I have had memory go south on me. I have also had
memory and CPUs that required a tad more voltage for stability. Can
we even expect a mass produced motherboard to accurately provide
voltages to the hundredth of a volt?!

John said:
I get the impression that Windows XP SP3 is unable to handle SSD
bad sectors. Do not know where the fault would lie, just seems
that way.

Or maybe you really do need that big power supply! ;) I have the
same motherboard and CPU, but it is overclocked to 3.4 GHz, which did
require some voltage tweaking. Did your problems go away when
you swapped-in the Raptor?
 
Fishface said:
I don't remember you mentioning it, but have you run memtest and
prime95 lately?

Yes, early, one pass with memtest. I probably should use prime95
too, or maybe just some SSD tester. Or maybe refresh the SSD
firmware.

Fishface said:
I have had memory go south on me. I have also had memory and
CPUs that required a tad more voltage for stability. Can we
even expect a mass produced motherboard to accurately provide
voltages to the hundredth of a volt?!

Unless someone provides a good technical reason to doubt it,
voltages under load while within Windows reported by a utility
will remain IMO better than anything else, and safer than fumbling
around with metal probes inside of a live PC. I use a multimeter
for other things.

Fishface said:
Or maybe you really do need that big power supply! ;)

At least 10,000 watts, usable for Christmas lights too.

That possibility has occurred to me, considering the fact that I
am using a many years old Antec 380 watt True Power 2 (TruePower
II) power supply. But it is not likely that the SSD puts a strain
on my system. And the @#$! strange mixture of errors points to
something other than the power supply.

Fishface said:
I have the same motherboard and CPU,

Try enabling (after making a backup).
C2/C2E State Support
C4/C4E State Support

I doubt that BIOS setting was my main problem, but the C4 setting
in fact causes Performance Monitor disk idle time to display
incorrectly. I might double-check with only the HDD connected, maybe
after things are known to be fixed.

Fishface said:
but it is overclocked to 3.4 GHz, which did require some voltage
tweaking. Did your problems go away when you swapped-in the
Raptor?

I had one freeze when downloading and installing stuff from Steam,
maybe the same as before when downloading and installing stuff
from Windows Updates. But as noted in a prior post, the problems
stopped after removing the SSD, before reinstalling it. That
is when I noticed the C4 setting causes an abnormality in
Performance Monitor.

As noted before, I caught a glimpse of CHKDSK reporting bad
sectors on the SSD. After that, CHKDSK showed no errors. But
recently, running CHKDSK from within Windows XP SP3 debugging
mode, CHKDSK clearly showed about 32 kB of bad sectors on the SSD.

Using the SSD can cause things to fall apart immediately. The last
time, Windows would not even boot, same as happened many times
before.

My copies of drive C Windows could have been corrupted by the bad
sectors on the SSD. I am hoping that maybe it will be corrected by
reinstalling stuff.

I am meticulous with cables, so I doubt it is a SATA cable as
Paul suggested, but I will keep my eyes open and maybe test for
that possibility.

In a heightened state of awareness, I occasionally hear an
extremely short glitch sound through the speakers, like when
listening to text-to-speech or watching and listening to streaming
media, it sounds something like radio scanner stuff. But that
probably is not related to the cause, since errors also occur when
not using sound.

Other observations...

... the HDD clicks around a lot when in the BIOS, but eventually
stops

... the HDD gets relatively hot, maybe when doing stuff outside of
Windows like restoring a copy of drive C

... when freezes occurred, they were followed by about 10 seconds
of HDD activity

That stuff might be typical. Hearing the HDD clicking might just
be strange after using the SSD for a year.

I have already had enough punishment IMO, so I hesitate to
reinstall the SSD for anything except maybe testing it. Using only
the HDD, I would expect things to fall apart again unless the SSD
(or conceivably the cable) was at fault. I might look for an SSD
tester. Or just wait until buying another to replace it. I miss
the quick bootup times of the SSD over the HDD, but restoring a
copy of Windows is faster on the HDD, even when copying to itself.

Thanks.
 
Can you run HDTune ?

http://www.hdtune.com/files/hdtune_255.exe

Use the "Error Scan", and see how many bad blocks it reports.
If CHKDSK did a media scan and found bad blocks, then HDTune should
be able to see them as well. And such a bad block, would be a
CRC error detected from the media itself, rather than a cable error.

Paul
 
Paul said:
Use the "Error Scan", and see how many bad blocks it reports.
If CHKDSK did a media scan and found bad blocks,

It did, then it didn't, then it did again.

Paul said:
then HDTune should be able to see them as well. And such a bad
block, would be a CRC error detected from the media itself,
rather than a cable error.

I plugged in the SSD and corrupted it to make sure it was not
bootable, then Windows XP formatted and labeled it drive G.

This time CHKDSK was run from within Windows XP. The SSD showed
5080 KB in bad sectors, and the same result twice more after
reboots. So I shut down, replaced the SATA cable, and got the same
result. And twice more the same result with two reboots.

HD Tune showed no errors on either of two slow runs. But the HD
Tune benchmark was very choppy and showed an average transfer rate
of 98.8 MB/second. Shortly after buying the SSD (April
2009) and updating firmware, it showed a nice even graph and a
steady transfer rate averaging about 220 MB/second.

Looks like Newegg and/or the makers have stopped including
"1,500,000 hours MTBF" for SSD drives in the description.
That probably was BS.



--

HD Tune: OCZ-VERTEX Health
ID    Attribute                  Current  Worst  Threshold    Data  Status
(01)  Raw Read Error Rate              8      0          0       0  Ok
(09)  Power On Hours Count            36     33          0       0  Ok
(0C)  Power Cycle Count               57      1          0       0  Ok
(B8)  (unknown attribute)             25      0          0       0  Ok
(C3)  Hardware ECC Recovered           0      0          0       0  Ok
(C4)  Reallocated Event Count          0      0          0       0  Ok
(C5)  Current Pending Sector           0      0          0       0  Ok
(C6)  Offline Uncorrectable          210    117          0  166354  Ok
(C7)  Ultra DMA CRC Error Count       21     99          0  125871  Ok
(C8)  Write Error Rate               123    180          0    5487  Ok
(C9)  TA Counter Detected            250      4          0    2446  Ok
(CA)  TA Counter Increased           195    106          0       6  Ok
(CB)  Run Out Cancel                 150     99          0       5  Ok
(CC)  Soft ECC Correction              0      0          0       0  Ok
(CD)  Thermal Asperity Rate           16     39          0       0  Ok
(CE)  Flying Height                  188     23          0       0  Ok
(CF)  Spin High Current               95    160          0       0  Ok
(D0)  Spin Buzz                       14     55          0       0  Ok
(D1)  Offline Seek Performance        83      0          0       0  Ok
....

This is another output several hours later.

HD Tune: OCZ-VERTEX Health
ID    Attribute                  Current  Worst  Threshold    Data  Status
(01)  Raw Read Error Rate              8      0          0       0  Ok
(09)  Power On Hours Count            37     33          0       0  Ok
(0C)  Power Cycle Count               57      1          0       0  Ok
(B8)  (unknown attribute)             25      0          0       0  Ok
(C3)  Hardware ECC Recovered           0      0          0       0  Ok
(C4)  Reallocated Event Count          0      0          0       0  Ok
(C5)  Current Pending Sector           0      0          0       0  Ok
(C6)  Offline Uncorrectable          105     54          0  167930  Ok
(C7)  Ultra DMA CRC Error Count      239    151          0  125873  Ok
(C8)  Write Error Rate                 9    146          0    5507  Ok
(C9)  TA Counter Detected             42     11          0    2446  Ok
(CA)  TA Counter Increased            42    151          0       6  Ok
(CB)  Run Out Cancel                 163    142          0       5  Ok
(CC)  Soft ECC Correction              0      0          0       0  Ok
(CD)  Thermal Asperity Rate           16     39          0       0  Ok
(CE)  Flying Height                  188     23          0       0  Ok
(CF)  Spin High Current               95    160          0       0  Ok
(D0)  Spin Buzz                       14     55          0       0  Ok
(D1)  Offline Seek Performance        83      0          0       0  Ok
....
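
For anyone wanting to compare two such dumps without eyeballing them, a rough Python sketch follows. It assumes each row has the shape "(ID) name Current Worst Threshold Data Status" as pasted above. The function names parse_health and changed_data are made up, and the attribute names are simply whatever HD Tune printed, which may not mean what the labels say on an SSD.

```python
import re

# One HD Tune health row: "(C7) Name  Current Worst Threshold Data Status"
ROW_RE = re.compile(
    r"^\((\w{2})\)\s+(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s*$"
)


def parse_health(text):
    """Parse pasted HD Tune rows into {attribute ID: field dict}."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if not match:
            continue  # skip titles, column headers and "...." lines
        attr_id, name, current, worst, threshold, data, status = match.groups()
        rows[attr_id] = {
            "name": name,
            "current": int(current),
            "worst": int(worst),
            "threshold": int(threshold),
            "data": int(data),
            "status": status,
        }
    return rows


def changed_data(before, after):
    """Map attribute ID to (old, new) for rows whose raw Data value changed."""
    return {
        attr_id: (before[attr_id]["data"], after[attr_id]["data"])
        for attr_id in before
        if attr_id in after and before[attr_id]["data"] != after[attr_id]["data"]
    }
```

Feeding it the two outputs above would single out the rows whose Data column moved between the runs, such as C6, C7 and C8.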
 
I took a look with a search engine, and found this thread.
It looks like the SMART statistics are different on SSD.

http://www.ocztechnologyforum.com/forum/showthread.php?59144-bad-SMART-results/page2

The recommended app is CrystalDiskInfo.

http://www.ocztechnologyforum.com/forum/attachment.php?attachmentid=10969&stc=1&d=1247172202

Now, I need to dig up more info on the definition of "Remaining Drive Life".
Presumably it's related to spare sectors in some way. Or to an averaging
of the number of times each 128K block has been written. Hmmm.

If it were me, I'd probably do a sector-by-sector backup, then a
sector-by-sector restore. The advantage of that is any flaky
blocks would have a chance to get spared out during the write
operation. I'd probably use "dd", with a block size of 128KB or
some multiple of that. I'd check the "Remaining Drive Life"
parameter before and after those steps. With luck, the SSD's
size is a multiple of 128KB, so the block size times
the count will be equal to the total drive capacity.
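
The block-size arithmetic can be checked ahead of time with a few lines of Python. This is only a sketch: the function name dd_count is made up, and the 32,000,000,000-byte capacity in the example is a hypothetical round figure, not the real size of any particular drive.

```python
BLOCK_SIZE = 128 * 1024  # 128 KiB, the block size suggested above


def dd_count(capacity_bytes, block_size=BLOCK_SIZE):
    """Return (whole_blocks, leftover_bytes) for a drive of the given size."""
    return divmod(capacity_bytes, block_size)


if __name__ == "__main__":
    # Hypothetical capacity, for illustration only.
    blocks, leftover = dd_count(32_000_000_000)
    print("count =", blocks, "leftover bytes =", leftover)
```

If the leftover is zero, the block size times the count covers the whole drive. If not, the tail would need a second pass with a smaller block size.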

You could spend the whole day on that site.

http://www.ocztechnologyforum.com/forum/showthread.php?63158-4-months-amp-2000-hours

Model-specific erasure utilities. I wonder if that
uses the ATA Secure Erase command ? Execution time is
apparently quite short. Doing an erasure by
zeroing the sectors one at a time, would take a lot
longer. But if Secure Erase is used, that involves
passwords, so that can't be it either. It might also
be using the NAND block erase functions somehow.
Another puzzle.

http://www.ocztechnologyforum.com/forum/showthread.php?69503-How-to-use-Sanitary-Erase

Paul
 
I updated the SSD firmware, and it just died. There will be no
more testing, except by its absence.
 
John said:
I updated the SSD firmware, and it just died. There will be no
more testing, except by its absence.

Firmware will do that.

Any details on how recoverable a bad flash is on those things ?

Paul
 
Paul said:
Firmware will do that.

Any details on how recoverable a bad flash is on those things ?

I certainly do not know.
Nothing that I have can even see it now.
I will return it, or buy a different brand.
 
After the failed firmware update, the 32 GB SSD can be seen only when
its jumper is bridged.

In the BIOS...

YATAPDONG BAREFOOT-ROM 00.P80

In Windows...

Unknown
128.00 GB
Not Initialized

Disk Director cannot format the drive.

Anyway... Since the SSD has been disconnected, occasional freezing
has not stopped. But so far, my system has not fallen apart like with
the SSD. It could be some corrupted files. I would like to check by
replacing the video card, but I goofed by getting rid of the VGA
cable the other video card requires. I also suspect the CPU,
motherboard, or the power supply. So I reduced the CPU clock speed
from 333 to 300, which also reduces the system bus speed (I guess).
And I have prepared to do another reinstallation of Windows that would
prove or disprove the corrupted files theory right away. Last time I
tried reinstalling Windows, it failed, but that was with the SSD.
 
When you say "occasional freezing has not stopped", is that using
one of your known-good backup images, a fresh install, or an install
you've been using for a while ? For example, if you run System File
Checker (SFC), does that help ? (SFC is a bugger to get running -
I had to change two entries in the registry, the last time I used it.)
SFC should check each critical OS file. And presumably has the
necessary logic, to sort out whether a file came from i386, some
CAB, Windows Update, and so on. When I used it, I got the distinct
impression, it was just copying the files, rather than actually
checking the files currently in place.

*******

Have you tried booting a Linux LiveCD, to see if that shows freezing
as well ? That would be a cross-check on whether the issue is hardware
or a BIOS setting, rather than an issue with OS files or malware. If
you need to do a read-only operation on the hard drive, from Linux,
it would be along the lines of this.

dd if=/dev/sda of=/dev/null

And when that is running in one terminal window, you can use
Synaptic Package Manager (in Ubuntu) and install "iotop". That
is a performance monitor, that reports I/O rate per individual
running utility. For example, with a "dd" command with no block
size or count specified, iotop might show 13MB/sec (as it
reads one sector at a time). Or, if you specify size parameters,
you might get 39 to 50MB/sec or so.

dd if=/dev/sda of=/dev/null bs=262144 count=10000 (read 2.5GB of data)

In that example, the product of bs*count would be smaller than
the hard drive capacity. I find power_of_two block size values,
tend to make the reading of the disk a tiny bit faster (doesn't
really make any sense, as a track on a disk, isn't likely to be a
power_of_two in size).
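
For reference, 262144 * 10000 works out to 2,621,440,000 bytes, which is about 2.6 GB in decimal units or 2.44 GiB. If you would rather time the read yourself than watch iotop, the same kind of chunked read can be sketched in Python. This is just an illustration with a made-up function name (timed_read). Reading a raw device such as /dev/sda normally needs root, and the numbers will be affected by caching.

```python
import time


def timed_read(path, block_size=262144, count=10000):
    """Read up to `count` blocks of `block_size` bytes; return (bytes, seconds)."""
    total = 0
    start = time.monotonic()
    # buffering=0 so each read() call asks the OS for one block at a time
    with open(path, "rb", buffering=0) as source:
        for _ in range(count):
            chunk = source.read(block_size)
            if not chunk:  # end of device or file
                break
            total += len(chunk)
    return total, time.monotonic() - start


if __name__ == "__main__":
    import sys

    nbytes, seconds = timed_read(sys.argv[1])
    if seconds > 0:
        print("%.1f MB/sec" % (nbytes / seconds / 1e6))
```

Like the dd command, it only reads, so it is safe to point at a disk you care about.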

The read rate probably won't be smooth - but what you'd be looking
for, is whether the computer freezes up, while you're doing it.

You can run something like "glxgears", if you need an animation
to help detect freezing or stuttering. It's a pretty poor check,
but all that is immediately available in Linux.
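
Another crude option, if Python is on the LiveCD, is a heartbeat loop that records the time at a fixed interval and then looks for gaps. This is only a sketch with made-up names (heartbeat, find_stalls). It can show a stall the machine recovers from, but a hard lock-up will of course leave nothing to print.

```python
import time


def find_stalls(timestamps, interval, tolerance):
    """Return (start, gap) pairs where ticks are further apart than expected."""
    stalls = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if gap > interval + tolerance:
            stalls.append((earlier, gap))
    return stalls


def heartbeat(duration=60.0, interval=0.1):
    """Record a monotonic timestamp roughly every `interval` seconds."""
    ticks = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        ticks.append(time.monotonic())
        time.sleep(interval)
    return ticks


if __name__ == "__main__":
    ticks = heartbeat(duration=60.0, interval=0.1)
    for start, gap in find_stalls(ticks, interval=0.1, tolerance=0.5):
        print("stall of %.2f seconds at t=%.2f" % (gap, start))
```

Run it in a second terminal while the dd read is going. Any gap much longer than the interval means the whole machine, not just the disk read, stopped for a moment.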

Paul
 
Paul said:
When you say "occasional freezing has not stopped",

That was the first of a conglomerate of symptoms. It began two or
three weeks ago, and it is the only current symptom. Some of the
other symptoms were amazing.

Paul said:
is that using one of your known-good backup images,

Things change, maybe due to carelessness or confusion, and due to
limited storage space for more than about two backups. Perhaps I
should have known the possibility that a hardware failure might
corrupt files. I might have overwritten my good backups with bad.

Paul said:
a fresh install,

Yes, but that failure was while using the SSD.

Paul said:
Have you tried booting a Linux LiveCD, and see if that shows
freezing as well ?

I should do the reinstallation very soon. That IMO is the most
direct and potentially useful way to determine whether the problem
was the SSD.
--
 
John said:
I should do the reinstallation very soon...to determine whether
the problem was the SSD.

Actually, I am confident that the SSD was messed up. Installed the
new power supply today and it still spontaneously rebooted. I
think something happened to a file or the filesystem or
something, whether or not caused by the SSD. Another of very many
symptoms is strange mouse behavior. A while back (maybe a few
weeks), the scroll wheel left click stopped working. Recently, a
single click is sometimes interpreted as a double-click. A few
days ago I got a rarely if ever before seen message that the mouse
driver needed to be reinstalled (and it was). That is something I
can look for when doing the reinstall now, the double-click thing
and of course freezing/reboots. If so... @#$!
 
John said:
The SSD showed 5080 KB in bad sectors
HD Tune benchmark was very choppy and showed an average of 98.8
MB/second transfer rate.

A few days after disabling C2/C2E State Support (its child C4 setting
never being enabled) and running error-free, I wonder whether the SSD
bad sectors were just coincidence. I should be receiving a new SSD
today and will at least be able to see whether it has any bad sectors
and whether it runs at the proper speeds. I will not risk re-enabling
C2/C2E State Support, but that is known-bad anyway. Maybe the error
producing C2/C2E State Support BIOS setting (or the resulting
spontaneous reboots) somehow caused the SSD bad sectors.

Only the Shadow Copy knows...
 