Using a hard disk drive as an archival medium

JW

After reading about the (not so great) longevity of CD-R and DVD-R as
archival media, I decided to get two 2TB Hitachi drives from Newegg and
keep them mirrored in case one dies. As I was moving the data from the
optical media to the fixed disks, about 10 of the 350 or so optical discs
I had gave me fits (CRC errors, bad blocks, etc.). I was able to recover 8
of them and 2 were a complete loss. (Note that I always used verify after
burning to make sure the discs were good.)

Then I came across this:
http://www.larryjordan.biz/articles/lj_hard_disk_warning.html

So, do I still need to worry, or is that bullshit? Perhaps I need to look
for a utility that "refreshes" the data by reading and writing the data
back to disk?

I have an old engineer friend who used to work for Quantum a number of
years ago and his response was "Depends on the media but yes, it can lose
data if power off for a very long time and if subjected to magnetic
interference." Of course, that being more than 10 years ago, things have
possibly changed...

Thanks.
 
JW wrote
After reading about the (not so) longevity of using CD-R and DVD-R as
archival media I decided to get two 2TB Hitachi drives from Newegg
keeping them mirrored in case one dies. As I was moving the data from
the optical media to the fixed disks, about 10 of 350 or so optical
discs I had gave me fits (CRC errors, bad blocks, etc.) I was able to
recover 8 of them and 2 were a complete loss. (Note that I always
used verify after burning to make sure the discs were good.)
So, do I still need to worry, or is that bullshit?

It's drivel. There is no 'refreshing' of the data on a hard drive that
is kept powered up in the PC with files that are only written once.
Perhaps I need to look for a utility that "refreshes" the
data by reading and writing the data back to disk?

No need, though it isn't hard to do that just by copying the entire
contents of the drive to another drive of the same size periodically.
I have an old Engineer friend who used to work for Quantum a
number of years ago and his response was "Depends on the
media but yes, it can lose data if power off for a very long time
and if subjected to magnetic interference."

It isn't subject to magnetic interference when the drive is appropriately
stored, nowhere near big motors etc.
Of course being more than 10 years ago, things have possibly changed...

No, they haven't in that sense.
 
I would use more than two backup drives. Three is good,
six is better.

Lynn
 
I'm waiting to see what Arno will say, but

"I do have drives here that have not been spun up for several years, and
they are fine,"

is a bit strange -- how does he know the drives that have not been spun
up are fine? If it's so important, why wasn't he spinning them up every
year and scanning them?

Nevertheless, it's easy to do a total scan with whatever software once a
year.

But he claims that just reading the data refreshes it. I don't see it;
the read head must be passively receiving the magnetic fluctuation, I'd
think. So, the scan would seem to find areas that are fading, but do
nothing for the other areas that are only supposed to be lasting a year.

Maybe it's a hoax?

But I'm sure Arno can enlighten us.
--
Ed Light

Better World News TV Channel:
http://realnews.com

Iraq Veterans Against the War and Related:
http://ivaw.org
http://couragetoresist.org
http://antiwar.com

Send spam to the FTC at
(e-mail address removed)
Thanks, robots.
 
JW said:
After reading about the (not so) longevity of using CD-R and DVD-R as
archival media I decided to get two 2TB Hitachi drives from Newegg keeping
them mirrored in case one dies. As I was moving the data from the optical
media to the fixed disks, about 10 of 350 or so optical discs I had gave
me fits (CRC errors, bad blocks, etc.) I was able to recover 8 of them and
2 were a complete loss. (Note that I always used verify after burning to
make sure the discs were good.)
So, do I still need to worry, or is that bullshit? Perhaps I need to look
for a utility that "refreshes" the data by reading and writing the data
back to disk?
I have an old Engineer friend who used to work for Quantum a number of
years ago and his response was "Depends on the media but yes, it can lose
data if power off for a very long time and if subjected to magnetic
interference." Of course being more than 10 years ago, things have
possibly changed...

It is not quite BS, but not quite true either. First, HDDs
do not refresh their surfaces by themselves. You need to either
run long SMART self-tests on them manually or configure them
to do automatic self-tests, which not all disks can do.

I do have drives with non-critical system images that
were off for 2 years and longer, with no data loss.

That said, data endurance of a HDD surface is somewhere in
the 10-year area, _but_ there may be weak sectors that can
become unreadable far sooner, say within a few months or
a year. Also, a HDD has roughly a 5%/year failure rate,
and it is unclear whether this rate is significantly lower
when it is off. It might be higher (but not a lot).
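
As a rough illustration of why the number of copies matters, here is a
back-of-the-envelope sketch in Python. It takes the roughly 5%/year figure
above at face value and assumes, optimistically, that drive failures are
fully independent and that a failed drive is not replaced during the year.
The function name p_all_copies_lost is made up for this example.

```python
def p_all_copies_lost(annual_failure_rate: float, copies: int) -> float:
    """Chance that every copy fails within the same year.

    Assumes failures are independent, which only holds if the drives
    do not share a box, a room, a power supply or a clumsy owner.
    """
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return annual_failure_rate ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} copies: {p_all_copies_lost(0.05, n):.6f}")
```

Under those assumptions each extra independent copy cuts the yearly risk of
losing everything by a factor of 20. Correlated failures (same shelf, same
power surge) are not covered by this arithmetic at all.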

This is the reason I run long SMART self-tests on
all disks that are online every 14 days. For disks used
for backup, I do this 1-2 times a year manually.

My recommendation is that backup and long-term storage to
external disk is fine, but that you should run a long
SMART self-test every 6-12 months or so and check the
SMART attributes afterwards. Also, for backup and critical
long-term storage, you should use at least 3 independent
media sets. "Independent" is no joke here. If you put them
all in the same box, they are not independent. If they are
in the same room, they are not independent. Same house,
depends. It is far too easy to, e.g., drop all 2 (3)
drives and kill them all simultaneously, if they are
stored in the same place. Fire and water damage might
also be a concern.
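
For anyone who wants to script the periodic self-test, a minimal sketch in
Python follows. It assumes the smartmontools package is installed and that
its smartctl program is on the PATH; the device name /dev/sdX is a
placeholder, and the helper names are invented for this example. Some USB
enclosures do not pass SMART commands through at all, and others may need
an extra device-type option such as "-d sat", so treat this as a starting
point rather than a recipe.

```python
import subprocess
from typing import List

# smartctl options used here:
#   -t long      start the extended (long) self-test inside the drive
#   -l selftest  show the log of past self-tests and their results
#   -A           print the vendor-specific SMART attributes
#   -H           print the overall health assessment
ACTIONS = {
    "start_long_test": ["-t", "long"],
    "selftest_log": ["-l", "selftest"],
    "attributes": ["-A"],
    "health": ["-H"],
}


def smartctl_command(device: str, action: str) -> List[str]:
    """Build the smartctl command line for one of the actions above."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return ["smartctl", *ACTIONS[action], device]


def run_smartctl(device: str, action: str) -> str:
    """Run smartctl and return its text output (usually needs admin rights)."""
    # smartctl uses its exit status as a bit mask, so a non-zero status
    # is not treated as a Python error here.
    result = subprocess.run(
        smartctl_command(device, action),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder device; replace with the real one before running.
    print(run_smartctl("/dev/sdX", "start_long_test"))
```

The long test runs inside the drive and can take hours, so the results have
to be read later with the "selftest_log" and "attributes" actions.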

Depending on your local customs, you might be able to
rent a safe deposit box cheaply. You can then put
one of the drives in there and swap it out once a year
(testing the one that goes in before and the one that
comes out afterwards). The third copy can, for example,
be kept in your desk at your workplace (depends on local
customs, here it would be entirely unproblematic in
most places), at a friend's house, in ordinary storage
or at a second bank. Use the same swap policy as above,
i.e. the disks get rotated and checked at least once
a year.

Arno
 
I would use more than two backup drives. Three is good,
six is better.

I completely agree.

The rationale behind three is as follows: You can always make
a mistake (drop it, overwrite by accident) and with three
you are still redundant. Also, if you damage the drive in a
more obscure fashion (bad power, malware, etc.) with two, your
data is likely gone. With three, you will get two warnings
and see that the damage is repeatable and you will still have
one good copy at that time that you can then be very, very
careful with.

Arno
 
BTW, I use ten backup hard drives. Three are internal to our
LAN on various PCs. The other seven are external USB drives
and are swapped each week. Each drive (internal or external)
has a complete copy of the LAN on it using Robocopy.

Lynn
 
Depending on your local customs, you might be able to
rent a safe deposit box cheaply.

I have one of my USB backup drives in a safe deposit box. I take it out
once a month and update the backups. Unfortunately HD Sentinel can't
read the SMART data. I think it can do a full scan, though.

--
Ed Light

 
Lynn said:
BTW, I use ten backup hard drives. Three are internal to our
LAN on various PCs. The other seven are external USB drives
and are swapped each week. Each drive (internal or external)
has a complete copy of the LAN on it using Robocopy.

When it comes to rotating backups where the oldest is deleted, I use at
least 2 or 3, but I use an extra step to enable up-to-date restore from any
one.

Each weekend it does a full backup to one of them, alternately.

Other days it does a differential to each one.

The trick is that just before the full backup to drive x, it does another
differential to every drive but x, to bring them up-to-date for the week.
That differential isn't deleted until another full backup is done on its
drive.

So, on drive x, there is a full backup from last weekend, and yesterday's
differential.

On the drive from the previous weekend there is a full backup plus a
differential from this weekend, plus yesterday's differential.

There's an extra step in restoring from the other drive, but it's complete.
And this is at the cost of only the time needed to do another differential
backup.

To reduce wear and tear if using more than 2 drives, I only do 2
differential backups on weekdays in a rotating pattern. So 2 are fully
up-to-date and the next is only 1 day old.
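
The rotation described above can be sketched as a small Python planner. This
is only one reading of the scheme: it assumes the full backup runs on day 6
of each 7-day week, it does not model when old differentials get deleted,
and it ignores the reduced weekday pattern for more than 2 drives. The
function name plan_for_day and the drive labels are made up for the example.

```python
from typing import List, Tuple


def plan_for_day(day: int, drives: List[str],
                 full_weekday: int = 6) -> List[Tuple[str, str]]:
    """Return the ordered (drive, action) steps for 0-based day number `day`.

    On ordinary days every drive gets a differential. On the full-backup
    day, every drive except this week's target first gets one more
    differential, and only then does the target receive the full backup.
    """
    if not drives:
        raise ValueError("at least one drive is required")
    if day % 7 != full_weekday:
        return [(d, "differential") for d in drives]
    # Targets alternate week by week through the list of drives.
    target = drives[(day // 7) % len(drives)]
    steps = [(d, "differential") for d in drives if d != target]
    steps.append((target, "full"))
    return steps


if __name__ == "__main__":
    for day in range(14):
        print(day, plan_for_day(day, ["A", "B"]))
```

The ordering inside the full-backup day is the point: the other drives are
brought up to date before the target's previous backup set is replaced.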
 
I have one of my USB backup drives in a safe deposit box. I take it out
once a month and update the backups. Unfortunately HD Sentinel can't
read the SMART data. I think it can do a full scan, though.

The SMART scan is preferable, but a full read scan is
a reasonable substitute if you cannot get more. I used
timed full reads to detect disks with problems in a
computer cluster before SMART became usable under Linux.
The trick here is to look at the read timing. If reads
take long, there are problems in that area. If I remember
correctly, HD Sentinel gives color markings for the
read time per block, which should still give reasonable
warning. And the refresh function for possibly weak sectors
works just as well with ordinary reads.
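
A timed full read of this kind can be approximated with a few lines of
Python. This is a sketch under assumptions, not the tooling used on that
cluster: the block size and the "slow" threshold are arbitrary guesses, the
function name timed_read is invented, and reading a raw device instead of a
file usually needs administrator rights. Data already sitting in the
operating system's cache will read back fast no matter what the disk
surface looks like.

```python
import time
from typing import List, Tuple


def timed_read(path: str, block_size: int = 1024 * 1024,
               slow_seconds: float = 0.5) -> Tuple[int, List[Tuple[int, float]]]:
    """Read `path` sequentially and time every block.

    Returns (bytes_read, slow_blocks), where slow_blocks lists
    (byte_offset, seconds) for each block slower than `slow_seconds`.
    """
    slow_blocks: List[Tuple[int, float]] = []
    offset = 0
    # buffering=0 avoids Python's own read-ahead buffer.
    with open(path, "rb", buffering=0) as handle:
        while True:
            start = time.perf_counter()
            chunk = handle.read(block_size)
            elapsed = time.perf_counter() - start
            if not chunk:
                break  # end of file or device
            if elapsed > slow_seconds:
                slow_blocks.append((offset, elapsed))
            offset += len(chunk)
    return offset, slow_blocks


if __name__ == "__main__":
    import sys
    total, slow = timed_read(sys.argv[1])
    print(f"read {total} bytes, {len(slow)} slow block(s)")
    for block_offset, seconds in slow:
        print(f"  offset {block_offset}: {seconds:.2f} s")
```

Clusters of slow offsets in one region are the warning sign; a single slow
block can just as well be another program using the disk at that moment.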

What do you pay for the box? Here in Switzerland it
is something like 100EUR/USD per year for a standard-sized
box that can take two external 3.5" drives. Although
I admit I have been too lazy so far and am just storing
a backup drive in the cellar. Given that this cellar is
actually a nuke-proof bunker (_not_ for a direct or close
hit), this is probably reasonable ;-)

Arno
 
No, just use gold 100-year archival CD/DVD media (the DVDs are only rated
at 80 years; there are some 4-5 firms that make them) or cased DVD-RAM
with a 100-year life span.
 
William Brown said:
No, just use gold 100-year archival CD/DVD media (the DVDs are only rated
at 80 years; there are some 4-5 firms that make them) or cased DVD-RAM
with a 100-year life span.

CD/DVD in whatever version are unsuitable for archiving, unless
you do extensive evaluation on a specific burner+media combination
and have a binding assurance from the media vendor that they
will not change the media characteristics. Even with that,
100 years is completely fictional under real conditions.

Arno
 
CD/DVD in whatever version are unsuitable for archiving, unless
you do extensive evaluation on a specific burner+media combination
and have a binding assurance from the media vendor that they
will not change the media characteristics. Even with that,
100 years is completely fictional under real conditions.

Additionally, having the data on DVD and CD-R media was becoming a real
hassle anyway when I needed to access it. I think I will go with the
advice of using 3 drives, and keeping one of them offsite. I'll also
schedule long SMART tests. Would Hitachi's DFT be the correct tool?
http://www.hitachigst.com/support/downloads/#DFT

Or is there a third party tool that is better or recommended?

Thanks to all for your suggestions.
 
As well as drives - consider online storage as PART of your backup
scheme.
 
Amusing and somewhat timely discussion, working through regenerating
backups.

DAT
An old 2003 DDS2 tape took a few goes to get data off in a DDS4 drive,
despite being correctly stored and retensioned first. I was considering
using DAT40 or DAT72 as a secondary backup device, but I fundamentally
distrust data restored from it. It is very easy to get a creeping
corruption which subsequently gets backed up again over the years and is
only detected years later. Not exactly good if it is tax & account
archives.
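
One common guard against that kind of creeping corruption is to keep a
checksum manifest next to the archive and compare it before every new
backup run. The Python sketch below is an illustration only: the function
names are invented, SHA-256 is simply a convenient choice, and a changed
digest cannot by itself tell an intentional edit from a corrupted file.

```python
import hashlib
import os
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map every file under `root` (by relative path) to its digest."""
    manifest: Dict[str, str] = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List files present in both manifests whose content differs."""
    return sorted(p for p in old if p in new and old[p] != new[p])


if __name__ == "__main__":
    import sys
    # Print a manifest for the folder given on the command line.
    for rel_path, hex_digest in sorted(build_manifest(sys.argv[1]).items()):
        print(hex_digest, rel_path)
```

For write-once archives such as tax records, any entry reported by
changed_files is worth checking against another copy before the next backup
overwrites it.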

HD
Ironically I have never lost any data stored on hard drive, but I
introduce new drives into the pool regularly and use multiple brands.

CD/DVD
Early burners & poor dye were a recipe for disaster. Running some
A-One Gold tonight on a burner I got 40% failure - just picked them up
on the way home. It shows the need to match media (Verbatim Indonesia
a better bet) with drive (which do not seem to be built with longevity
in mind).

MO
Magneto optical 3.5" has proven totally reliable, bar a few
misformatted discs in the early 2000s easily spotted as they were
unusable. Media from Sony, Maxell, Fujifilm & Verbatim - probably only
2 manufacturers in there. Small capacity at 640MB limits use to core
critical data, but longevity seems ok thus far. Spare brand new drives
checked and unlike dreaded DDS2 I can guarantee thus far to read a
disk in any drive without problem.

Paper
Ironically this still seems to be quite robust! Unfortunately the cost
of storing several hundred thousand pages is actually pretty
ridiculous in "self store" units these days. A lot of companies are
shredding for paper-less office.

Frankly it comes down to data set size & criticality, usually there is
a smaller critical data set which can not be regenerated and for that
higher redundancy across multiple media is financially practical
rather than applying the same cost structure to the entire data set.
Multiple copies in multiple locations really does matter. I am not too
impressed by "clouds" re recent security breaches, and apart from LTO
1/2" tape I am not impressed with tape - it makes me uneasy. HD makers
have a history of shovelling out the initial-dip products as they
change geographic location rather than binning them (every time Seagate
changed plant location the first products off the line were doorstops
but still ended up in retail).

Dropping both backups down the stairs is not a joke, been there - one
a HD & one a DVD-RAM cartridge. The HD was scrap, the DVD-RAM ok but
from re-using the same DVD-RAM disk in video recorders I am wholly
unimpressed by their "overwrite count" despite being type-4 cartridge.

MO & HD seems to win out, MO for the critical stuff and never sell
that spare drive. One reason why LTO is twice the price it appears
unfortunately.
 
Amusing and somewhat timely discussion, working through regenerating
backups.
DAT
An old 2003 DDS2 tape took a few goes to get data off in a DDS4 drive,
despite being correctly stored and retensioned first. I was considering
using DAT40 or DAT72 as a secondary backup device, but I fundamentally
distrust data restored from it. It is very easy to get a creeping
corruption which subsequently gets backed up again over the years and is
only detected years later. Not exactly good if it is tax & account
archives.

No surprise there. I did some reads with individual
timing (shows whether the drive needs rewind and retry)
a long time ago and decided that DAT was basically consumer
grade.
HD
Ironically I have never lost any data stored on hard drive, but I
introduce new drives into the pool regularly and use multiple brands.

I have, but not without warning. Unless you count the
one time where I made a 4-way Y-cable from a pair where
red and yellow were reversed in one and I did not notice.
That killed 2 drives spectacularly. Fortunately I had a backup.
CD/DVD
Early burners & poor dye were a recipe for disaster. Running some
A-One Gold tonight on a burner I got 40% failure - just picked them up
on the way home. It shows the need to match media (Verbatim Indonesia
a better bet) with drive (which do not seem to be built with longevity
in mind).
MO
Magneto optical 3.5" has proven totally reliable, bar a few
misformatted discs in the early 2000s easily spotted as they were
unusable. Media from Sony, Maxell, Fujifilm & Verbatim - probably only
2 manufacturers in there. Small capacity at 640MB limits use to core
critical data, but longevity seems ok thus far. Spare brand new drives
checked and unlike dreaded DDS2 I can guarantee thus far to read a
disk in any drive without problem.

I second that. Unfortunately my MOD drive was SCSI and I could
not get the controller to work anymore without jumping through
hoops. I did pull all data off my MODs some years ago without
any problems, and some of them were written 7-8 years before.
Definitely an archival medium that deserves the term. Also
definitely not commercially viable, it seems people do not
care about long-term archiving or just do not get it. The number
of people still recommending consumer-trash writable DVDs
as archival medium in this group tells it all.

Paper
Ironically this still seems to be quite robust! Unfortunately the cost
of storing several hundred thousand pages is actually pretty
ridiculous in "self store" units these days. A lot of companies are
shredding for paper-less office.

I have one book that has a "data lifetime statement" of
several hundred years with regard to the paper in it.
Frankly it comes down to data set size & criticality, usually there is
a smaller critical data set which can not be regenerated and for that
higher redundancy across multiple media is financially practical
rather than applying the same cost structure to the entire data set.
Multiple copies in multiple locations really does matter. I am not too
impressed by "clouds" re recent security breaches, and apart from LTO
1/2" tape I am not impressed with tape - it makes me uneasy. HD makers
have a history of shovelling out the initial-dip products as they
change geographic location rather than binning them (every time Seagate
changed plant location the first products off the line were doorstops
but still ended up in retail).
Dropping both backups down the stairs is not a joke, been there - one
a HD & one a DVD-RAM cartridge. The HD was scrap, the DVD-RAM ok but
from re-using the same DVD-RAM disk in video recorders I am wholly
unimpressed by their "overwrite count" despite being type-4 cartridge.

DVD-RAM is really, really bad. I did evaluate a set of different
cartridges a while ago and was very unimpressed. Many did exceed
the ISO error limits and only repeated full overwrites brought
them down a bit. I don't need this high level of media maintenance
at all.
MO & HD seems to win out, MO for the critical stuff and never sell
that spare drive.

All the major data recovery outfits do recover MOs.
One reason why LTO is twice the price it appears
unfortunately.

It is very unfortunate that MO did not get enough sales for the
vendors to continue development. I now use spinning drives
(one on my side with 3-way RAID1, 2 virtual servers) and
regular automated tests of the data for critical data
and two external HDs with hardware tests every few months
for less critical stuff. So far so good. But except for going back
to MO, I do not see any write & store solution. As to the
"cloud", the word "pathetic" comes to mind.

BTW, I completely agree that archival data falls into "must have"
and "nice to have" classes that justify different levels of effort.

Arno
 
I second that. Unfortunately my MOD drive was SCSI and I could
not get the controller to work anymore without jumping through
hoops. I did pull all data off my MODs some years ago without
any problems, and some of them were written 7-8 years before.

Likewise.
I sold my SCSI MO drive & Adaptec EZ-SCSI PCMCIA because its short
cable was cumbersome with a laptop and a growing data set meant
generating 2-3 copies became expensive. I moved to HD & DVD-RAM,
mainly due to cheap 4.7/9.4GB phase change media & cheap drives.

Thankfully the HD backups remained intact: DVD-RAM demonstrated it
could lose data and when subjected to repeated overwrites could lose
video far sooner than specifications suggested.

I returned to MO very quickly.

Definitely an archival medium that deserves the term. Also
definitely not commercially viable, it seems people do not
care about longterm archiving or just do not get it. The number
of people still recommending consumer-trash writable DVDs
as archival medium in this group tells it all.

I do not understand it either.
CD-R and DVD are OK for content which you can recreate - such as films.
My experience is they are more a vehicle for those selling than those
betting their own business survivability on them.
DVD-RAM is really, really bad. I did evaluate a set of different
cartridges a while ago and was very unimpressed. Many did exceed
the ISO error limits and only repeated full overwrites brought
them down a bit. I don't need this high level of media maintenance
at all.

As I found, used like a "fixed disk" in a DVD-RAM video recorder their
overwrite count was dire before "recorder lockup".
All the major data recovery outfits do recover MOs.

MO was great for tax & medical records.
Tape requires good technology & good storage technique.
It is very unfortunate that MO did not get enough sales for the
vendors to continue development. I now use spinning drives
(one on my side with 3-way RAID1, 2 virtual servers) and
regular automated tests of the data for critical data
and two external HDs with hardware tests every few months
for less critical stuff. So far so good. But except for going back
to MO, I do not see any write & store solution. As to the
"cloud", the word "pathetic" comes to mind.

Cloud is marketing creating a revenue stream out of a box of bits :-)
BTW, I completely agree that archival data falls into "must have"
and "nice to have" classes that justify different levels of effort.

The problem with all media is there comes a point when it moves from
cash-cow to cash-dog and quality control can decline, which may be the
media you just stored the business archives on. Hard drives are
particularly vulnerable to this; at least when MiniScribe shipped
bricks you could chisel some 1s & 0s on them. The other good thing
about paper is (as Enron & Arthur Andersen found) that it takes a long
time to shred a pile of paper, but very little time to shred some
files (or DVDs).
 
The problem with all media is there comes a point when it moves from
cash-cow to cash-dog and quality control can decline, which may be the
media you just stored the business archives on. Hard drives are
particularly vulnerable to this; at least when MiniScribe shipped
bricks you could chisel some 1s & 0s on them.

Before my time, I think when I got my first HDD, Miniscribe was
already out of the business.
The other good thing
about paper is (as Enron & Arthur Andersen found) that it takes a long
time to shred a pile of paper, but very little time to shred some
files (or DVDs).

Indeed. Whether that is an advantage or disadvantage depends on
your point of view though ;-)

Arno
 