Disk to disk copying with overclocked memory

  • Thread starter: Mark M
If you've overclocked the RAM, yes, there is a chance of getting timing
errors in the data which will lead to data corruption.
 
Folkert Rienstra said:

That it was not corrupt in transmit from/to disk. For example, memory
allocated from kernel non-paged pool. One time I caught the crash with a
debugger, and this was KEVENT structure corrupt.
threads.

What happened to that memory test? Last time I heard about it was when c't
complained about you not supporting it anymore.
Version 2 is available on
http://home.earthlink.net/~alegr/download/memtest.htm

Previous location at www.aha.ru is offline, AFAIK.
 
Arno said:
In comp.sys.ibm.pc.hardware.storage Alexander Grigoriev


Really nasty. Shows that these things have gotten far too complex...

Dunno about that, I ran into something similar with a genuine original PC.
Customer brought it in reporting a parity error. He put his program disk
in, ran it, sure enough a parity error popped right up. I ran a diagnostic
on it overnight, found nothing, put his program in, parity error popped
right up again. Replaced the RAM. Same thing. Tried his program on
another machine, no problem. Never did find out the cause, just replaced
the planar and that was the end of it. Only thing I can figure is that
something was busted in the processor or the supporting circuitry that was
state dependent--had to go through a certain sequence of states before the
error occurred. Should have hung onto the board and dug into it but at the
time I figured (probably correctly) that I'd never get around to it.
 
Mark M said:
I use a partition copier which boots off a floppy disk before any
other OS is launched.

If I copy a partition from one hard drive to another, then is there
any risk of data corruption if the BIOS has been changed to
aggressively speed up the memory settings?

For example the BIOS might set the memory to CAS=2 rather than
CAS=3. Or other memory timing intervals might also be set to be
shorter than is normal.

I am thinking that maybe the IDE cable and drive controllers handle
data fairly independently of the memory on the motherboard. So
maybe data just flows up and down the IDE cable and maybe the
motherboard is not involved except for sync pulses.

There are three scenarios I am thinking about:

(1) Copying a partition from one hard drive on one IDE cable to
another hard drive on a different IDE cable.

(2) Copying a partition from one hard drive to another which is on
the same IDE cable.

(3) Copying one partition to another on the same hard drive.

How much effect would "over-set" memory have on these situations?

Do the answers to any of the above three scenarios change if the
copying of large amounts of data files is done from within WinXP?
Personally, I would guess that it is more likely that motherboard
memory comes into play if Windows is involved.

How about something easy...Loosen your memory timings (CAS3)...perform the
copy...then tighten them back up...Voila! No corruption!
 
SpongeBob said:
.... snip ...

How about something easy...Loosen your memory timings (CAS3)...
perform the copy...then tighten them back up...Voila! No corruption!

However the partition you want to copy has already been messed up
by the memory faults, in ways that may not show up for months. If
you have ECC installed and active you might get away with it.
Copying corrupt data does not repair it.
 
CBFalconer said:
However the partition you want to copy has already been messed up
by the memory faults, in ways that may not show up for months. If
you have ECC installed and active you might get away with it.
Copying corrupt data does not repair it.

Well...from what was written...the question was theoretical...data
corruption has not occurred, there has been no evidence of system
instability, and he has not copied anything yet. High quality memory can
often handle tighter timings without ill effect. Test with memtest86 and
Prime95 for a few days to be sure of stability. If not...loosen the timings
and copy away! To answer the question...tighter memory timings may or may
not affect data transfer...It all depends on your memory and your system.
 
Well...from what was written...the question was theoretical...data
corruption has not occurred, there has been no evidence of system
instability, and he has not copied anything yet. High quality memory can
often handle tighter timings without ill effect. Test with memtest86 and
Prime95 for a few days to be sure of stability. If not...loosen the timings
and copy away! To answer the question...tighter memory timings may or may
not affect data transfer...It all depends on your memory and your system.

There is nothing about data transfer that would inherently make it more
susceptible to memory errors... in theory, if the memory were to affect
that, then everything done on the system is subject to, and likely to
encounter, these errors. A system certainly could appear to function stably
for a while with memory errors occurring, depending on their frequency, but
that is the worst possible situation: unknown problems and data corruption
instead of a more extreme case, such as an immediate crash before the OS
even manages to boot.
 
Quick Response to Subject Header:

Diskette activity involves DMA, and RAM timings that may be tolerated
for CPU access (and thus pass diags) may fail under DMA. Consider this if
crashes when accessing diskettes are the only problems you have.


-------------------- ----- ---- --- -- - - - -
Running Windows-based av to kill active malware is like striking
a match to see if what you are standing in is water or petrol.
 
Folkert is correct. I did some research into this and standard non-ECC
memory - the kind found in almost all home PCs - does not do error detection
of any kind. The reasoning is that the error rate is so astoundingly low
that the cost of adding the logic to detect errors cannot be justified.

According to one source I found, actual occurrences of memory errors in home
PCs are usually caused by contamination of the DIMM connectors by the
installer. Translation: finger goo causes most of the errors.

cp
 
There is nothing about data transfer that would inherently make it more
susceptible to memory errors...

If the DMA system errors while CPU access does not (think square wave
pulse edges, rise time, local charge pump capacitors etc.) then
MemTest86 et al may pass, while UIDE transfers may fail.

Other possibilities:
- buggy VIA 686B Southbridge problem (eats UIDE data)
- bad or noisy UIDE data cables
- bad cache RAM on the HD itself
- static buildup on ungrounded HD

The last is something that bit me; a couple of 32-bit errors in the
midst of a 500M bulk file copy, when the loose HD was not touching the
chassis. Grounding the HD solved the problem, hence cause presumption


 
In comp.sys.ibm.pc.hardware.storage Colin Painter said:
Folkert is correct. I did some research into this and standard non-ECC
memory - the kind found in almost all home PCs - does not do error detection
of any kind. The reasoning is that the error rate is so astoundingly low
that the cost of adding the logic to detect errors cannot be justified.

Actually it is not the logic. That is very simple and cheap.
It is the additional bits needed, which is one per 8 bits and
makes memory (theoretically) 12.5% more expensive. The market
is not willing to pay that much for improved reliability. Sad
but true. (ECC is more expensive because of the additional bit,
but more so because of the lower volume sold.)
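
To make the arithmetic concrete, here is a minimal sketch of one check bit per 8 data bits. It assumes simple even parity for illustration (the function names are hypothetical); real ECC modules store 8 check bits per 64 data bits, which is the same 12.5% ratio but uses a code that can also correct single-bit errors.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for one 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """True if the byte read back still matches the parity bit stored with it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010                  # four bits set, so the parity bit is 0
stored = parity_bit(data)

single_flip = data ^ 0b00001000    # one bit corrupted
double_flip = data ^ 0b00011000    # two bits corrupted

single_detected = not parity_ok(single_flip, stored)   # parity catches this
double_detected = not parity_ok(double_flip, stored)   # parity misses this

# One extra bit for every 8 data bits, or 8 extra for every 64.
overhead_percent = 100 * 1 / 8
```

The double-bit case shows why plain parity only detects an odd number of flipped bits; it cannot say which bit is wrong, let alone fix it.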
According to one source I found, actual occurrences of memory errors
in home PCs are usually caused by contamination of the DIMM connectors
by the installer. Translation: finger goo causes most of the errors.

Incorrect (read: too high) clock settings are more likely the
main source today. The memory contacts have gotten better and
tin-plating is not so common anymore as it was with DIL-memory
and the first generation of DIMMs.

Arno
 
If the DMA system errors while CPU access does not (think square wave
pulse edges, rise time, local charge pump capacitors etc.) then
MemTest86 et al may pass, while UIDE transfers may fail.

True. But DMA is usually far less aggressive in its timing, so
this is a rather unlikely scenario.
Other possibilities:
- buggy VIA 686B Southbridge problem (eats UIDE data)
- bad or noisy UIDE data cables

That should cause CRC errors.

- bad cache RAM on the HD itself
- static buildup on ungrounded HD

No. The HDD is grounded through the power cable and through the
data cable. Unless the whole mainboard is not grounded properly.
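
On the remark that bad or noisy cables should show up as CRC errors, the following is a rough sketch of the idea only. It uses Python's `zlib.crc32` as a stand-in checksum; the CRC actually used on Ultra DMA transfers is a different, shorter code computed in hardware, and the function names here are hypothetical.

```python
import zlib


def send_block(payload: bytes) -> tuple[bytes, int]:
    """Sender side: return the payload together with its checksum."""
    return payload, zlib.crc32(payload)


def block_is_intact(payload: bytes, crc: int) -> bool:
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return zlib.crc32(payload) == crc


block = bytes(512)                      # one 512-byte sector of zeros
payload, crc = send_block(block)

noisy = bytearray(payload)
noisy[10] ^= 0x04                       # a single bit flipped "on the cable"

clean_ok = block_is_intact(payload, crc)
noisy_ok = block_is_intact(bytes(noisy), crc)
```

A check like this covers only the trip over the cable. Data corrupted in RAM before the checksum is computed, or after it has been verified, passes unnoticed.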
The last is something that bit me; a couple of 32-bit errors in the
midst of a 500M bulk file copy, when the loose HD was not touching the
chassis. Grounding the HD solved the problem, hence cause presumption

Strange. The hdd has two low-resistance paths to the chassis.
These should be enough. Hmmm. O.k., so that may cause problems
in some setups. But it cannot be static buildup in the hdd.
Maybe you were statically charged and touched the HDD? That
would have induced a spike in some logic-lines...

Arno
Running Windows-based av to kill active malware is like striking
a match to see if what you are standing in is water or petrol.

P.S.: Like the sig!
 
In comp.sys.ibm.pc.hardware.storage "cquirke (MVP Win9x)"

comp.sys.ibm.pc.hardware.storage, eh? Been long since I was there,
and have an "interesting" problem that may fit there if anyone's
interested in FATxx data recovery.
True. But DMA is usually far less aggressive in its timing, so
this is a rather unlikely scenario.

That's an interesting observation. I first thought of this (and AGP)
issues when I had an old PCI / ISA system that was fine as long as you
didn't play digitised sound or use the diskette. A provocative test
that almost always blew that system up was to copy the Start Menu (or
any other collection of small files) to a diskette. BSoD !!

So I thought "what do digitised sound (even .wav playback in Sound
Recorder, while FM-only DOS games didn't turn a hair) and diskette
drives have in common?" and the answer that came to me was: DMA.

Then I realised that no matter how exhaustive RAM testing programs
are, they are still only testing CPU memory access.
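
As a rough illustration of that point, here is a minimal pattern-test sketch (the function name and the choice of patterns are assumptions, not taken from any real tester). Every write and read-back goes through the CPU, and at this size probably never leaves its caches, so a pass says nothing about how the same RAM behaves under DMA.

```python
def pattern_test(size_bytes: int, patterns=(0x00, 0xFF, 0x55, 0xAA)) -> int:
    """Fill a buffer with each pattern, read it back, and count mismatched bytes."""
    buf = bytearray(size_bytes)
    mismatches = 0
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected                      # CPU writes the pattern
        if buf != expected:                    # CPU reads it back and compares
            mismatches += sum(1 for b in buf if b != pattern)
    return mismatches


errors = pattern_test(64 * 1024)
```

Real testers such as MemTest86 run without an OS and use far more demanding access patterns, but the access path they exercise is still the CPU's.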

But after all, it's only conjecture on my part.

Then again, it was new RAM that fixed the diskette-and-digital-sound
problem, even tho the rather useless RAM testing apps I was using at
that time passed the old RAM as fine.
No. The HDD is grounded through the power cable and through the
data cable. Unless the whole mainboard is not grounded properly.

At the time I had the minitower open and extra HDs sitting outside
the case, not touching anything. That was my SOP at the time, and I
found a whole series of scrape-overs had the same problems; Doom would
always crash, and there was one other app that also misbehaved.

In those DOSsy days, apps were well-behaved and lived purely in their
own directory trees. So to pass on an active shareware-and-demos
collection, you'd just bulk-copy the Games subtree.

Eventually, I did an FC /B on 500M of stuff and found two errors -
both being 32-bit runs of junk ("snakebites"). So I did the same 500M
bulk transfer a few times, and each time I'd get 0 to 4 (max)
snakebites, always 32-bits. Then I tried grounding the HD's shell (I
put an old metal slot break-out strip from shell to case) and the
problems went away, on two tests (I was suffering "test fatigue" by
then, hence only two tests!).
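
A check along the lines of FC /B can be sketched as below. This is not a reimplementation of FC; it is a hypothetical helper that compares two same-sized files byte by byte and reports each contiguous run of differing bytes, which is how short bursts of junk like the ones described would show up.

```python
import os
from typing import Iterator, Tuple


def diff_runs(path_a: str, path_b: str,
              chunk_size: int = 1 << 20) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each contiguous run of differing bytes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        raise ValueError("files differ in size; this sketch compares equal sizes only")
    offset = 0          # absolute position of the current chunk
    run_start = None    # start of the mismatch run in progress, if any
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a:
                break
            for i in range(len(a)):
                if a[i] != b[i]:
                    if run_start is None:
                        run_start = offset + i
                elif run_start is not None:
                    yield run_start, offset + i - run_start
                    run_start = None
            offset += len(a)
    if run_start is not None:
        # A run that extends to the end of the file.
        yield run_start, offset - run_start
```

Calling `list(diff_runs("original.bin", "copy.bin"))` (file names are placeholders) returns an empty list for a clean copy. Repeating the same bulk copy and comparing the reported runs each time shows whether the errors are random or tied to particular offsets.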

That was the mileage, and as the HDs of those times weren't hot and
fast monsters, I attributed it to static build up. Since then I
always ground the HD's shell to case, and SF,SG.
Strange. The hdd has two low-resistance paths to the chassis.
These should be enough.

Yep; it's odd. The old full-height monsters used to have grounding
tags (using auto-electronics cables!) and I used to laugh at that :-)
Maybe you were statically charged and touched the HDD? That
would have induced a spike in some logic-lines...

No, I don't think so - I generally don't touch what I don't need to
touch, and tend to leave these bulk ops to run unattended.
P.S.: Like the sig!

-- Risk Management is the clue that asks:
"Why do I keep open buckets of petrol next to all the
ashtrays in the lounge, when I don't even have a car?"
 