My SATA transfer corruption issue is back !

Castor Nageur · Aug 22, 2011

Mobo: Gigabyte GA-P35-DS4 (rev 1.1)
CPU: Intel Quad Core Q6600
Disks: WD Caviar Green (EADS series) SATA2 (1TB, 1.5TB)
Seagate Barracuda Green SATA3 (2TB) for my W7 partition
Memory: 4x2GB Corsair l DDR2-800 memory
Chipset: Intel ICH9R configured in AHCI mode

Hi all,

In a previous post, I explained that when copying data from disk
to disk, the destination data was corrupted (the destination differed
from the source).
I replaced my 8 GB of GSkill memory with Corsair
memory. I ran the copy test twice and the problem seemed to be solved.
I recently copied 670 GB of big files from one disk to another and
found that 10 files out of 359 were corrupted.
My previous tests proved that my RAM and disks were fine.

I plan to burn an Ubuntu Linux install disk image on a clean computer,
then install it on a freshly reformatted hard disk in the affected
computer.
I will then copy and check the same files from Linux so I can rule in (or
out) an OS-dependent problem.
If I also get errors under Linux, I will compare the files, determine the
difference pattern, and post it here before I go and buy a new
computer!

* Is there a system read/write/verify feature I could enable under
Linux so that a copy stops at the first error and I do not have to copy
everything each time?

Thanks in advance.

Arno · Aug 22, 2011

Castor Nageur said:
Mobo: Gigabyte GA-P35-DS4 (rev 1.1)
CPU: Intel Quad Core Q6600
Disks: WD Caviar Green (EADS series) SATA2 (1TB, 1.5TB)
Seagate Barracuda Green SATA3 (2TB) for my W7 partition
Memory: 4x2GB Corsair l DDR2-800 memory
Chipset: Intel ICH9R configured in AHCI mode

Hi all,

In a previous post, I explained that when copying data from disk
to disk, the destination data was corrupted (the destination differed
from the source).
I replaced my 8 GB of GSkill memory with Corsair
memory. I ran the copy test twice and the problem seemed to be solved.
I recently copied 670 GB of big files from one disk to another and
found that 10 files out of 359 were corrupted.
My previous tests proved that my RAM and disks were fine.

Not really. Testing RAM under normal conditions only indicates
that it may be fine. For real testing you need to push it into an
extreme state (temperature, voltages and timing) and test it
there. This is generally not feasible at home.

I plan to burn an Ubuntu Linux install disk image on a clean computer,
then install it on a freshly reformatted hard disk in the affected
computer.
I will then copy and check the same files from Linux so I can rule in (or
out) an OS-dependent problem.
If I also get errors under Linux, I will compare the files, determine the
difference pattern, and post it here before I go and buy a new
computer!

* Is there a system read/write/verify feature I could enable under
Linux so that a copy stops at the first error and I do not have to copy
everything each time?

Not to my knowledge. And if the corruption happens before the file
goes into the write-buffer, it would not help anyways.

Apart from that, your approach is sound. It may just be that
you have (or had) more than one source of corruption. PC hardware
has become more reliable, but not as fast as memory and
disks have grown.

Arno

helloworld · Aug 22, 2011

Castor Nageur wrote:

Not really. Testing RAM under normal conditions only indicates
that it may be fine. For real testing you need to push it into an
extreme state (temperature, voltages and timing) and test it
there. This is generally not feasible at home.

Anyway, I tested with 2 different brands (GSkill then Corsair) and
still get the problem.
The only thing I can tell is the occurrence is much lower with the
Corsair than with the GSkill.

Not to my knowledge. And if the corruption happens before the file
goes into the write-buffer, it would not help anyways.

OK, I was thinking of an automatic src against dst file comparison
after each file is written.

Apart from that, your approach is sound. It may just be that
you have (or had) more than one source of corruption. PC hardware
has become more reliable, but not as fast as memory and
disks have grown.

Yes, I am pretty sure that I have several sources of corruption because
upgrading the motherboard BIOS and changing the RAM significantly
reduced the occurrence of the problem.

Here are the difference patterns for the 3 corrupted files (out of 355)
that I found after tonight's copy/MD5 check under Windows 7 (Ultimate
x64).
You will notice that I have a difference of 6 bytes out of 722 124 475
046 bytes copied. This is very low but I should have zero differences.
According to the pattern, it seems to be a random problem, but there are
always exactly 2 different bytes per file.

G:\_CORRUPTED_>dir
Volume in drive G is MYSYS
Volume Serial Number is XXXX-XXXX

Directory of G:\_CORRUPTED_

22/08/2011 21:39 <DIR> .
22/08/2011 21:39 <DIR> ..
04/05/2011 06:28 4 500 370 859 file1
12/03/2011 20:04 7 726 881 613 file2
17/04/2011 08:11 3 723 489 580 file3

C:>fc /b "D:\file1" "G:\_CORRUPTED_\file1"
Comparing files D:\file1 and G:\_CORRUPTED_\file1
86394C03: 17 1F

C:>fc /b "D:\file2" "G:\_CORRUPTED_\file2"
Comparing files D:\file2 and G:\_CORRUPTED_\file2
0000000121BA4C03: 34 3C

C:>fc /b "D:\file3" "G:\_CORRUPTED_\file3"
Comparing files D:\file3 and G:\_CORRUPTED_\file3
BCD42C03: 61 69

I also ran Microsoft's SFC.EXE tool, which checks system
integrity, and fortunately it did not find any problem:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

Arno · Aug 23, 2011

Anyway, I tested with 2 different brands (GSkill then Corsair) and
still get the problem.
The only thing I can tell is the occurrence is much lower with the
Corsair than with the GSkill.

This is difficult to interpret. It could be that the higher
rate was due to the GSkill and the lower rate is an unrelated
problem. Or it could be the same problem with lower intensity.

OK, I was thinking of an automatic src against dst file comparison
after each file is written.

Well, you can at least automate it to some degree, just not
easily on a per-file basis.
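
A minimal sketch of such a per-file copy-then-compare loop, assuming Python 3 is available. The names `file_md5` and `copy_and_verify` are made up for illustration and are not part of any existing tool. It stops at the first file whose copy does not hash the same as its source.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # read in 1 MiB pieces so huge files do not fill RAM


def file_md5(path):
    """Return the hex MD5 digest of a file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, verifying each one.

    Raises RuntimeError at the first mismatch; otherwise returns the
    number of files that were copied and verified.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    verified = 0
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        # Hash both sides again after the copy and compare.
        if file_md5(src) != file_md5(dst):
            raise RuntimeError(f"mismatch after copying {src} -> {dst}")
        verified += 1
    return verified
```

Note that the read-back of the destination may be served from the operating system's cache rather than from the disk itself, so a check like this says more about the RAM and copy path than about the platters.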

Yes, I am pretty sure that I have several sources of corruption because
upgrading the motherboard BIOS and changing the RAM significantly
reduced the occurrence of the problem.

Here are the difference patterns for the 3 corrupted files (out of 355)
that I found after tonight's copy/MD5 check under Windows 7 (Ultimate
x64).
You will notice that I have a difference of 6 bytes out of 722 124 475
046 bytes copied. This is very low but I should have zero differences.

You should have. I tend to make full-image backups of my Laptops
onto 40GB tar files and the like, with zero differences. There
is no reason to accept any corruption.

According to the pattern, it seems to be a random problem, but there are
always exactly 2 different bytes per file.

If I see that correctly, it is one different byte. "fc" just states
the byte values in both files. "fc" should output all differences,
so there is really only one.

G:\_CORRUPTED_>dir
Volume in drive G is MYSYS
Volume Serial Number is XXXX-XXXX

Directory of G:\_CORRUPTED_

22/08/2011 21:39 <DIR> .
22/08/2011 21:39 <DIR> ..
04/05/2011 06:28 4 500 370 859 file1
12/03/2011 20:04 7 726 881 613 file2
17/04/2011 08:11 3 723 489 580 file3

C:>fc /b "D:\file1" "G:\_CORRUPTED_\file1"
Comparing files D:\file1 and G:\_CORRUPTED_\file1
86394C03: 17 1F

last part of offset: c03
binary of original:  0001 0111
binary of corrupted: 0001 1111
                          ^

C:>fc /b "D:\file2" "G:\_CORRUPTED_\file2"
Comparing files D:\file2 and G:\_CORRUPTED_\file2
0000000121BA4C03: 34 3C

last part of offset: c03
binary of original:  0011 0100
binary of corrupted: 0011 1100
                          ^

C:>fc /b "D:\file3" "G:\_CORRUPTED_\file3"
Comparing files D:\file3 and G:\_CORRUPTED_\file3
BCD42C03: 61 69

last part of offset: c03
binary of original:  0110 0001
binary of corrupted: 0110 1001
                          ^

This is pretty conclusive. You have a weak bit at an address
......c03 in bit position 3 that sometimes flips to 1
when it should be 0.
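
The same reading of the three "fc /b" lines can be reproduced mechanically. This is only an illustrative sketch in Python (the helper name `analyse` is invented here): it XORs the two byte values to find the flipped bit and masks the offset to its low 12 bits.

```python
def analyse(offset_hex, original_hex, corrupted_hex):
    """Return (low 12 bits of the offset, list of flipped bit positions)."""
    offset = int(offset_hex, 16)
    diff = int(original_hex, 16) ^ int(corrupted_hex, 16)
    flipped = [bit for bit in range(8) if (diff >> bit) & 1]
    return offset & 0xFFF, flipped


# The three difference lines reported by "fc /b" in this thread.
FC_LINES = [
    "86394C03: 17 1F",
    "0000000121BA4C03: 34 3C",
    "BCD42C03: 61 69",
]

for line in FC_LINES:
    offset_hex, values = line.split(":")
    original_hex, corrupted_hex = values.split()
    low, bits = analyse(offset_hex, original_hex, corrupted_hex)
    print(f"offset ...{low:03X}  flipped bits {bits}")
```

All three lines come out as offset ...C03 with only bit 3 flipped, and in each case the bit went from 0 to 1.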

Good, the diagnosis is established and the error seems
to be always in the same place.

Very likely a RAM issue. It could be a defective CPU cache
bit as well, although that is less likely. Do you have
CPU ECC for the caches set to "on" in the BIOS?

A possible reason is also wrong timing settings for
the RAM (BIOS defect). They can drive cells over the
edge that would work perfectly fine with good settings.

If your BIOS has memory timing settings, you can try
to set everything to the slowest setting possible
and try again. (Do not mess with termination settings,
they need to match the number of modules, no "higher
is safer here".) You can also try increasing RAM voltage
by 0.1V or 0.2V (not more).

Of course, it could always be defective RAM again. You
can try running with just one module if you have two.
It seems that this is a single defect, so one module would
be fine and one defective. If however the problem vanishes
with just one module, then it most likely is incorrect
RAM timings or bus termination.

I also ran Microsoft's SFC.EXE tool, which checks system
integrity, and fortunately it did not find any problem:

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

And that does not mean anything if your observed error rate
is 3 bits in 700GB.

Arno

Castor Nageur · Aug 26, 2011

You should have. I tend to make full-image backups of my Laptops
onto 40GB tar files and the like, with zero differences. There
is no reason to accept any corruption.

Absolutely.
I have been maintaining MD5 files ever since I had a similar
experience 10 years ago (the occurrence was much higher: 2 consecutive checks on
the same file gave 2 different results).
Without regular MD5 checks, I would probably never have detected
the problem.
I think a lot of people may have this kind of problem without knowing
it.

This is pretty conclusive. You have a weak bit at an address
.....c03 in bit position 3 that sometimes flips to 1
when it should be 0.

Good, Diagnostics is established and the error seems
to be always in the same place.

Thanks for your excellent and accurate diagnosis.
My other experiments prove that you are right: I ran the copy
test again and again and I always get the error on the same bit.
I installed Ubuntu 11.04 (64-bit) but got a different behavior:
all the files were OK except one, which had 700 MB missing out of 3500
MB, so I suspect an unrelated problem.
Anyway, this is a low-level, OS-independent memory problem, so I will
continue testing under W7.

In order to isolate the problem, I am trying to set my HD to
Read-Write-Verify mode (disabled by default) so I can be sure the drive is OK.
Thanks to a previous post from Franc Zabkar, I got the modified
version of hdparm: I patched the sources, then rebuilt them under
Linux, and it worked.

For those who are interested, here is the source link (you have to
apply the patch to version 9.37, then run make):

http://www.altechnative.net/?p=140

I am currently installing Cygwin so I can easily build the patched
hdparm under Windows.
I noticed that only my Seagate HDD has the RWV flag and not my
WD HDDs (perhaps WD calls this feature differently).

Very likely a RAM issue. It could be a defective CPU cache
bit as well, although that is less likely. Do you have
CPU ECC for the caches set to "on" in the BIOS?

Unfortunately, I have no CPU ECC setting in my BIOS.
I expected the CPU cache to have ECC activated by default: I am
going to find out whether it is activated or not.
Otherwise, I will not trust any computer hardware anymore ;-)

A possible reason is also wrong timing settings for
the RAM (BIOS defect). They can drive cells over the
edge that would work perfectly fine with good settings.

If your BIOS has memory timing settings, you can try
to set everything to the slowest setting possible
and try again. (Do not mess with termination settings,
they need to match the number of modules, no "higher
is safer here".) You can also try increasing RAM voltage
by 0.1V or 0.2V (not more).

I had 5-5-5-18 and set 6-7-7-25 and still get errors.
I did not try increasing the voltage yet.

Of course, it could always be defective RAM again. You
can try running with just one module if you have two.
It seems that this is a single defect, so one module would
be fine and one defective. If however the problem vanishes
with just one module, then it most likely is incorrect
RAM timings or bus termination.

I will test it, but I think there is very little chance that I got the same
problem with 2 different sets of RAM.
I read that RAM issues are very rare, or else I am very unlucky.
It will take time (each test takes 3 hours for copying and 1h30 for
verifying).

And that does not mean anything if your observed error rate
is 3 bits in 700GB.

Of course, you are right, I just wanted to be sure my OS was fine
since I do not want to get some other side-effects.

Franc Zabkar · Aug 26, 2011

My other experiments prove that you are right: I ran the copy
test again and again and I always get the error on the same bit.

Could you create a RAM drive and copy files between folders on this
drive?

Another idea that may help to induce a failure in bit #3 and address
0xnnnnC03 would be to create a file consisting of all 1 bits (0xFF)
with the exception of bit #3 at addresses 0x00000C03, 0x00002C03,
0x00004C03, 0x00006C03, etc. Bit #3 should have a value of 0 at these
addresses.

This setup may induce crosstalk between the solitary 0 bit and its
surrounding 1 bits.
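
A rough sketch of how such a test file could be generated, assuming Python 3. The stride of 0x2000 and the offset 0xC03 are taken from the addresses listed above; the function names and the block size are illustrative choices. Whether a file offset actually lands on the matching RAM address depends on how the OS aligns its buffers, which this sketch does not control.

```python
STRIDE = 0x2000   # distance between the listed addresses
TARGET = 0x0C03   # offset of the suspect byte within each stride
BIT = 3           # bit position that should be 0


def make_pattern(size):
    """Return `size` bytes of 0xFF with bit 3 cleared at 0xC03, 0x2C03, ..."""
    data = bytearray(b"\xFF" * size)
    for offset in range(TARGET, size, STRIDE):
        data[offset] &= ~(1 << BIT) & 0xFF   # 0xFF becomes 0xF7
    return bytes(data)


def write_pattern(path, total_size, block_size=STRIDE * 128):
    """Write a pattern file of total_size bytes, one block at a time.

    block_size must be a multiple of STRIDE so that every block has
    the cleared bits at the same file offsets modulo STRIDE.
    """
    block = make_pattern(block_size)
    with open(path, "wb") as handle:
        remaining = total_size
        while remaining > 0:
            piece = block[:remaining]
            handle.write(piece)
            remaining -= len(piece)
```

For example, `write_pattern("pattern.bin", 4 * 1024**3)` would produce a 4 GiB file that can then be copied and compared like the real data.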

- Franc Zabkar

Castor Nageur · Aug 27, 2011

Another idea that may help to induce a failure in bit #3 and address
0xnnnnC03 would be to create a file consisting of all 1 bits (0xFF)
with the exception of bit #3 at addresses 0x00000C03, 0x00002C03,
0x00004C03, 0x00006C03, etc. Bit #3 should have a value of 0 at these
addresses.

This setup may induce crosstalk between the solitary 0 bit and its
surrounding 1 bits.

Thanks for the trick, Franc.
In fact, I decided to run MemTest86+ last night and finally found the
faulty bit.
I had already tested the Corsair RAM, but with the MemTest tool
that runs under Windows ... and it was not enough.

Castor Nageur · Aug 27, 2011

Of course, it could always be defective RAM again.

Yes it is !

http://www.cijoint.fr/cj201108/cijeQBWCfV.jpg

I let MemTest86+ run all night long and it found the error at the 3rd
pass. It seems that it exactly matches the error we observed during
the files' copy test. MemTest86+ did not find anything on my GSkill
RAM with 5 passes so I think it was a different problem (probably a
mobo/RAM compatibility problem as I first thought).

I am very unlucky with RAM: people on the forums seem to say
this kind of problem happens very rarely.

Anyway, this convinced me to build a new computer based on ECC RAM.

Arno · Aug 27, 2011

Thanks for the trick, Franc.
In fact, I decided to run MemTest86+ last night and finally found the
faulty bit.
I had already tested the Corsair RAM, but with the MemTest tool
that runs under Windows ... and it was not enough.

Very good!

Arno

Arno · Aug 27, 2011

Yes it is !

I let MemTest86+ run all night long and it found the error at the 3rd
pass. It seems that it exactly matches the error we observed during
the files' copy test. MemTest86+ did not find anything on my GSkill
RAM with 5 passes so I think it was a different problem (probably a
mobo/RAM compatibility problem as I first thought).

I am very unlucky with RAM: people on the forums seem to say
this kind of problem happens very rarely.

I expect it happens relatively often but most people do not notice.
I do not have any global statistics, but I once built a small
Linux cluster for research and one of 44 Infineon modules had
a weak bit that took about 2 days of memtest86+ but only about
5 hours of scientific computation to show up.

Anyway, this convinced me to build a new computer based on ECC RAM.

Kingston has pretty good quality in their Value series for ECC
and non ECC.

ECC will also help because the modules are bound to be
better tested. After all, with ECC RAM the likelihood of the
customer noticing a weak bit is very, very high ...

Arno

Arno · Aug 27, 2011

Absolutely.
I have been maintaining MD5 files ever since I had a similar
experience 10 years ago (the occurrence was much higher: 2 consecutive checks on
the same file gave 2 different results).
Without regular MD5 checks, I would probably never have detected
the problem.
I think a lot of people may have this kind of problem without knowing
it.

Would not surprise me at all. My advice is to do checksums
or verifies against the original data on any large data copy
or move and certainly on backups.
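
One way to follow that advice, sketched in Python 3 under the assumption that an MD5 manifest kept alongside the data is acceptable. The function names are hypothetical, not an existing tool: the manifest is built once from the original data and checked again after every copy, move or restore.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk=1 << 20):
    """Hex MD5 of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted names that are missing or whose digest changed."""
    root = Path(root)
    bad = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file() or md5_of(path) != expected:
            bad.append(name)
    return bad
```

Building the manifest on the source and running `verify_manifest` against the destination tree gives the list of files to re-copy or inspect with a byte-level compare.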

Thanks for your excellent and accurate diagnosis.

You are welcome.

My other experiments prove that you are right: I ran the copy
test again and again and I always get the error on the same bit.
I installed Ubuntu 11.04 (64-bit) but got a different behavior:
all the files were OK except one, which had 700 MB missing out of 3500
MB, so I suspect an unrelated problem.
Anyway, this is a low-level, OS-independent memory problem, so I will
continue testing under W7.

In order to isolate the problem, I am trying to set my HD to
Read-Write-Verify mode (disabled by default) so I can be sure the drive is OK.
Thanks to a previous post from Franc Zabkar, I got the modified
version of hdparm: I patched the sources, then rebuilt them under
Linux, and it worked.

For those who are interested, here is the source link (you have to
apply the patch to version 9.37, then run make):

http://www.altechnative.net/?p=140

Interesting. I was not aware HDDs could still do that.

I am currently installing Cygwin so I can easily build the patched
hdparm under Windows.
I noticed that only my Seagate HDD has the RWV flag and not my
WD HDDs (perhaps WD calls this feature differently).

Ah, so it is not a required feature.

Unfortunately, I have no CPU ECC setting in my BIOS.
I expected the CPU cache to have ECC activated by default: I am
going to find out whether it is activated or not.
Otherwise, I will not trust any computer hardware anymore ;-)

I had 5-5-5-18 and set 6-7-7-25 and still get errors.
I did not try increasing the voltage yet.

I will test it, but I think there is very little chance that I got the same
problem with 2 different sets of RAM.
I read that RAM issues are very rare, or else I am very unlucky.
It will take time (each test takes 3 hours for copying and 1h30 for
verifying).

As I said in my other answers to your later posts, I do not
believe weak bits in RAM are rare. I would expect something like
1 in 20...50 modules to be affected. That is a 2...5% defective-out-of-the-box
rate and perfectly in line with the PC industry's quality standards.

Being hit twice is not unlikely with this, especially if you
take into account that you usually use more than one module.
And most people will still not notice.
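
To put rough numbers on that estimate, here is a small worked calculation in Python. It assumes, purely for illustration, that modules fail independently of each other and that a typical machine has 4 modules; the 2% and 5% figures are the guesses given above, not measured data.

```python
def p_at_least_one_bad(p_module, modules):
    """Chance that at least one of `modules` independent modules is bad."""
    return 1 - (1 - p_module) ** modules


for p in (0.02, 0.05):
    chance = p_at_least_one_bad(p, 4)
    print(f"{p:.0%} per module, 4 modules: {chance:.1%} per machine")
```

Under those assumptions a 4-module machine has somewhere between roughly 8% and 18.5% odds of containing at least one weak module.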

Of course, you are right, I just wanted to be sure my OS was fine
since I do not want to get some other side-effects.

Indeed.

Arno

Castor Nageur · Aug 27, 2011

In fact, it is not finished yet: I am unable to identify the faulty
module from the address MemTest86+ reported. I did not find any help
in my mobo manual, and people say that the way the memory modules are used
depends on the motherboard implementation.

They also say that the DIMM slot itself can be faulty, so I first have to
find a good DIMM, then try it in all the slots, and finally check all
the DIMMs independently. I still have hours of testing ahead!

My source: http://www.overclockers.com/forums/showthread.php?t=409152

Kingston has pretty good quality in their Value series for ECC
and non ECC.

Yes, I will buy Kingston ECC.

ECC will also help because the modules are bound to be
better tested. After all, with ECC RAM the likelihood of the
customer noticing a weak bit is very, very high ...

I realized that finding ECC RAM is not a problem.
Finding a CPU that handles ECC is also fine: I plan to buy the Xeon
E3-1275 (I do not want AMD because Intel CPUs seem to actually be
faster).
The problem is finding a motherboard that correctly supports ECC: I
plan to buy the Tyan S5512GM2NR, but it is almost impossible to find
in France.

Arno · Aug 28, 2011

Castor Nageur said:
On 27 August, 13:27, Arno wrote:

In fact, it is not finished yet: I am unable to identify the faulty
module from the address MemTest86+ reported. I did not find any help
in my mobo manual, and people say that the way the memory modules are used
depends on the motherboard implementation.

They also say that the DIMM slot itself can be faulty, so I first have to
find a good DIMM, then try it in all the slots, and finally check all
the DIMMs independently. I still have hours of testing ahead!

Very, very unlikely. In fact almost impossible without CPU
damage. It is just that slots farther away from the CPU
place somewhat higher requirements on DRAM signalling.

As to identifying the module: use bisection. Run half
of the modules and see if the fault appears. If yes, run half of that;
otherwise test the other half. Note that 2 modules put
less strain on the bus than 4, and 1 puts less strain on it
than 2. If you run into that (both halves run fine, but
the full set exhibits the fault), you need to try with swaps:

If these are the modules:   1 2 3 4

First swap:                 2 1 3 4   If the error moves, it is in 1 or 2.
Second swap:                1 2 4 3   If the error moves, it is in 3 or 4.
Last swap for error in 1,2: 3 2 1 4   If the error moves, it is in 1, else 2.
Last swap for error in 3,4: 1 4 3 2   If the error moves, it is in 4, else 3.

No good module required, just some patience.
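
The swap table can be written down as a small decision procedure. This Python sketch is only a model of it: `observe` is a hypothetical callback standing for "install the modules in this slot order, run the memory test, and report where the error shows up", and the model assumes the error simply follows the faulty module to its new slot.

```python
BASELINE = (1, 2, 3, 4)   # module numbers in slot order, as in the table


def find_faulty(observe):
    """Identify the faulty module from at most three swap tests.

    observe(arrangement) must return the position at which the error
    appears for that arrangement (here: a slot index).
    """
    base = observe(BASELINE)
    if observe((2, 1, 3, 4)) != base:
        # Error moved: it is in module 1 or 2.
        return 1 if observe((3, 2, 1, 4)) != base else 2
    if observe((1, 2, 4, 3)) != base:
        # Error moved: it is in module 3 or 4.
        return 4 if observe((1, 4, 3, 2)) != base else 3
    return None   # error never moved: not explained by a single module


# Simulated run: pretend module 3 is the bad one.
result = find_faulty(lambda arrangement: arrangement.index(3))
print("faulty module:", result)
```

In practice each call to `observe` is a multi-hour MemTest86+ run, so the value of the table is that it needs at most three of them after the baseline.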

My source: http://www.overclockers.com/forums/showthread.php?t=409152

Arno

larry moe 'n curly · Aug 29, 2011

Castor said:
I let MemTest86+ run all night long and it found the error at the 3rd
pass. It seems that it exactly matches the error we observed during
the files' copy test. MemTest86+ did not find anything on my GSkill
RAM with 5 passes so I think it was a different problem (probably a
mobo/RAM compatibility problem as I first thought).

I am very unlucky with RAM : the people on the forums seems to tell
this kind of problem happens very rarely.

Anyway, this convinced me to build a new computer based on ECC RAM.

I also run Gold Memory because it sometimes finds errors that
MemTest86 and MemTest86+ miss.

I try to stick with memory modules made with major brand chips
(Samsung/SEC, Nanya/Inotera, Hynix, Micron, ProMOS, PowerChip,
Winbond) whose identities can easily be read (i.e., no heatsinks). A
lot of other modules are made with either factory reject chips (UTT --
UnTesTed) or chips that shipped out as whole wafers that get sliced up
and tested by other companies that never seem to use the same testing
machines that chip companies do (the kind that list on Ebay for
$250,000, used) but just regular motherboards or a lame tester from
CST. It turns out there are even memory testing farms that just run
Gold Memory or MemTest86/86+ on regular motherboards.

Castor Nageur · Aug 29, 2011

If these are the modules:   1 2 3 4

First swap:                 2 1 3 4   If the error moves, it is in 1 or 2.
Second swap:                1 2 4 3   If the error moves, it is in 3 or 4.
Last swap for error in 1,2: 3 2 1 4   If the error moves, it is in 1, else 2.
Last swap for error in 3,4: 1 4 3 2   If the error moves, it is in 4, else 3.

No good module required, just some patience.

Thanks again for your excellent advice.
I will do this and post the results here in a couple of weeks.
I now understand the benefit of having a professional RAM tester: you
save a lot of time with it.

Castor Nageur · Aug 29, 2011

I also run Gold Memory because it sometimes finds errors that
MemTest86 and MemTest86+ miss.

I know MemTest86+ but not Gold Memory, so thanks for this, I will give
it a try.
It is a pity that RAM manufacturers do not provide their own RAM
testing software, as hard disk manufacturers do. Because they know
their products better than anyone else, they should be able to pinpoint a problem
faster than the common testing tools.

I try to stick with memory modules made with major brand chips
(Samsung/SEC, Nanya/Inotera, Hynix, Micron, ProMOS, PowerChip,
Winbond) whose identities can easily be read (i.e., no heatsinks). A
lot of other modules are made with either factory reject chips (UTT --
UnTesTed)

That's very interesting.
There is a big heatsink surrounding my Corsair DIMMs, and it was the
same for the GSkill modules I had before I returned them to the shop. On
each of them, there is a big sticker with the brand and RAM type. I
used to trust this, but it is true that you cannot directly read the
markings on the chips themselves.
I thought the only purpose of the heatsink was heat dissipation, not hiding
the chip information, but it makes sense.

or chips that shipped out as whole wafers that get sliced up
and tested by other companies that never seem to use the same testing
machines that chip companies do (the kind that list on Ebay for
$250,000, used) but just regular motherboards or a lame tester from
CST. It turns out there are even memory testing farms that just run
Gold Memory or MemTest86/86+ on regular motherboards.

That's why I will buy Kingston DIMMs because I now know they are
serious and that they test their RAM correctly.

Bob Willard · Aug 29, 2011

I know MemTest86+ but not Gold Memory so thanks for this, I will give
it a try.
That's a pity the RAM manufacturers do not provide their specific RAM
tester softwares like hard disk manufacturers do. Because they know
their product better than anyone else, they should target a problem
faster than the common tester tools.

I doubt if the test software that RAM makers use runs on standard PCs,
since PCs likely can't generate worst-case test patterns on RAMs while
executing out of those same RAMs.

larry moe 'n curly · Aug 30, 2011

Castor said:
I know MemTest86+ but not Gold Memory so thanks for this, I will give
it a try.

I learned of Gold Memory from a decade-old review (the newest RAM
diagnostic review I can find on the web, at least in English):

part 1: www.realworldtech.com/page.cfm?ArticleID=RWT052001232443
part 2: www.realworldtech.com/page.cfm?ArticleID=RWT120901222920

It was one of the few diagnostics to beat MemTest86 (which the
reviewers also liked a lot), and only the expensive PHD RST products
did better, finding all errors. However I've gotten different results
from Gold Memory 5.07 and the newest shareware version, 6.92, and also
from MemTest86 and MemTest86+. For some reason, neither MemTest86+
nor Gold Memory 6.92 has ever detected an error in any of my memory if
the memory passed the motherboard's slow boot-up test. Unfortunately
Gold Memory 5.07 is limited to 4GB.

That's a pity the RAM manufacturers do not provide their specific RAM
tester softwares like hard disk manufacturers do. Because they know
their product better than anyone else, they should target a problem
faster than the common tester tools.

Crucial used to feature MemTest86 at their website, and I think OCZ
offered a diagnostic and even an SPD editor. I have a feeling the
other companies don't offer anything because it would show how bad
their memory is. I wish motherboards came with a diagnostic so I
could see what's wrong without first installing the operating system
and drivers.

That's very interesting.
There is a big heatsink surrounding my Corsair DIMMs and it was the
same for the GSkill I had just before I returned them back to shop. On
each of them, there is a big sticker with the brand and RAM type. I
used to trusting this but it is true you can not directly read the
electronic's chip text.
I thought the only aim of the heatsink was heat dissipation not hiding
the chip information but it makes sense.

I think high-quality RAM chips would cost more than flashy-looking
heatsinks, but heatsinks are unnecessary because each RAM chip
dissipates only one watt, worst case. AFAIK, the only memory modules
made with first-rate chips that come with heatsinks are old RAMBUS
modules and some Samsung DDR3 modules.

That's why I will buy Kingston DIMMs because I now know they are
serious and that they test their RAM correctly.

Kingston is one of those companies that usually buys whole wafers and
then slices them up on their own, so I think they'd know the
characteristics of their chips and didn't just estimate them, as many
module makers do. However they seem to rate their chips close to
their limits, meaning they're not the most overclockable (I don't
overclock, except for testing), and years ago, when
DDR was the fastest memory, 10-20% of my Kingstons were bad, with a
whopping ~65% of my 512MB PC3200 failing. OTOH if you get a bad
Kingston and talk to their higher level tech support and mention how
you've tested with 2-3 different motherboards, they send you back
memory that always works.

larry moe 'n curly · Aug 30, 2011

Bob said:
I doubt if the test software that RAM makers use runs on standard PCs,
since PCs likely can't generate worst-case test patterns on RAMs while
executing out of those same RAMs.

If you Google a brand of memory and the words "factory tour", you'll
likely find memory modules being tested using nothing but ordinary
motherboards, which could explain why so many defective modules are
shipped. Some companies, like Corsair, have PHD RST testing cards
plugged into the mobos, and RST diagnostics are supposed to be the
best, but I haven't had good luck with Corsair in the last 2-3 years.
I saw first-rate test equipment being used only at a KingMax
factory, where a photo showed an Advantest machine. There are also
memory testing farms, and a photo of one showed them using Gold
Memory.

Rod Speed · Aug 30, 2011

larry moe 'n curly wrote

Castor Nageur wrote

I learned of Gold Memory from a decade-old review (the newest RAM
diagnostic review I can find on the web, at least in English):

part 1: www.realworldtech.com/page.cfm?ArticleID=RWT052001232443
part 2: www.realworldtech.com/page.cfm?ArticleID=RWT120901222920

It was one of the few diagnostics to beat MemTest86 (which the
reviewers also liked a lot), and only the expensive PHD RST products
did better, finding all errors. However I've gotten different results
from Gold Memory 5.07 and the newest shareware version, 6.92, and also
from MemTest86 and MemTest86+. For some reason, neither MemTest86+
nor Gold Memory 6.92 has ever detected an error in any of my memory if
the memory passed the motherboard's slow boot-up test. Unfortunately
Gold Memory 5.07 is limited to 4GB.

Crucial used to feature MemTest86 at their website, and I think OCZ
offered a diagnostic and even an SPD editor. I have a feeling the
other companies don't offer anything because it would show how bad
their memory is. I wish motherboards came with a diagnostic so I
could see what's wrong without first installing the operating system
and drivers.

Quite a few of the live CDs come with a memory diagnostic.
