Questions about DDR RAM

  • Thread starter: Igor
It has additional bits used to store checksum information so that
errors can be detected and simple ones corrected. In short, it's a
reliability thing. Without knowing much about your setup it's
impossible to be sure, but at three or four times the cost of normal
RAM I doubt it will be a cost-effective way of improving
reliability.

ECC just adds one extra RAM chip, i.e. 9 chips versus 8, or 72 bits
versus 64. Even allowing for economies of scale, I can't see why you
would expect a 3x or 4x price difference.
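The 64-versus-72 arithmetic works out to exactly 8/64 = 12.5% extra bits: a standard SEC/DED code over a 64-bit word needs 8 check bits. The correction mechanism itself is ordinary Hamming coding; as a purely illustrative sketch (not any module's actual layout), the small Hamming(7,4) code shows how a syndrome locates and repairs a single flipped bit:

```python
# Hamming(7,4): 4 data bits, 3 check bits; corrects any single-bit error.
# Check bits sit at the power-of-two positions (1, 2, 4) of the codeword.

def encode(data4):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4           # covers codeword positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4           # covers codeword positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4           # covers codeword positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(word):
    """Recompute the checks; the syndrome is the 1-based error position."""
    s1 = word[0] ^ word[2] ^ word[4] ^ word[6]
    s2 = word[1] ^ word[2] ^ word[5] ^ word[6]
    s3 = word[3] ^ word[4] ^ word[5] ^ word[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 means no error detected
    if syndrome:
        word[syndrome - 1] ^= 1       # flip the located bad bit back
    return word

cw = encode([1, 0, 1, 1])
cw[5] ^= 1                            # simulate one soft error
assert correct(cw) == encode([1, 0, 1, 1])
```

Real 64/72 SEC/DED adds one further overall parity bit so that double-bit errors are detected and reported rather than silently miscorrected.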

Kingston 1GB 333MHz DDR ECC Registered CL2.5 DIMM Dual Rank, x8
(US$70):
http://www.ec.kingston.com/ecom/con....asp?root=&LinkBack=&ktcpartno=KVR333D8R25/1G

Kingston 1GB 333MHz DDR Non-ECC CL2.5 DIMM (US$54):
http://www.ec.kingston.com/ecom/con...asp?root=&LinkBack=&ktcpartno=KVR333X64C25/1G
If you want to improve reliability on most systems, you're better
off spending the money on things like fitting a UPS and RAID storage -
these cover much less reliable elements of the system.

Memory is fairly reliable provided you aren't recklessly overclocking
things, so I'd consider ECC inappropriate on all but the most mission-
critical systems. Unless you are talking about a machine already
fitted out with a UPS, hot-swappable RAID, redundant power supplies
and preferably a secure, climate-controlled machine room to put it
all in, there are more important risk factors to consider.

- Franc Zabkar
 
kony said:
Because they don't particularly care if a customer's
calculations/etc end up wrong if it's not guaranteed for
some critical use. The industry didn't actually abandon it
for critical uses.

In fact, for critical usage some companies have gone far beyond normal
SEC/DED error correction in servers.
I suppose I ought to trim the crossposting...
 
In fact, for critical usage some companies have gone far beyond normal
SEC/DED error correction in servers.

If you are referring to ECC schemes and not memory mirroring, I'd be
interested in examples that *don't* simply use n Hamming codewords to turn
n-bit-wide full chip failures into correctable events - which really wasn't
that remarkable, revolutionary or heroic when it was first employed - about 20
years ago...
I suppose I ought to trim the crossposting...

And what fun would that be?

/daytripper ;-)
 
In alt.comp.hardware.pc-homebuilt daytripper wrote:
If you are referring to ECC schemes and not memory mirroring, I'd be
interested in examples that *don't* simply use n Hamming codewords to turn
n-bit-wide full chip failures into correctable events - which really wasn't
that remarkable, revolutionary or heroic when it was first employed - about 20
years ago...
It's just a Hamming code.
However, about 99% of memory installed on PCs is *not* ECC or even
parity-enabled. They all *should* be.
 
daytripper said:
If you are referring to ECC schemes and not memory mirroring, I'd be
interested in examples that *don't* simply use n Hamming codewords to
turn
n-bit-wide full chip failures into correctable events - which really
wasn't
that remarkable, revolutionary or heroic when it was first employed -
about 20
years ago...


And what fun would that be?

/daytripper ;-)

In some sense it is a Hamming code, or along those lines, but it includes
redundant bit steering, correction of double errors when one is hard and
the other soft, package codes, scrubbing, etc. The whole quote was "far
beyond normal SEC/DED....", e.g. 64/72-type codes that do single-bit
correction and double-bit detection.
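The bit-scattering being discussed can be sketched abstractly: with x4 DRAMs, routing each of a chip's four data bits into a *different* SEC/DED codeword turns a whole-chip failure into one correctable bit per codeword instead of four uncorrectable bits in one. A toy layout model follows; the sizes (4 chips, 4 codewords) are hypothetical and not any vendor's actual scheme:

```python
# Toy model of bit scattering across ECC codewords.
# 4 chips, each contributing 4 data bits (x4 DRAM). The naive layout puts
# all four bits of a chip into one codeword; the scattered layout spreads
# each chip's bits across all four codewords.

CHIPS, BITS_PER_CHIP = 4, 4

def naive_layout():
    """Each codeword is drawn entirely from one chip."""
    return [[(chip, b) for b in range(BITS_PER_CHIP)] for chip in range(CHIPS)]

def scattered_layout():
    """Each codeword takes exactly one bit from every chip."""
    return [[(chip, w) for chip in range(CHIPS)] for w in range(BITS_PER_CHIP)]

def errors_per_codeword(layout, dead_chip):
    """How many bits each codeword loses when one whole chip fails."""
    return [sum(1 for chip, _ in word if chip == dead_chip) for word in layout]

# A SEC/DED codeword survives 1 bad bit, not 4:
print(errors_per_codeword(naive_layout(), dead_chip=2))      # [0, 0, 4, 0]
print(errors_per_codeword(scattered_layout(), dead_chip=2))  # [1, 1, 1, 1]
```

With the scattered layout every codeword sees at most one bad bit, so plain per-codeword SEC correction survives the full chip failure.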

del
 
In comp.sys.ibm.pc.hardware.chips Frank McCoy said:
However, about 99% of memory installed on PCs is *not*
ECC or even parity-enabled. They all *should* be.

Oh, pray tell, why? Do you believe you know the PC
business better than Intel, AMD, Dell, HP, ... who have
decided to manufacture chipsets and computers without ECC?

Do you believe ~50 US$/box is better spent on ECC than on
improved capacitors, mobo layers, cabling, cooling or shielding?

There are always many improvements possible. The key is to
choose the best ones. Not fixate like a kid in a candy store.

-- Robert
 
In alt.comp.hardware.pc-homebuilt Robert Redelmeier wrote:
Oh, pray tell, why? Do you believe you know the PC
business better than Intel, AMD, Dell, HP, ... who have
decided to manufacture chipsets and computers without ECC?

Do you believe ~50 US$/box is better spent on ECC than on
improved capacitors, mobo layers, cabling, cooling or shielding?
It shouldn't add more than 10% to the price of memory, which would be
about 2% or less of the price of the computer itself.

The problem isn't Intel or anybody else, with the possible exception of
IBM; and even there only slightly.

The problem is custom and history.
They didn't do it in the past, for fairly good and decent reasons.
They don't do it *now* because they didn't do it in the past.
That is NOT a good reason.
There are always many improvements possible. The key is to
choose the best ones. Not fixate like a kid in a candy store.
The problem is:
WITH ECC built in, probably over half the cases of "Blue Screen of
Death" or computer crashes and foulups *could* be things of the past!

Even in cases where things like poor capacitors cause spikes, having ECC
memory in the machine would eliminate a large portion of those problems.

The original reasons of the extra logic and extra expense just ARE NOT
that relevant any more. They shouldn't even SELL non-ECC memory, for
the relatively tiny price-differential versus the HUGE difference in
reliability. It's like selling retread tires as new ones for almost the
same price. Sure they're CHEAPER ... marginally.

The worst part is, people could actually be KILLED by such mistakes made
by a computer that might have been corrected with ECC ... Yet nobody
will trace it back to that; just: "Sorry, the computer crashed!"

That's unlike a bad tire, which eventually *will* get noticed after
enough people die.

Worse-yet, people aren't even being educated as to what the difference
is. Essentially they're told and even believe that non-ECC memory is
just as good, only cheaper.

"I've ran my computer for years without ECC; and it ran just FINE!"
Only that ignores the freezups, crashes, blue-screens, and other crap
that got attributed to software instead of memory failures. ;-{

These days people seem to *expect* such failures, when 99.99% of the
ones caused by bad memory (probably well over half) could be fixed.

Most people ass-u-me that their memory is good; never EVER running a
memory-test other than the completely useless crap on boot. Hell, most
people, if a computer is crapping out, just replace the whole thing.

In fact, many computer-repair places *encourage* their customers to do
just that ... It makes more money for the company; while running a good
memory-test takes up very valuable technician time and space in the
repair-shop.

For a mere pittance in extra cost these days, especially if ECC memory
was the *standard* instead of the rarely-used, the "extra cost" would be
a huge monetary *gain* instead of a loss. Most especially so in
customer satisfaction.

Still, they don't count "customer satisfaction" as worth a dime these
days, not in comparison to saving ten or twenty cents on a part, do
they?
 
Oh, pray tell, why? Do you believe you know the PC
business better than Intel, AMD, Dell, HP, ... who have
decided to manufacture chipsets and computers without ECC?

They sell ECC equipped systems too. Would you believe you
knew better if you chose standardized parts instead of
proprietary motherboards, cases and PSU? Most agree on
that. The idea of blindly following an OEM is contrary to
our goals.
 
In comp.sys.ibm.pc.hardware.chips Frank McCoy said:
It shouldn't add more than 10% to the price of memory;

No, unless used for yield improvement, std 64/72 ECC must cost
at least 12.5% more for components. Due to lower market
volume, it actually costs 30-50% more.
The problem isn't Intel or anybody else with the possible
exception of IBM; but there only slightly.

Why do you doubt their design choice to omit ECC?
The problem is custom and history. They didn't do it in
the past, for fairly good and decent reasons. They don't
do it *now* because they didn't do it in the past. That is
NOT a good reason.

Wrong. From the 8088 thru the 486, almost all PCs -- IBM and
clones alike -- had parity memory. Macs did not. Only with
Pentium SIMMs did parity morph to ECC and begin to drop.
The problem is: WITH ECC built in, probably over half the
cases of "Blue Screen of Death" or computer crashes and
foulups *could* be things of the past!

Reference please! BSoD can have many causes.
I suspect software and software patches mostly.
I keep Linux machines up for ~1 yr w/o ECC.
Even in cases where things like poor capacitors cause
spikes, having ECC memory in the machine would obviate a
large portion of those problems.

No, because spikes often hit the busses in parallel.
The original reasons of the extra logic and extra expense
just ARE NOT that relevant any more. They shouldn't
even SELL non-ECC memory, for the relatively tiny
price-differential versus the HUGE difference in reliability.
It's like selling retread tires as new ones for almost the
same price. Sure they're CHEAPER ... marginally.

Again, you presume you know better than Intel, AMD,
Dell, HP, etc.
The worst part is, people could actually be KILLED by such
mistakes made by a computer that might have been corrected
with ECC ... Yet nobody will trace it back to that; just:
"Sorry, the computer crashed!"

Life critical computing and control machinery does
not run on PCs or with MS software.
"I've ran my computer for years without ECC; and it ran
just FINE!" Only that ignores the freezups, crashes,
blue-screens, and other crap that got attributed to software
instead of memory failures. ;-{

Except I've run several just fine without anything
resembling BSoDs with uptimes around a year.
These days people seem to *expect* such failures, when 99.99%
of the ones caused by bad memory (probably well over half)
could be fixed.

Reference please!
Most people ass-u-me that their memory is good; never EVER
running a memory-test other than the completely useless crap
on boot. Hell, most people, if a computer is crapping out,
just replace the whole thing.

Perhaps this is true for most, but I've run intense software
memory testers like memtest86+ for days and weeks yet never
seen an inexplicable error.

-- Robert
 
If you are referring to ECC schemes and not memory mirroring, I'd be
interested in examples that *don't* simply use n Hamming codewords to turn
n-bit-wide full chip failures into correctable events - which really wasn't
that remarkable, revolutionary or heroic when it was first employed - about 20
years ago...

I think you'll find it's been longer than 20 years. More likely
double that.
And what fun would that be?

Beat me to it.
 
In alt.comp.hardware.pc-homebuilt Robert Redelmeier wrote:
No, unless used for yield improvement, std 64/72 ECC must cost
at least 12.5% more for components. Due to lower market
volume, it actually costs 30-50% more.
I said it *shouldn't* cost more; not that it doesn't.
"yield improvements" or not should count for only a few
percentage-points in the total cost. The primary cost of memory these
days is advertising, shipping, packaging, storage, and promotion, NOT
the amount of silicon used.
Why do you doubt their design choice to omit ECC?
Because, like everybody else, they're price conscious to the point where
a few cents in millions of units is big bucks to them; and with nobody
pointing out (and very few knowing) the advantages of ECC, why should
they promote it? It's not in their best-interest to do so.
Wrong. From the 8088 thru the 486, almost all PCs -- IBM and
clones alike -- had parity memory. Macs did not. Only with
Pentium SIMMs did parity morph to ECC and begin to drop.
Wrong again.
Most clones had parity *capability*.
Almost none had actual parity memory *installed*.
I know ... I have over a dozen out in the garage; and about four times
that in obsolete memory sticks for all of them; not one of which is
parity-memory. It got so bad you almost couldn't *buy* parity-memory.

Actually, with good reason:
If you put parity-memory in a computer, that actually made it far *more*
likely to FAIL! Why? Because all the computer could do is yell and
scream, "PARITY ERROR!" and crash!

Often, for that very reason, even computers *with* parity memory had it
disabled in the BIOS. Parity-memory being less than useless; unlike ECC
memory which *corrects* errors.
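The distinction being drawn here comes down to information: one parity bit per byte can tell you *that* a bit flipped but not *which one*, so halting was the only safe response, whereas an ECC syndrome also pinpoints the bad bit. A minimal sketch of the parity-only side; the "PARITY ERROR"-and-halt behaviour is modelled as an exception, purely for illustration:

```python
# Single parity bit per byte: detection without location.
# On a mismatch there is nothing to repair; the classic PC answer
# was to raise NMI and halt with "PARITY ERROR".

def parity(byte):
    """Even-parity bit over the 8 data bits."""
    return bin(byte).count("1") & 1

def check(byte, stored_parity):
    """Return the byte if parity matches; otherwise all we can do is scream."""
    if parity(byte) != stored_parity:
        raise SystemError("PARITY ERROR")   # detected, but unlocatable
    return byte

good = 0b1011_0010
p = parity(good)
assert check(good, p) == good

flipped = good ^ 0b0000_1000              # one soft error somewhere
try:
    check(flipped, p)                     # caught, but cannot be corrected
except SystemError:
    pass
```

An ECC scheme stores several check bits instead of one, and the pattern of failed checks (the syndrome) identifies the flipped bit so it can simply be inverted back, as the earlier Hamming-code discussion describes.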
Reference please! BSoD can have many causes.
I suspect software and software patches mostly.
I keep Linux machines up for ~1 yr w/o ECC.


No, because spikes often hit the busses in parallel.
But the errors they *cause* are usually memory-errors.
Memory being *far* more susceptible to such; and with far less margin.
Again, you presume you know better than Intel, AMD,
Dell, HP, etc.


Life critical computing and control machinery does
not run on PCs or with MS software.
Like hell!
Except I've run several just fine without anything
resembling BSoDs with uptimes around a year.
Pardon my French; but you sound ALL too much like the guy saying,
"Nobody needs anti-virus software! I've ran for *years* now without
any; and *I* don't have any problems!"
Reference please!
My own; from maintaining many computers.
You run memory-tests on those failing computers, and likely over 50% of
the time, if you run it long enough, you'll find a failing memory-stick!
Perhaps this is true for most, but I've run intense software
memory testers like memtest-86+ for days and weeks yet never
seen an inexplicable error.
Well, that just explains why *you personally* haven't had the problems I
mention. Not everybody is lucky enough to get perfect sticks; and of
those who don't, most never even suspect. But then, that's what the
manufacturers *expect*. How many people *do* run memtest86+ on their
computers, even among those failing every few days?

How many even *suspect* that a memory problem might be the root of their
troubles; especially with people like you insisting there are no such
problems?

Like I said, you sound like the guy insisting there's no need for
anti-virus software because *he* has never seen such a problem.

Yeah, right.
 
In alt.comp.hardware.pc-homebuilt daytripper wrote:
I'll let you defend that statement with a cite :-) I'm sticking with the
timeframe being the very early '80's, when Digital started shipping systems
using x4 drams in volume and needed to survive a full chip failure...
ECC memory was used back in the core days, when "core" meant just that.
I remember ECC memory being available for S-100 bus machines; and it was
*old* technology even then.

You've always had to pay extra for it though.
The usual price premium is about 50% extra; and that hasn't
changed much over the many years, even though it should have, with
today's prices on commodity things like memory set more by distribution
than complexity. That's why it doesn't cost nearly twice as much for
1-gig chips as it does for 500-meg chips these days.
 
In alt.comp.hardware.pc-homebuilt daytripper wrote:

ECC memory was used back in the core days, when "core" meant just that.
I remember ECC memory being available for S-100 bus machines; and it was
*old* technology even then.

You've always had to pay extra for it though.
The usual price premium is about 50% extra; and that hasn't
changed much over the many years, even though it should have, with
today's prices on commodity things like memory set more by distribution
than complexity. That's why it doesn't cost nearly twice as much for
1-gig chips as it does for 500-meg chips these days.

If you're going to play our game, do try to keep up: Keith and I were
discussing (ie: "trying to remember" for us old pharts ;-) when bit-scattering
over multi-ECC-codeword schemes was implemented in memory systems.

Not when ECC first appeared. Sheesh...

/daytripper (usenet has a very shallow memory indeed ;-)
 
I think you'll find it's been longer than 20 years. More likely
double that.

I'll let you defend that statement with a cite :-) I'm sticking with the
timeframe being the very early '80's, when Digital started shipping systems
using x4 drams in volume and needed to survive a full chip failure...
Beat me to it.

Bad habit of mine ;-)

/daytripper (meanwhile, I'm gonna go dangle flies in front of steelhead :-)
 
In comp.sys.ibm.pc.hardware.chips Frank McCoy said:
I said it *shouldn't* cost more; not that it doesn't.
"yield improvements" or not should count for only a few
percentage-points in the total cost.

Probably more if it were aggressively done. I think it quite
possible the ECC on CPU L2 caches is for yield improvement
as much as data reliability. I'm even more convinced the
ECC on hard-disks is for density increases.
The primary cost of memory these days is advertising,
shipping, packaging, storage, and promotion, NOT the amount
of silicon used.

Huh? Check the price of chips against the price of built
DIMMs. Chips are over 2/3rds the cost, leaving very little.
Because, like everybody else, they're price conscious to
the point where a few cents in millions of units is big
bucks to them; and with nobody pointing out (and very few
knowing) the advantages of ECC, why should they promote it?
It's not in their best-interest to do so.

A very negative and limited view of business. If they could
produce, prove and advertise more reliable machines, wouldn't
that be worth quite some premium? People really _hate_ when
machines crash. And for good reason, since they often lose
considerable work. There's a lot of money out there waiting
for any mfr. They know it, and would go for it.
Wrong again. Most clones had parity *capability*.
Almost none had actual parity memory *installed*. I know
... I have over a dozen out in the garage; and about four
times that in obsolete memory sticks for all of them; not
one of which is parity-memory. It got so bad you almost
couldn't *buy* parity-memory.

I cannot possibly know your experience, but I've had many
motherboards. All the hundreds of dollars of 30-pin SIMM
memory I bought is 9-chip because the mobos would _NOT_
function with 8-chip.
Actually, with good reason: If you put parity-memory in a
computer, that actually made it far *more* likely to FAIL!
Why? Because all the computer could do is yell and scream,
"PARITY ERROR!" and crash!

This was a design decision made by IBM. They considered
a crash better than corrupted data. I agree.
But the errors they *cause* are usually memory-errors.
Memory being *far* more susceptible to such; and with far
less margin.

Actually, I see more errors originating from the hard-disk
cabling and memory busses than errors on memory cells.
Like hell!

Oh? What examples do you have? I do work with such systems
and haven't seen any beyond the occasional MS-Windows based
terminal. And even that was certified and locked.
Pardon my French; but you sound ALL too much like the guy
saying, "Nobody needs anti-virus software! I've ran for
*years* now without any; and *I* don't have any problems!"

Well of course absence of proof isn't proof of absence, but
you have not provided any corroboration of your assertion that
memory failures are the main cause of BSoD. All my experience
can add is that my BSoD-equivalents are well below what most
people see. I don't buy premium memory (I test intensively
and extensively) and the number of machines I've run make it
unlikely to be pure luck.
My own; from maintaining many computers. You run
memory-tests on those failing computers, and likely over
50% of the time, if you run it long enough, you'll find a
failing memory-stick!

Interesting. Do you run a PC repair business? What service
have those machines seen? How many tests? I've tested over
50 sticks (mostly shortly after purchase, some after 8+years)
12-170 hrs and only had two failures, both new. Apart from
total failures, I've seen about 10 HDs that would throw the
occasional error. Sometimes they wouldn't with a better PSU.
Well, that just explains why *you personally* haven't had
the problems I mention. Not everybody is lucky enough to
get perfect sticks; and of those who don't, most never even
suspect. But then, that's what the manufacturers *expect*.
How many people *do* run memtest86+ on their computers,
even among those failing every few days?

When I've seen machines fail, it usually has been the HD.
Easy to prove.
Like I said, you sound like the guy insisting there's no
need for anti-virus software because *he* has never seen
such a problem. Yeah, right.

Not quite. The "guilt-by-presumed-association" aside, I agree
there is a problem with worms and trojans. Mostly a usage and
configuration problem, and largely solvable with privilege
isolation and other measures in the NIST registry entries.
Conventional Anti-virus software is a very poor second. It can
only react to malware it can recognize and find. Which is
none of the new ones, and few of those designed to fly below
the radar. AV software is a cure, but prevention is better.

-- Robert
 
Robert Redelmeier said:
Probably more if it were aggressively done. I think it quite
possible the ECC on CPU L2 caches is for yield improvement
as much as data reliability. I'm even more convinced the
ECC on hard-disks is for density increases.


Huh? Check the price of chips against the price of built
DIMMs. Chips are over 2/3rds the cost, leaving very little.


A very negative and limited view of business. If they could
produce, prove and advertise more reliable machines, wouldn't
that be worth quite some premium? People really _hate_ when
machines crash. And for good reason, since they often lose
considerable work. There's a lot of money out there waiting
for any mfr. They know it, and would go for it.

History would show that bad-and-cheap drives out good. I give
you Micro Channel vs. ISA as an example. In fact the
whole consumer PC market is an example. With small margins,
and no evidence that people walking into Walmart or Best Buy
have any interest in paying a premium for some nebulous reliability
claim, why should manufacturers waste perfectly good bits?

Servers are a different story.
I cannot possibly know your experience, but I've had many
motherboards. All the hundreds of dollars of 30-pin SIMM
memory I bought is 9-chip because the mobos would _NOT_
function with 8-chip.


This was a design decision made by IBM. They considered
a crash better than corrupted data. I agree.
Actually, how was Windows supposed to recover from a parity error?
IBM didn't write Windows.
Actually, I see more errors originating from the hard-disk
cabling and memory busses than errors on memory cells.


Oh? What examples do you have? I do work with such systems
and haven't seen any beyond the occasional MS-Windows based
terminal. And even that was certified and locked.



Well of course absence of proof isn't proof of absence, but
you have not provided any corroboration of your assertion that
memory failures are the main cause of BSoD. All my experience
can add is that my BSoD-equivalents are well below what most
people see. I don't buy premium memory (I test intensively
and extensively) and the number of machines I've run make it
unlikely to be pure luck.

Last time I talked to my buddy that tracks failures, it was software
first, then disks, then electronics.
A little research and a few calculations will tell you how often there
will be a memory error.

How seriously you take it depends on how you feel about errors and
especially undetected errors.
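The "few calculations" mentioned above look roughly like this: multiply an assumed per-bit soft-error rate by the amount of memory and the time window. The FIT figure below (failures per 10^9 device-hours per megabit) is an illustrative assumption only; published DRAM soft-error rates vary by orders of magnitude with process geometry, altitude and era:

```python
# Back-of-envelope expected soft-error count for an unprotected DIMM.
# Assumed rate: 1000 FIT per Mbit -- purely illustrative, not a measured value.

FIT_PER_MBIT = 1000              # failures per 1e9 hours per Mbit (assumption)
MBITS = 1024 * 8                 # a 1 GB module is 8192 Mbit
HOURS_PER_YEAR = 24 * 365

errors_per_hour = FIT_PER_MBIT * MBITS / 1e9
errors_per_year = errors_per_hour * HOURS_PER_YEAR
print(f"{errors_per_year:.1f} expected soft errors per year")  # ~71.8
```

Whatever rate one plugs in, the structure of the estimate is the same, which is why how seriously to take it reduces to how one weighs errors, and especially undetected errors.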
 
daytripper said:
If you're going to play out game, do try to keep up: Keith and I were
discussing (ie: "trying to remember" for us old pharts ;-) when
bit-scattering
over multi-ECC-codeword schemes were implemented in memory systems.

Not when ECC first appeared. Sheesh...

/daytripper (usenet has a very shallow memory indeed ;-)
I am trying to remember when "package codes" came along, well after the
schemes to deal with x4 by scattering the bits.

Here is a good survey paper from 1984
http://www.research.ibm.com/journal/rd/282/chen.pdf
 
In alt.comp.hardware.pc-homebuilt "Del Cecchi" wrote:
History would show that bad cheap drives out good. I give
you microchannel vrs ISA as an example. In fact the
whole consumer PC market is an example. With small margins,
and no evidence that people walking in walmart or best buy
have any interest in paying a premium for some nebulous reliability
claim why should manufacturers waste perfectly good bits.
Exactly.

Servers are a different story.

Which is why most ECC memory is marketed "for servers".
 
In comp.sys.ibm.pc.hardware.chips Del Cecchi said:
History would show that bad-and-cheap drives out good. I give
you Micro Channel vs. ISA as an example.

That may have been more a matter of open vs. closed; note that VLB beat both
MCA and EISA, and (1st-gen) PCI succeeded more because of getting a lot of
manufacturers on board (including Apple) than any technical superiority.
Actually, how was Windows supposed to recover from a parity error?
IBM didn't write Windows.

It didn't write DOS either, but given that it was the organization that
pushed and marketed DOS for Microsoft, it still shares a fair bit of
culpability. It also was the one who designed the original PC motherboard
with the dubious "parity error goes to NMI" design to begin with.
Last time I talked to my buddy that tracks failures, it was software
first, then disks, then electronics. A little research and a few
calculations will tell you how often there will be a memory error.

That certainly matches my experience. Also, cooling fans are in there
between drives and electronics: moving parts break down WAY faster than most
electronics (although bad caps are up there too.)
 
Frank said:
The problem is:
WITH ECC built in, probably over half the cases of "Blue Screen of
Death" or computer crashes and foulups *could* be things of the past!

Really, so why does my linux installation run without crashes on exactly
the same hardware that Windows used to crash on regularly? I think
you'll find the problem is not related to whether it has or has not got
ECC memory!
The worst part is, people could actually be KILLED by such mistakes made
by a computer that might have been corrected with ECC ... Yet nobody
will trace it back to that; just: "Sorry, the computer crashed!"

Well, if you must insist on running Windows, that is a risk you have to
take :)
"I've ran my computer for years without ECC; and it ran just FINE!"
Only that ignores the freezups, crashes, blue-screens, and other crap
that got attributed to software instead of memory failures. ;-{

Try attributing those freezups, crashes, blue-screens and other crap to
WINDOWS instead of lack of ECC. Linux runs fine without ANY of these
problems on the SAME hardware (disc formatted and linux installed
instead of Windows). Current uptime, >70 days on a linux system that
runs 100% CPU usage on BOINC projects 24/7, so not just some box that is
sitting idle in the corner.
These days people seem to *expect* such failures, when 99.99% of the
ones caused by bad memory (probably well over half) could be fixed.

No, most of these failures are caused by Windows.
 