Stress Test

  • Thread starter: philo

philo 

I've got a machine on my bench that has BSODed quite a bit when in
use...but just sitting idle on my bench it can sit there all day and is
fine.

I have been running something called "Heavy Load" but it does not appear
to really be giving the machine a good test.

Any suggestions?
 
All of my machines, with the exception of the home server, run the BOINC
client under World Community Grid 24/7, and that keeps the CPUs pegged
at 100%. It might make for a good CPU/memory/chipset test, but it won't
do much for disk and I/O problems. Many people swear by the old Prime95
program as a test, but it suffers from the same shortcoming.
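
Since CPU-bound tools leave the disk path untouched, a rough way to put some load on it is to write a file, read it back, and compare checksums. The following is only a minimal sketch under stated assumptions: the function name and sizes are made up for illustration, and on most systems the read-back will be served largely from the OS cache rather than the platters, so it is no substitute for the drive manufacturer's diagnostic.

```python
import hashlib
import os
import tempfile


def write_and_verify(total_mib: int = 64, chunk_mib: int = 4, directory=None) -> bool:
    """Write random data to a temporary file, read it back, compare SHA-256 digests."""
    chunk_size = chunk_mib * 1024 * 1024
    written = hashlib.sha256()
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        for _ in range(total_mib // chunk_mib):
            chunk = os.urandom(chunk_size)
            written.update(chunk)
            tmp.write(chunk)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the data out of Python and OS write buffers

        tmp.seek(0)
        read_back = hashlib.sha256()
        while True:
            chunk = tmp.read(chunk_size)
            if not chunk:
                break
            read_back.update(chunk)
    # True means every byte read back matched what was written
    return written.digest() == read_back.digest()


if __name__ == "__main__":
    print("match" if write_and_verify() else "MISMATCH")
```

Pointing `directory` at a folder on the drive under suspicion, and raising `total_mib` well past the amount of installed RAM, makes it more likely that the reads actually reach the disk.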



Thanks


I'll give that a try. I can always run the manufacturer's diagnostic on
the HD
 
I use Prime95 torture test, as proof a computer is stable
and ready to give to another person.

http://www.mersenne.org/download/index.php

The downloads are about half way down the page, and are available
for a number of platforms.

When the software asks "Join GIMPS?", answer No as you are
Just Testing.

That software computes a number of FFTs (fast Fourier transforms)
in assembler code. It opens an execution thread per CPU core
(virtual or physical). It makes the CPU run warm, and it can
help you determine whether the CPU cooler you are using is
adequate. There is a setting to select how much RAM to test.
On large RAM machines, you may need to run multiple copies,
if attempting to cover the maximum amount of memory possible.
(Copy Prime95 into a separate folder, to be able to
run a separate copy. As long as each executable has its
own private folder, it's happy.)

If any testing thread gets "the wrong answer", the text
in that thread turns red and the thread stops. So the first
error detected is enough to tell you there is a problem:
either the RAM is bad, the RAM settings need to be
adjusted, or maybe VCore needs to be bumped up a notch.
It's a CPU, Northbridge, and RAM tester.
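
The principle described above (one worker per core, each repeating a calculation whose correct answer is known, stopping at the first wrong result) can be sketched in a few lines of Python. This is only an illustration, not how Prime95 is implemented: it hashes a fixed buffer with SHA-256 instead of computing FFTs, it loads the machine far less, and the reference value is computed on the same machine it is checking. The names `digest_chain` and `torture` are made up for the sketch. Threads are used because `hashlib` releases the interpreter lock while hashing large buffers.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK = bytes(range(256)) * 4096  # 1 MiB of fixed, known data


def digest_chain(rounds: int) -> str:
    """Hash a fixed buffer repeatedly, feeding each digest into the next round."""
    state = b""
    for _ in range(rounds):
        state = hashlib.sha256(state + BLOCK).digest()
    return state.hex()


def torture(workers: int, passes: int, rounds: int) -> list:
    """Run the same deterministic job on every worker.

    Returns a list of (worker, pass) pairs whose result differed
    from the reference; an empty list means no mismatch was seen.
    """
    reference = digest_chain(rounds)

    def worker(worker_id: int) -> list:
        bad = []
        for p in range(passes):
            if digest_chain(rounds) != reference:
                bad.append((worker_id, p))
                break  # stop this worker at its first error
        return bad

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, range(workers)))
    return [failure for per_worker in results for failure in per_worker]


if __name__ == "__main__":
    n = os.cpu_count() or 1
    failures = torture(workers=n, passes=5, rounds=20)
    print("no mismatches" if not failures else f"mismatches: {failures}")
```

On healthy hardware the result should always be an empty list; any entry at all means a calculation that must be repeatable was not.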

For my own personal machines, if it runs for at least
four hours, with no threads stopping, that's my proof
(acceptance test) that the machine is ready to use for
serious work.

You can include 3D game play while Prime95 is running.
On some of the older computers, the combination of some
AGP slot activity (video card traffic) plus Prime95
may do a slightly better job of uncovering problems. But
testers don't really have the time to play games while
Prime95 is reaching its acceptance condition. The test
would run forever if you didn't stop it.

You can just leave Prime95 running overnight, assuming
your CPU cooling is in good shape. If I were running it
on a laptop, I'd want to watch the CPU temperature for
a bit, to make sure the fan cooling is working
properly. Someone managed to melt a corner of their
laptop once, when the laptop was running with high
CPU usage, and the cooling wasn't working well enough :-)

Paul
 
Snap! I use exactly the same programme, for about the same amount of time
(unless there are intermittent glitches, then it runs for 24 hours minimum).
I've been using it since the good old days of overclocking Mendocinos and
Coppermines, and I still give it a good workout when undervolting laptops.
;-) That process, finding a stable voltage for each 'speedstep' (and the
transitions between them), can take days but pays off with less heat and
better battery life.
--
</Shaun>

"Humans will have advanced a long, long, way when religious belief has a
cozy little classification in the DSM."
David Melville (in r.a.s.f1).
 
On 03/27/2014 05:16 AM, ~misfit~ wrote:
[snip]

I just shut the machine down now. I ran the torture test for 17 hours
and there were no glitches.

I did unplug and replug the power supply connectors when I first put the
machine on my bench. It's possible that's all it was. I see that problem
a few times a year.
 
That could indeed have been it. My standard approach to hardware problem
solving when it's not something obvious is to re-seat all expansion cards,
cables and connectors (often using 'CO cleaner' in slots and a pencil eraser
that I keep specifically for the purpose on the contacts of cards and RAM
modules). When I build a machine I also cover unused expansion / RAM slots
with bits of masking tape cut to size to keep out dust so they're pristine
if needed in future.
--
</Shaun>
 
Generally, failures that happen only at high power levels
indicate a problem with the power supply or one of the
voltage regulators on the motherboard (one for the CPU,
another for memory), but because you're getting BSODs
only at lower power (I assume -- a Kill-A-Watt can verify),
I'd bet on a bad driver or memory module.

Most retail memory is substandard, made from chips that are
either factory rejects, used, or overclocked from 15% to
100%. You can bet on that for modules with heatsinks
covering their chips.

Prime95 is probably not that good a memory diagnostic
compared to MemTest86, MemTest86+, and Gold Memory. You
want to run all three of those, because MemTest86 and
MemTest86+ often give different results despite using the
same test methods, and Gold Memory has found errors missed
by both of them, sometimes in as little as 2-20 hours.
 
The machine was made from spare parts
but the memory is very high quality.
 
I run memtest86+ for the "thoroughness" aspect.
It tests practically all the memory locations. Only
a roughly 1 megabyte reserved area is not tested, if
you test in dual channel mode only. With two test runs
and a single channel configuration with two sticks, you
can eventually test every location (swap the sticks and
retest).

Prime95 does not test all the memory. Not even close.
The memory area holding the OS will not be tested.
Prime95 is a "combined", "compromise" test
that stresses the CPU, Northbridge, and RAM DIMMs. If
the memory is "generally unstable" and needs a voltage bump
or a relaxed tRAS, Prime95 will hint at it. There is
no specific error message like "your Northbridge
needs a voltage bump". It is up to the operator to
fiddle with things and notice changes in stability.
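
To illustrate why a program running under the OS can only check the memory it has been handed, here is a minimal write-and-read-back pattern test in Python. It is a sketch under stated assumptions, not a replacement for memtest86+: it touches only one buffer the OS chooses to give it, it has no control over which physical addresses that is, and CPU caches may satisfy many of the reads. The function name and the choice of patterns are illustrative.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Fill a buffer with each pattern, read it back, and collect mismatches.

    Returns a list of (offset, expected_byte, actual_byte) tuples.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected            # write pass
        if buf != expected:          # read-back pass
            errors.extend(
                (offset, pattern, got)
                for offset, got in enumerate(buf)
                if got != pattern
            )
    return errors


if __name__ == "__main__":
    found = pattern_test(16 * 1024 * 1024)  # 16 MiB, an arbitrary size
    print("no errors" if not found else f"{len(found)} bad bytes, first: {found[0]}")
```

Everything outside that one buffer, including the region holding the OS itself, goes unchecked, which is the gap a boot-time tester is meant to close.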

You can "walk the shmoo plot" for a processor with
Prime95. I adjust the CPU clock rate, and adjust
the voltage, until Prime95 throws an error at around
the ten minute mark. By inching along, I can prepare
a plot of voltage versus CPU frequency, at each point
trying to get Prime95 to fail at the ten minute mark
(roughly). This is a way of predicting how much
headroom a CPU has, as much as anything. I don't
generally leave my systems overclocked, but I
do experiment with them when they're new. A typical
CPU from the factory is intended by the manufacturer
to have around 500 MHz of headroom, as a rough number.
That's to cover aging effects (electromigration) as
the processor gets older.
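
For anyone who wants to keep notes while walking the plot, the bookkeeping can be sketched as below. This is only one assumed way to record it: each run is logged as (MHz, VCore, survived the ten minutes or not), and the sample values in the usage section are invented purely to show the output format, not measurements from any real CPU.

```python
def min_passing_voltage(runs):
    """runs: list of (mhz, vcore, passed). Return {mhz: lowest vcore that passed}."""
    best = {}
    for mhz, vcore, passed in runs:
        if passed and (mhz not in best or vcore < best[mhz]):
            best[mhz] = vcore
    return dict(sorted(best.items()))


def shmoo_rows(runs):
    """Render the runs as text rows: P = pass, F = fail, . = not tried.

    Rows go from highest to lowest voltage; columns from lowest to highest MHz.
    """
    freqs = sorted({mhz for mhz, _, _ in runs})
    volts = sorted({v for _, v, _ in runs}, reverse=True)
    outcome = {(mhz, v): passed for mhz, v, passed in runs}
    symbol = {True: "P", False: "F", None: "."}
    rows = []
    for v in volts:
        cells = "".join(symbol[outcome.get((mhz, v))] for mhz in freqs)
        rows.append(f"{v:.3f} V  {cells}")
    return rows


if __name__ == "__main__":
    # Hypothetical log entries, for illustration only.
    sample = [
        (3000, 1.20, True), (3000, 1.15, False),
        (3200, 1.25, True), (3200, 1.20, False),
    ]
    print(min_passing_voltage(sample))
    print("\n".join(shmoo_rows(sample)))
```

The boundary between the P and F cells is the curve of voltage versus frequency; the distance from the stock settings to that boundary is the headroom.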

I consider Prime95 an "acceptance test", because in
my experience it predicts the computer is ready for
someone to use. Running only memtest86+
wouldn't prove that the CPU, Northbridge, and RAM DIMMs are
stable under the most demanding condition. If I'm working
on a gaming machine, I might play a 3D game while
the test is running, as well. That's to include a
bit of video activity in the mix (some systems have
thrown a Prime95 error, as soon as you start 3D gaming).

There are stress testers which apply maybe another
10% more stress, besides Prime95. But I don't use
them and haven't tested them.

Paul
 
Most retail memory is substandard, made from chips that are
either factory rejects, used, or overclocked from 15% to
100%.

That's an interesting claim. What makes you say that?
 
Bert said:
That's an interesting claim. What makes you say that?

At one time, semiconductor companies were "reputable".

That meant they only put their name, date code, and other
details on top of fully working chips.

Anything that failed parametrically was put in the shredder
and recycled.

*******

Today, there is a trend to "not waste anything".

Chips are given a quick test at wafer sort, but this
is not sufficient for a thorough test. Duff chips
are discarded at this point, as defective silicon die.

The rest of the parts are packaged.

A major expense with memory is testing it. Even the
floor space associated with such a test would be onerous.

Some of the chips will be fully tested, and the company
name (Samsung, Hynix, Micron, Infineon) can be written
on top.

Other chips have failed some parametric test. No time was spent
doing the memory test on them. The top of the packaged part
is blank.

The chip can be bought that way, with no logo on top, and
no thorough memory test done. People sort through these
chips with hand-held testers. You can find claims on
the web, that women in Japan take lots of chips home,
and use hand held testers to test the chips.

It means the original sources of chips are few, but
the paths they take before getting to the consumer
are many.

I can put my own logo on top of a chip, but that doesn't
say much about the path the chip has taken. Whether it's
"floor sweepings" (UTT) or legit proper product.

http://www.legitreviews.com/behind-closed-doors-utt-memory-ics-explained_199

The same thing happens to Flash memories, in that they're
graded.

LCD panels are also graded, and when you go to the Best Buy,
some monitors are made from nothing but B grade panels. The
panels are sorted according to number of stuck pixels. So when
you buy that "deal" monitor that was on sale, get it home and
it has a stuck pixel, there's a good chance that is not
exactly a random event. The panel was automatically examined
at the factory, the defects were spotted, and the grade was
printed on the panel. And some LCD monitor manufacturing
companies trade a few stuck pixels, for a price break on
the panels.

Some companies will offer a "no stuck pixel" guarantee,
and the reason they can do it, is they're starting with
"A" panels. While defects can still show up in the field,
there will be fewer of them. (Defect developed while the
product sat in the box.)

I can understand in the case of LCD panels, that there is a
need to use more of the product, without smashing it to bits
in a compactor outside the plant. I don't buy into the story
that memory or flash companies need to do this. Only one
grade should come out of such plants, and that should be
"working" according to the product spec. Not floor sweepings
leaving the plant, to have God knows what printed on top later.

Paul
 
"Humans will have advanced a long, long, way when religious belief has a
cozy little classification in the DSM."
David Melville (in r.a.s.f1).

I thought this quote funny. I know the DSM (1-5). It takes a certain kind of
religious belief to accept psychiatry; see "The Myth of Mental Illness" by
Thomas Szasz. Also, it takes a certain type of religious belief to accept
statistics in the life sciences; see "Odds Are, It's Wrong" by Tom
Siegfried. Folks who use statistics are generally not professional
statisticians and only use canned stats programs, which in my
experience can be shown to be wrong over the space of a generation.

One funny thing about the DSM is that it originally defined homosexuality as
a disorder. But when the population of homosexuals reached a critical size,
it was no longer classed as a mental disorder. You can bet your booties this
cycle will repeat with other "mental disorders".

I prefer established religions which don't change so fast with current
fashions.
 
The machine was made from spare parts
but the memory is very high quality.

What do you mean by "high quality"? I assume it has no
heatsinks, since heatsinks are a sign of low quality unless
the memory is Rambus RDRAM or early Samsung DDR3.
 
That's an interesting claim. What makes you say that?

My bad luck. About 10% - 20% of almost 1,000 modules I tried failed
testing, while I've had only 1 bad module that was made from branded,
non-overclocked chips. I don't mean 1% but 1 -- a single module with
solder blobbed over its gold contacts. I don't remember if I checked
it before returning it, but I know it didn't fail any testing.

Places like XbitLabs.com, APHnetworks.com, and OCaholic.ch review
memory modules and publish photos with the heatsinks removed so
you can see the chips underneath. The following are not exceptional
examples but rather typical of retail brands:

APHnetworks Mar. 2011 review of G.Skill RipJawsX 2100 MHz:

http://aphnetworks.com/review/g_skill_ripjaws_x_f3_17000cl9d_8gbxld_2x4gb/005.JPG

http://aphnetworks.com/review/g_skill_ripjaws_x_f3_17000cl9d_8gbxld_2x4gb/006.JPG

That Hynix DRAM chip is "H9" speed grade, which means 1333 MHz,
9-9-9, for an overclock of 58%.


OCaholic, Jan. 2013, Corsair Vengeance 1866 MHz:
http://www.ocaholic.ch/modules/xcga...mal_CorsariVengeanceCMY16GX3M2A1866C9_042.jpg

The chips are Spectek, speed grade -15E, which means 1333 MHz.
Spectek is a division of Micron, that sells "refurbished" chips,
but the refurbishment process appears to consist of just
unsoldering chips and testing them in PCs, which isn't nearly
as good as what Micron does for its prime chip production line.

OCaholic, Dec. 2008, Mushkin 1600 MHz:

http://www.ocaholic.ch/modules/smartsection/item.php?itemid=230

The photo of the chips isn't clear enough to tell the brand or
speed rating, but I'm fairly sure the brand marking is not from
any real chip maker.


OCaholic Dec. 2013, Kingston HyperX Beast 1600 MHz:

http://ocaholic.ch/modules/xcgal/al...KHX16C9T3K4-32X/normal_KHX16C9T3K4-32X_27.jpg

Hynix "PB" means 1600 MHz. IOW these are probably the least
overclocked retail-brand modules still made (Samsung's 1600 MHz
modules are no longer sold at the retail level), but the chips are
rated for 11-11-11 clock cycles, not the modules' 9-9-9 ratings.
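
The overclock percentages for the examples above are simple arithmetic: the module's advertised speed compared with the speed grade marked on the chips. A small sketch, with the function name chosen for illustration and the figures taken from this post:

```python
def overclock_percent(chip_mhz: float, module_mhz: float) -> float:
    """Percentage by which the module's rated speed exceeds the chips' speed grade."""
    return (module_mhz - chip_mhz) / chip_mhz * 100.0


if __name__ == "__main__":
    examples = [
        ("G.Skill RipJawsX", 1333, 2100),
        ("Corsair Vengeance", 1333, 1866),
        ("Kingston HyperX Beast", 1600, 1600),
    ]
    for name, chip, module in examples:
        print(f"{name}: {overclock_percent(chip, module):.0f}%")
    # G.Skill RipJawsX: 58%
    # Corsair Vengeance: 40%
    # Kingston HyperX Beast: 0%
```

Note that this only compares frequency. The Kingston example comes out at 0% even though the modules are sold at tighter timings (9-9-9) than the chips are rated for (11-11-11), so the frequency figure alone understates how hard the chips are being pushed.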

I've read that testing can be half the cost of producing chips, but
I don't remember if that was for complex chips, like CPUs, or much
simpler DRAM chips. Here's an article about UTT (UnTesTed) memory:

http://www.legitreviews.com/behind-closed-doors-utt-memory-ics-explained_199

I don't think all UTT chips are actually untested, and I suspect
many are actually rejects. Some DRAM chips are sold as whole
wafers to companies that create finished chips from them, and
there are even independent testing companies employed by module
makers that do nothing but test completed modules with just PCs
running MemTest86+, Gold Memory, or, at best, RST (plug-in card;
RST rated best in RealWorldTech.com's evaluation of memory
diagnostics, only 14 years ago). G.Skill admitted to using
MemTest86 (they probably meant MemTest86+) and until early 2012
allowing its sub-1866 MHz modules to ship with 1-2 errors. I
don't think G.Skill is worse than most other companies, it's
just that they were the only ones willing to reveal anything.
In comparison, real RAM chip companies test with machines
costing $4M US, but I don't know which module companies use
them, except KingMax and Kingston, and apparently Kingston
doesn't use them for every line of modules or even for every
module sold to OEM computer makers (they said it cost extra).

Some people with much more experience than me claimed that
my 10% - 20% failure rate is way, way too high and that for
the vast majority of users the quality difference between
modules made of prime non-overclocked chips and overclocked
or non-prime chips doesn't matter. I wish I could agree.
 
I ran every test I could think of and nothing unusual came up so I just
left the machine on my bench and would periodically use it. The other
day, while using it, the screen turned all white, then the machine
rebooted...so maybe there is a video problem.

It has on-board video, so I popped in a known good video card and will
use the machine periodically and see if it happens again.
 