Slight File Corruption?

  • Thread starter: Davej

Davej

I thought Vista or a failing HD was my problem with occasional flaky behavior on this D630 laptop, so I upgraded it with Win7, new memory, and a new HD, but the problem remains. It became especially apparent when I needed to download a 1GB+ file and get that file to verify. There is usually a single 32-bit error in it when I compare copies using fc, but fciv seems to detect even more variations between the files.

Maybe I have a virus? Maybe I have a hardware problem? When I installed the new memory I ran the Dell BIOS tests and then also let Win7 run its memory test.

Any suggestions?
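
For anyone wanting to repeat the comparison outside of fc and fciv, here is a minimal sketch, assuming Python is available on the machine. The helper names `file_digest` and `diff_offsets` are made up for illustration; this is a stand-in for the two tools, not a reimplementation of them.

```python
import hashlib

def file_digest(path, algo="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so a 1GB+ file never has to fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_offsets(path_a, path_b, chunk_size=1 << 20, limit=100):
    """Return up to `limit` byte offsets where the two files differ."""
    offsets = []
    pos = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(offsets) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Walk the mismatching chunk byte by byte; a missing byte
                # (one file shorter than the other) counts as a difference.
                for i in range(max(len(a), len(b))):
                    byte_a = a[i] if i < len(a) else None
                    byte_b = b[i] if i < len(b) else None
                    if byte_a != byte_b:
                        offsets.append(pos + i)
                        if len(offsets) >= limit:
                            break
            pos += max(len(a), len(b))
    return offsets
```

If the reported offsets move around when the same two files are compared several times, that would point at the read path (RAM, chipset, disk interface) rather than at the stored data.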
 

You want memtest86+ from memtest.org. Downloads are located
halfway down the web page.

http://www.memtest.org

That program tests all memory not reserved by the BIOS, which means
everything is tested except for one megabyte of memory.

*******

Take the following scenario:

1) Computer shows signs of memory errors. You test (as above),
and verify it is true.
2) You buy new memory, install it, and damn... more errors.
3) It's possible the chipset voltage needs to be adjusted.
When out of the blue, my current fancy PC started throwing
errors, a slight (0.1V) adjustment on the Northbridge (which
has the memory interface) fixed it. No new memory need be
purchased in such a case.

Memories do fail while in service. I've probably lost over half
a dozen sticks of RAM, in the 1.5 to 2 year range of usage. But
those sticks were all "generic", unbranded RAM. They weren't
made by Kingston or Crucial. They were made by no-name companies,
apparently using less desirable RAM chips.

As for memory errors, we can divide them into two classes.

1) You run memtest86+. The exact same locations fail, each
time you run the test. These are "stuck-at" faults. These
probably don't respond to voltage. You buy new sticks when this
happens. These could be the generics in action.

2) You run memtest86+. The errors are at different locations
each time. These are transient errors. Perhaps a voltage
adjustment can fix it.
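
The two classes can be told apart mechanically once the failing addresses from each pass are written down. A rough sketch follows, assuming the addresses were copied by hand from the memtest86+ screen into one set per pass; `classify_failures` is a hypothetical helper, not something memtest86+ provides.

```python
def classify_failures(runs):
    """Classify memory test results.

    runs: list of sets of failing addresses, one set per test pass.
    Returns "clean", "inconclusive", "stuck-at" or "transient".
    """
    failing_runs = [set(r) for r in runs if r]
    if not failing_runs:
        return "clean"
    if len(failing_runs) < 2:
        # One failing pass cannot show whether the location repeats.
        return "inconclusive"
    # Addresses that fail in every failing pass look like stuck-at faults.
    repeated = set.intersection(*failing_runs)
    return "stuck-at" if repeated else "transient"
```

For example, `classify_failures([{0x1000}, {0x1000}])` reports "stuck-at", while three passes failing at three unrelated addresses report "transient".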

If I'm passing memtest86+, but the PC is obviously "sick",
I also use Prime95 (torture test option) from mersenne.org/freesoft
as a test. It doesn't test all of the memory on the computer, so
it's not as thorough that way. But it does heat up the computer
parts a bit better, and is a better stress tester. No errors in
Prime95 are acceptable. You adjust the computer operating conditions,
until it's "Prime95 clean". There are programs other than Prime95,
but that's just what I happen to have on hand here.

Paul
 

I forgot to mention the oddball cases.

There was one NVidia chipset on a motherboard, that seemed to
insert a 32 bit word in error, on Ethernet packets. I never heard
whether NVidia was able to resolve this via BIOS or driver updates.
That's an example of an actual hardware bug that causes errors.
That doesn't happen too often, because many companies know how
to test things before releasing them to the public.

Paul
 
Well, at this point I'm wondering what the logical steps are to try. I either have a memory problem or a BIOS problem or a motherboard hardware error. After installing Win7 I did not go to Dell and download all the drivers. I just let the Dell "automatic driver updater" choose what I needed. Maybe that was a mistake because I just checked and they also have a newer BIOS available.
 

Does the cooling in the unit seem to be all in order? Are the vents
clear? The chipset is usually rated to around 99C for an upper limit
thermally (just from memory). That's a package temperature limit. The
material around the silicon die degrades at temperatures higher than that.
The silicon is good to a slight bit more (in the old days, that number
was around 135C before there might be long term damage - simulation
might test the design at 105C). I've owned at least one mobile device
here, where the designers abused the chipset. I was measuring 75C just
on the outside of the heatsink it was fitted with. So when they know
they have 99C to work with, they might decide to just run it that hot.
With barely adequate cooling.

Your chipset is GM965, so it won't have the NVidia bug. That's an
Intel chipset. The machine has dual graphics, Intel graphics for low
power, and a second GPU for gaming (from NVidia). The GPU typically
only affects the appearance of the screen - an NVidia GPU should
not be corrupting downloads.

I would start the way we always start. Prove the core works first.
That means memtest86+ and Prime95. And making sure the cooling
vents are clean. I don't know of an Ethernet integrity test (something
that checks each packet as it arrives). I think Ethernet is protected
by CRC32, so even at the hardware level, a wired network knows whether
a frame arrived damaged and should be dropped, leaving it to a higher
layer such as TCP to retransmit. I expect wireless would have something
similar. The tricky part is seeing whether the OS has a counter
that counts such errors, as an early warning that the problem is in the
networking section.
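
To illustrate what a CRC32 check catches, here is a small sketch using Python's `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet frame check. The payload is made up, and a real network adapter does this in hardware over the whole frame, so treat this as a demonstration of the principle only.

```python
import zlib

def frame_ok(payload: bytes, received_crc: int) -> bool:
    """Return True if the payload still matches the CRC sent with it."""
    return zlib.crc32(payload) == received_crc

# Sender side: compute the checksum over the (made-up) payload.
payload = b"example packet payload"
crc = zlib.crc32(payload)

# Simulate a single flipped bit somewhere in transit.
corrupted = bytearray(payload)
corrupted[3] ^= 0x04

print(frame_ok(payload, crc))           # intact copy passes
print(frame_ok(bytes(corrupted), crc))  # damaged copy fails
```

A check like this only protects the data while it is on the wire. Corruption that happens after the adapter has accepted the frame (in RAM, the chipset, or on the way to disk) passes straight through it.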

But the machine would be pretty useless, if the core computing portion
can't run error free.

See if you can find some utility to measure temperatures. GPU-Z can likely
tell you the NVidia GPU temperature. Dual GPU is tricky, and since I've
never owned one, I don't know what utilities do or don't work with them.
I have a number of different versions of this one downloaded here.

http://en.wikipedia.org/wiki/GPU-Z

http://www.techpowerup.com/gpuz/

For CPU temperatures, there are things like CoreTemp. I haven't used that
one. The page here shows the author's site.

http://www.overclock.net/t/185632/core-temp

http://www.alcpu.com/CoreTemp/

That one is intended for Core or later family processors. At some point,
they added digital temperature monitoring to the CPU. The processor
measures delta_T with respect to TjMax, so the CoreTemp program can only give
an accurate reading if it happens to know TjMax. When the program first
came out, TjMax wasn't properly known for some processors. That's a
general weakness with the Intel method, in that TjMax doesn't read out
of a register to go with the delta reading.

The Intel CPU will throttle, as part of temperature control. When the
delta_T drops to zero, and the processor is running at TjMax (say, 90
or 100C), the processor will do things to try to bring down the power
consumption. This leads to less computing getting done. In a good computer
design, the cooling system will try to keep that from happening.
This is not a big deal in this case, as the Intel design should
be stable at TjMax, and correct computing results should still
come out. Running CoreTemp, would be to see if the machine is
abnormally hot, and stays jammed against TjMax all the time.
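
The arithmetic behind the delta_T reading is simple enough to sketch. The function name and the TjMax values below are assumptions for illustration, not figures for any particular processor.

```python
def core_temp_c(tjmax_c: float, delta_to_tjmax: float) -> float:
    """Convert the sensor's distance-to-TjMax reading into degrees C."""
    return tjmax_c - delta_to_tjmax

# The same raw reading gives different temperatures depending on which
# TjMax the monitoring program assumes (both values are hypothetical).
reading = 30
print(core_temp_c(100, reading))  # 70 if TjMax is assumed to be 100C
print(core_temp_c(85, reading))   # 55 if TjMax is assumed to be 85C
```

This is why a wrong TjMax assumption shifts every reported temperature by the same amount, while a delta of zero still means the processor is at its throttle point regardless of the assumed value.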

The one you'd like to monitor, is chipset temperature. The SuperI/O chip
has an analog hardware monitor, with a typical three channels of
analog input. If the CPU has a thermal diode, a channel can be tied
to that. And sometimes the chipset has a diode as well. You'd check to
see if Dell has a utility. Or, you can use Speedfan.

http://www.almico.com/sfdownload.php

http://www.almico.com/speedfan449.exe

So CoreTemp is for digital temps from the CPU. Speedfan may also
do that now. Speedfan normally can be relied on to find the
SuperI/O chip and read out the channels there, and if you're
lucky, give a chipset temperature.

HTH,
Paul
 
Try streaming from any other available source to that HD, to see
whether you can duplicate the problem. If you can't, then it's specific
to the communications stream. If the errors go away when you go around
your LAN link, say with an external modem on a USB setup, that indicates
a LAN-related problem. I haven't had that problem since the buffer chips
on ISA serial port boards in the early or pre-W95 days.
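
One way to take the network out of the picture entirely is to generate a test file locally and verify it several times. Below is a minimal sketch under that assumption; the function names are hypothetical, and note that repeat reads may be served from the OS file cache rather than the disk, so a larger file or a reboot between passes exercises the drive more honestly.

```python
import hashlib
import os

def write_test_file(path, size_mb=64, seed=b"seed"):
    """Write deterministic pseudo-random data; return its SHA-256 digest."""
    h = hashlib.sha256()
    block = seed
    with open(path, "wb") as f:
        for _ in range(size_mb):
            # Chain SHA-256 digests so every 1 MiB block is different.
            block = hashlib.sha256(block).digest()
            data = block * (1024 * 1024 // len(block))
            f.write(data)
            h.update(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return h.hexdigest()

def read_digest(path, chunk_size=1 << 20):
    """Re-read the file from storage and hash what comes back."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_repeatedly(path, expected, passes=3):
    """Return one True/False per pass: did the re-read match?"""
    return [read_digest(path) == expected for _ in range(passes)]
```

If a file written this way fails verification, the corruption cannot be blamed on the download path.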
 

Well, after ten minutes core0 puked on the Prime95 test, and it was blowing warm air out the back of the unit by then. The error said the rounding was more than 0.4 or something like that. I am not normally stressing the core so these earlier file errors are not due to the core temperature.

Then I ran Memtest86+ for an hour with no errors.

But SFC /scannow is showing an increasing amount of OS corruption, so I want to clarify that I am not saying that the network is producing these errors. In fact I had one downloaded file that tested good and then the next day it tested bad.

The next thing I want to do is make sure that hibernate is turned off so that it can be eliminated as a possible issue.
 
Run Memtest86+ for at least 3 full cycles - somewhere around 4 - 24
hours depending on your system.
 
Use some utilities to check the temperatures.

Prime95 doesn't know what happened. It could be a
memory error, it could be an error coming from the FPU.
What it tells you, is CPU/chipset/RAM is bad somehow.
But it doesn't dwell on the details.

Memtest86+ is a bit more specific, in that it writes a bit
pattern to RAM, and reads it back. So at least it's "trying"
to do a hardware level test. Prime95 on the other hand,
is an "acceptance test". It's the test you run, to prove a
computer is compute-worthy, and can be trusted to do
your taxes :-) Your computer won't make much of an
adding machine, if it is throwing Prime95 errors.
Prime95 does a bunch of FFTs, and knows the exact answer
expected. Any defect in the core section of the computer,
will throw off that result (RAM error, or broken FPU).
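
The idea of an FFT calculation that checks itself can be shown with a toy example. This is not Prime95's code, just a sketch of the general technique: multiply two integers with a floating-point FFT, then check that every result lands close to a whole number. The 0.4 limit is taken from the error message quoted earlier in the thread; everything else is an assumption for illustration.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def multiply_with_roundoff_check(x, y, base=10**3, limit=0.4):
    """Multiply non-negative ints via FFT; return (product, worst_roundoff)."""
    def digits(v):
        d = []
        while v:
            v, r = divmod(v, base)
            d.append(r)
        return d or [0]

    dx, dy = digits(x), digits(y)
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = fft(dx + [0] * (n - len(dx)))
    fy = fft(dy + [0] * (n - len(dy)))
    raw = fft([p * q for p, q in zip(fx, fy)], invert=True)
    coeffs = [c.real / n for c in raw]
    # On healthy hardware each coefficient is almost exactly an integer.
    worst = max(abs(c - round(c)) for c in coeffs)
    if worst > limit:
        raise ArithmeticError(f"round-off {worst:.3f} exceeds {limit}")
    product = sum(round(c) * base**i for i, c in enumerate(coeffs))
    return product, worst
```

On a sound machine `multiply_with_roundoff_check(x, y)` returns exactly `x * y` with a tiny round-off figure. A flipped bit in RAM or in the FPU during the transform would tend to push a coefficient away from a whole number, which is the kind of symptom a round-off error message reports.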

On enthusiast motherboards, you'd start dialing the
voltages at this point, to make the error go away.
OEM laptops don't have that capability.

So rather than dial voltages, about all I can do in
this case is try to cool things off. The point of using the
temperature utilities is to look for a cooling problem.
On some desktops, the socket tab used to snap, and
the cooler would fall off. Laptops don't usually have something
like that happen - they're a bit more securely fastened.

Paul
 
Memtest86+
 

I ran Memtest86+ for a day with no detected errors but Prime95 produces an error within ten minutes. I tried slowing the CPU clock down to the "silent mode" but Prime95 still failed, so I guess this laptop is toast... Unless a disassembly and inspection might somehow turn up something.
 

What do the temperatures look like? Have you tried
any temperature utilities yet?

Paul
 

Well, maybe I do have an issue here. I still have the clock in "silent mode" so the RAM clock is only 332MHz, but when I ran Prime95 the CPU temp was approaching 70C and the CPU fan speed hadn't changed yet. There is apparently no GPU temperature available on this version of the MB.
 
Hmmm... no, I found a utility that forced the fan speed to high and then ran Prime95 but it failed quickly and at a low temperature.
 
Davej said:
On 1/14/2014 3:29 PM, Davej wrote:
Hmmm... no, I found a utility that forced the fan speed to high and then ran Prime95 but it failed quickly and at a low temperature.

Almico Speedfan is the utility I'd use to look
at fan speed and voltages.

But at this point, I don't know what you can do about this.
On my desktop motherboard, I've solved one problem by adjusting
Vnb (northbridge voltage). OEM laptops aren't typically
equipped that way, and they tend to run nominal voltages.
You would need to be skilled at hacking voltage regulators
on the board, and even if you did an overvolt mod, you'd
still have the nuisance of no interface to read out what is
going on. Adjusting the voltages would be very difficult, without
having monitoring capabilities in place.

Since you've tried different memory, and that didn't help,
that leaves chipset or CPU or the need for
a voltage adjustment. You say temperatures look good,
so the "easy" fix of working on cooling, isn't going to
help.

Something is unstable there, but I don't know how to narrow
it down any further.

Paul
 