Gerry,
Thanks again for taking the time.
A copy of the Stop Error report is needed if you want targeted help.
I'm not sure what 'the Stop Error report' is. If it's the details from
the kernel dump, here they are:
=======================================================
Loading Dump File [C:\Bioptigen\Kernel Dumps\MEMORY122208A.DMP]
Kernel Summary Dump File: Only kernel address space is available
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 2) MP (4 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp.051011-1528
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055c700
Debug session time: Mon Dec 22 15:08:06.625 2008 (GMT-5)
System Uptime: 0 days 0:45:25.579
Loading Kernel Symbols
..................................................................................................................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 7ffd900c). Type ".hh dbgerr001" for details
Loading unloaded module list
...........
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck A, {c0605000, 2, 1, 805043d1}
Probably caused by : memory_corruption ( nt!MiAddWorkingSetPage+cf )
Followup: MachineOwner
---------
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid)
address at an interrupt request level (IRQL) that is too high. This
is usually caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: c0605000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation
(only on chips which support this level of status)
Arg4: 805043d1, address which referenced memory
Debugging Details:
------------------
WRITE_ADDRESS: c0605000
CURRENT_IRQL: 2
FAULTING_IP:
nt!MiAddWorkingSetPage+cf
805043d1 c70680000000 mov dword ptr [esi],80h
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0xA
PROCESS_NAME: MemTest.exe
TRAP_FRAME: b404ac44 -- (.trap 0xffffffffb404ac44)
ErrCode = 00000002
eax=0007a4cf ebx=0007a4cf ecx=00000041 edx=89714902 esi=c0605000 edi=c0883000
eip=805043d1 esp=b404acb8 ebp=b404acdc iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
nt!MiAddWorkingSetPage+0xcf:
805043d1 c70680000000 mov dword ptr [esi],80h ds:0023:c0605000=????????
Resetting default scope
LAST_CONTROL_TRANSFER: from 805043d1 to 805437d0
STACK_TEXT:
b404ac44 805043d1 badb0d00 89714902 81de66a4 nt!KiTrap0E+0x238
b404acdc 805051fb 89d56588 89d56588 c03f7620 nt!MiAddWorkingSetPage+0xcf
b404acf4 8051fbc1 c0883cfc 7eec4000 0012e81c nt!MiLocateAndReserveWsle+0xc1
b404ad4c 80543668 81de6d88 7eec4000 80000000 nt!MmAccessFault+0xfb5
b404ad4c 004020a1 81de6d88 7eec4000 80000000 nt!KiTrap0E+0xd0
WARNING: Frame IP not in any known module. Following frames may be wrong.
000000a8 00000000 00000000 00000000 00000000 0x4020a1
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!MiAddWorkingSetPage+cf
805043d1 c70680000000 mov dword ptr [esi],80h
SYMBOL_STACK_INDEX: 1
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
DEBUG_FLR_IMAGE_TIMESTAMP: 434c50c7
SYMBOL_NAME: nt!MiAddWorkingSetPage+cf
IMAGE_NAME: memory_corruption
FAILURE_BUCKET_ID: 0xA_W_nt!MiAddWorkingSetPage+cf
BUCKET_ID: 0xA_W_nt!MiAddWorkingSetPage+cf
Followup: MachineOwner
=======================================================
The Bug Check code, and Args 1, 2, and 3 never vary; that is: always
IRQL_NOT_LESS_OR_EQUAL, always to address 0xC0605000, always at IRQL
level 2, always a write operation. For the first several weeks of
testing, Arg4 never varied either (0x805043d1); in the last couple of
days, we've seen other addresses there. A few days ago, I removed
some drivers of which we were suspicious (the National Instruments
drivers we use to acquire images) and some we weren't using (Roxio
DVD burning). My speculation is that the address change is related to
that removal: I probably changed the driver load order in some way.
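
For comparing the summary lines from many dumps, something like the
following Python sketch might help. It assumes each dump's summary line
has the same form as the one above ("BugCheck A, {...}"), and it decodes
Arg3 using only the bit meanings printed by !analyze -v. The function
names (parse_bugcheck, describe_0xa, varying_args) are made up for
illustration; this is not part of any debugger tooling.

```python
import re

# Matches the summary line WinDbg prints, e.g.
# "BugCheck A, {c0605000, 2, 1, 805043d1}". All fields are hexadecimal.
BUGCHECK_RE = re.compile(
    r"BugCheck\s+([0-9A-Fa-f]+),\s*\{"
    r"([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+)\}"
)


def parse_bugcheck(line):
    """Return (code, [arg1, arg2, arg3, arg4]) as integers, or None if no match."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        return None
    values = [int(g, 16) for g in m.groups()]
    return values[0], values[1:]


def describe_0xa(args):
    """Decode bugcheck 0xA arguments as described in the !analyze -v text."""
    address, irql, flags, faulting_ip = args
    return {
        "memory_referenced": hex(address),
        "irql": irql,
        # bit 0: 0 = read operation, 1 = write operation
        "operation": "write" if flags & 0x1 else "read",
        # bit 3: 1 = execute operation (only on chips that report it)
        "execute": bool(flags & 0x8),
        "faulting_ip": hex(faulting_ip),
    }


def varying_args(lines):
    """Return the 1-based argument numbers whose value differs across dumps."""
    parsed = [p for p in (parse_bugcheck(l) for l in lines) if p is not None]
    return [i + 1 for i in range(4) if len({args[i] for _, args in parsed}) > 1]


if __name__ == "__main__":
    summary = "BugCheck A, {c0605000, 2, 1, 805043d1}"
    code, args = parse_bugcheck(summary)
    print(hex(code), describe_0xa(args))
```

Feeding varying_args the summary line from each saved dump would show
at a glance whether only Arg4 changes from crash to crash.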
Disable automatic restart on system failure
We have done so--there's plenty of time to look at that screen.
The inference from what you have written is that the Errors are not
occurring during the boot process. Is this correct?
Yes, that is correct.
This means that you can start to eliminate things that load when you boot.
Can I ask you to elaborate on this a little? Do you mean, use MSConfig
and disable stuff on the Startup and Services tabs?
Have you tried to reproduce the error in safe mode?
No. It has been on our list of things to try, but we hadn't gotten
to it yet. I will investigate.
Are there any yellow question marks in Device Manager?
No.
What errors are appearing in Event Viewer?
There are Warning level messages that appear to be related
temporally.
Each time a crash occurs, we see 15-25 instances (it varies) of a
message "An error was detected on device\HardDisk0\D during a paging
operation." (There is no page file on drive D:, although we are
running on that disk.) The messages, in fact, reinforce our working
hypothesis: that a driver inappropriately raised the IRQL level, and
that a paging operation happened to occur at the right time.
There are no other Warnings or Errors in the System area.
There are a few error messages in the Application area, but none that
occur regularly--they appear to be side-effects of the fact that the
OS is crashing. Things like explorer.exe or spoolsvc.exe faulting,
and we might have 1 or 2 of each, spread over the time period when
we've seen 40-50 crashes.
The system on which we are crashing does not have network
access--we're trying to emulate our field conditions, and our product
is a medical device which would be operated in this way. If you think
it is critical, I can get a copy of one of these Error messages to
you using a thumb drive, but I think I have copied it faithfully.
Again, thank you for your efforts.
PC