Null reference exception in user app: debug help needed! (.NET 1.1)


Big Mike

Hi,

I've been investigating a 'System.NullReferenceException' for about a
week but have not got anywhere with my investigation- I need some
guidance from more experienced .NETers!

My app is .NET 1.1, built using VS 2003 on XP SP3.

I have not found exact reproduction criteria for the crash, as it
appears to be transient. The crash occurs when my GUI app removes a
panel from the parent display container, but it does *not* occur
every time. The panel being removed contains a few buttons and
ListView components (ListView is mentioned in the call stack when the
crash is captured). My app uses a stack to 'push' new panels on top
of the app so that they become visible, and then pops the panels off
the stack to make the panel underneath visible again.
E.g. a button on Panel A causes Panel B to be displayed over Panel A.
A button on Panel B causes Panel C to be displayed on top of Panel B.
Panel C is 'closed' and Panel B is visible again. Panel B is closed
and Panel A is visible again.

I currently use AutoHotkey to perform the same mouse clicks over and
over again, but with a random time between them, in order to force
this issue. Usually, the issue occurs within 5 or 10 minutes. I am
moving into Panels B and C and back to A every 15 to 30 seconds, but
the issue has also occurred after 5 minutes of *no* user activity in
Panel C (the panel is then automatically closed and we return to
Panel A through Panel B).

The crash (visually) occurs when Panel B is closed and we return to
Panel A, but the crash (I believe) is caused by something in Panel C:
there is no ListView, or indeed any real functionality, in Panel B.
When I pop a panel off, a reference to it is maintained as
'oldTopPanel' until another panel is popped off. So in my case, Panel C
finally loses all references to it when we drop back to Panel A from
B.
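
To make the lifetime described above concrete, here is a minimal sketch in standard C++ (not the app's Managed C++). `Panel` and `PanelStack` are hypothetical names invented for illustration; only `oldTopPanel` comes from the description. `std::shared_ptr` releases an object deterministically, whereas the .NET garbage collector would reclaim an unreferenced panel at some later, unspecified time, so treat this as a model of when the last reference goes away, not of when cleanup actually runs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a WinForms panel; only the name matters here.
struct Panel {
    std::string name;
    explicit Panel(std::string n) : name(std::move(n)) {}
};

// Hypothetical panel stack that mirrors the push/pop scheme described above.
class PanelStack {
public:
    // Show a new panel on top of the current one.
    void push(std::shared_ptr<Panel> p) { stack_.push_back(std::move(p)); }

    // Close the top panel. The popped panel stays referenced as oldTopPanel_
    // until the next pop overwrites that reference.
    void pop() {
        assert(!stack_.empty());
        oldTopPanel_ = std::move(stack_.back());
        stack_.pop_back();
    }

    // The panel the user currently sees, or nullptr if the stack is empty.
    const Panel* visible() const {
        return stack_.empty() ? nullptr : stack_.back().get();
    }

private:
    std::vector<std::shared_ptr<Panel>> stack_;
    std::shared_ptr<Panel> oldTopPanel_;
};
```

With A, B and C pushed in that order, the first pop leaves C alive (held by `oldTopPanel_`), and only the second pop, which closes B, drops the last reference to C. That matches the observation that the failure shows up when returning from B to A.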

The actual error is reported in VS 2003 as follows...

A first chance exception of type 'System.NullReferenceException'
occurred in system.windows.forms.dll
Additional information: Object reference not set to an instance of
an object.

The call stack at this time looks like this (one frame per line):

system.windows.forms.dll!System.Windows.Forms.ListView.ListViewItemCollection.get_Item(int displayIndex = 0x97) + 0x147 bytes
system.windows.forms.dll!System.Windows.Forms.ListView.ListViewItemCollection.CopyTo(System.Array dest = {System.Array}, int index = 0x97) + 0x30 bytes
system.windows.forms.dll!System.Windows.Forms.ListView.OnHandleDestroyed(System.EventArgs e = {System.EventArgs}) + 0x12f bytes
system.windows.forms.dll!System.Windows.Forms.Control.WmDestroy(System.Windows.Forms.Message m = {System.Windows.Forms.Message}) + 0x19 bytes
system.windows.forms.dll!System.Windows.Forms.Control.WndProc(System.Windows.Forms.Message m = {System.Windows.Forms.Message}) + 0x3ce bytes
system.windows.forms.dll!System.Windows.Forms.ListView.WndProc(System.Windows.Forms.Message m = {System.Windows.Forms.Message}) + 0x3ce bytes
system.windows.forms.dll!ControlNativeWindow.OnMessage(System.Windows.Forms.Message m = {System.Windows.Forms.Message}) + 0x13 bytes
system.windows.forms.dll!ControlNativeWindow.WndProc(System.Windows.Forms.Message m = {System.Windows.Forms.Message}) + 0xda bytes
system.windows.forms.dll!System.Windows.Forms.NativeWindow.DebuggableCallback(int hWnd = 0x160978, int msg = 0x2, int wparam = 0x0, int lparam = 0x0) + 0x3d bytes
user32.dll!_InternalCallWinProc@20() + 0x28
user32.dll!_UserCallWinProcCheckWow@32() + 0xb7
user32.dll!_DispatchClientMessage@20() + 0x4d
user32.dll!___fnDWORD@4() + 0x24
ntdll.dll!KiUserCallbackDispatcher() + 0x13
user32.dll!_DispatchMessageW@4() + 0xf
system.windows.forms.dll!System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods+IMsoComponentManager.FPushMessageLoop(int dwComponentID = 0x1, int reason = 0xffffffff, int pvLoopData = 0x0) + 0x349 bytes
system.windows.forms.dll!ThreadContext.RunMessageLoopInner(int reason = 0xffffffff, System.Windows.Forms.ApplicationContext context = {System.Windows.Forms.ApplicationContext}) + 0x1f5 bytes
system.windows.forms.dll!ThreadContext.RunMessageLoop(int reason = 0xffffffff, System.Windows.Forms.ApplicationContext context = {System.Windows.Forms.ApplicationContext}) + 0x50 bytes
system.windows.forms.dll!System.Windows.Forms.Application.Run(System.Windows.Forms.Form mainForm = {LMU.formMainScreen}) + 0x34 bytes
APP.exe!CUIControl.UIThread() Line 96 C++

At this point, I have already dropped out of the Application::Run ()
call (line 88 below), where my app's GUI is launched, and onto the next
executable line, 96.

81: formMainScreen *mainForm;
82: try
83: {
84:     Application::EnableVisualStyles ();
85:     // Application::Run (new frmMain);
86:
87:     mainForm = new formMainScreen;
88:     Application::Run (mainForm);   // << GUI launched here
89:
90:     // ... and wait until UI is closed
91:
92:
93:     // Signal to main() that the UI thread has terminated (or, at least, is about to),
94:     // allowing main() to terminate also
95:
96:     m_ThreadRunning = false;
97: }
98: catch (System::Exception *e)
99: {
100:     Exceptions::CException_::ReportException (e);
101:     Exceptions::CException_::ReportExceptionToScreen (e);
102:
103:     m_ThreadRunning = false;
104: }


What caught me by surprise here is that VS did not break *before* the
null reference access occurred, as it should for a 'first-chance'
exception, but after it: the app has already suffered from the
exception! This means that I do not know where the null reference
access is. Question: is this indicative of managed or unmanaged code?

Also, the call stack looks incorrect in that the code at line 96 of
APP.exe does *not* call the functions above it! I don't know how
much of this call stack I can trust. If I can trust the stack, then it
looks like the functions in the stack are part of the framework and
stem from the Application::Run () on line 88. Can anyone else glean
any info from the call stack?


I have a user-activated button on Panel C that causes an update to a
ListView component, and if I comment out this UpdateDisplay()
functionality, the issue does not appear to manifest (the app was
running for 12+ hours with the AHK script running). But
UpdateDisplay() is one-shot, i.e. it does not create a thread to do
stuff later; it does stuff there and then. And of course the issue can
occur many, many seconds after this button is clicked.

I have added code to verify that each pointer used in UpdateDisplay()
is valid before using it, but the app still crashes. I have tried using
DebugDiag to capture the crash at the correct point, and although the
crash is captured, the call stack is different (but I couldn't locate
debug symbols for system.windows.forms.dll):

system_windows_forms_7b810000+100301     058ff7c8 00000000 01702924
system_windows_forms_7b810000+621a0      00000002 058ff784 7b8232c0
system_windows_forms_7b810000+23aa0      00000000 01702924 058ff808
system_windows_forms_7b810000+132c0      004c0a34 00000002 058ff7c8
user32!DefWindowProcW+57                 00000002 0021c500 00000000
user32!DefWindowProcW+6b                 058ff81c 00000000 01702a74
system_windows_forms_7b810000+56577      00120bb8 00000002 00000000
user32!GetDC+6d                          05335d36 00120bb8 00000002
user32!GetDC+14f                         00214718 05335d36 00120bb8
user32!DefWindowProcW+180                00822c80 00000002 00000000
user32!DefWindowProcW+1cc                058ff95c 00000018 00822c80
ntdll!KiUserCallbackDispatcher+13        058ffa50 00000000 058ffa84
user32!DispatchMessageW+f                01c5ebd0 01c5dc40 ffffffff
system_windows_forms_7b810000+1df5a      00000000 ffffffff 01c5e234
system_windows_forms_7b810000+1daca      016b780c 01c3c230 01c5e1f4
system_windows_forms_7b810000+1d7e5      016b780c 00000000 01c3c230
system_windows_forms_7b810000+41884      058ffc4c 791dce2f 058ffb88
mscorwks!GetCompileInfo+1001             058ffb88 00000000 058ffb60
mscorwks!GetCompileInfo+18a3             00bca8cb 79b7c000 058ffc78
mscorwks!GetCompileInfo+2de1             79bca8cb 79b7c000 79b93e78
mscorwks!GetCompileInfo+2e4f             058ffd74 0021c500 791b3360
mscorwks!ReleaseFusionInterfaces+41075   058ffd94 001d6cb0 792ea484
mscorwks!ReleaseFusionInterfaces+41144   001d6cb0 00000000 804fb572
mscorwks!Ordinal17+10c4                  0021c6b8 00000000 001fea20
kernel32!GetModuleFileNameA+1ba          791c9453 0021c6b8 00000000

I have also tried using CDB/WinDbg without luck: both seemed to
present similar responses and call stacks to VS 2003.

I'm coming to the end of the things I can try to find the cause of this
issue. Do any of you have any ideas as to how I can proceed with this
investigation? If you need more info then please ask.

Thanks for your interest!
 
> I've been investigating a 'System.NullReferenceException' for about a
> week but have not got anywhere with my investigation- I need some
> guidance from more experienced .NETers!

Wow! headache.

What is your basic pattern for interacting between forms? Are you trying
to call from the child through the parent, or are you using events, or
what?

In general, if you have a lot of logic in the UI (which I hate), you end
up with a "parent" form that controls what is being passed. It also
controls the list of children. The children then use events to talk back
to the parent.

You can improve this a great deal by moving to more of an MVP type of
pattern, as the Presenter handles capturing communication.

I am not completely sure what you are doing in the application, as there
is a lot of interop going on. This appears to be where the problem is
occurring. Perhaps the form you are killing is dying before it finishes
some work or before another form needs to contact it (like between the
time it says "where is this form" and tries to grab it). It could be
that the request is tied up in interop while you are killing the form.
Moving to a more event-driven model could solve that.

Of course, the event model suggestion may require more refactoring than
you are prepared for right now.

Hope this at least spurs some ideas.

Peace and Grace,

--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
I've tried to keep the engine (model) separate from the GUI (view) and
have mostly achieved that. I have only one Windows 'form'; I swap out
different form::panels, each of which represents a different view into
the engine. Some of the panels have a thread which keeps polling the
engine for changes and uses a delegate posted to the GUI's thread to
cause a screen refresh. My Panel 'C' is no different, but
I have only seen this null reference crash on a couple of panels that
contain the ListView control. When a new panel comes to the front, the
update thread on the old front panel is stopped, and the update thread
on the new one is started. When a 'leaf' screen (e.g. Panel 'C') closes,
its screen update thread is stopped. Then, when the leaf's parent
panel is replaced, the leaf panel finally loses its last reference,
and presumably the .NET environment disposes of Panel 'C' (as
described in my first post).
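
One hazard with this arrangement, in general, is a refresh that was posted to the GUI thread before the panel was torn down but runs afterwards. The sketch below, in standard C++ and not the app's Managed C++, shows one way to guard against that. `UiQueue`, `ListPanel` and `postRefresh` are hypothetical names invented for illustration; in WinForms the posting step would be a delegate passed to the control, and the liveness check would be something like testing whether the control has been disposed. Nothing in the thread establishes that this is the cause of the crash.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical view object standing in for a panel that shows a list.
struct ListPanel {
    std::string lastText;
    int refreshCount = 0;
    void refresh(const std::string& text) { lastText = text; ++refreshCount; }
};

// Minimal stand-in for the GUI thread's message queue.
class UiQueue {
public:
    // May be called from any thread (e.g. a polling thread).
    void post(std::function<void()> work) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(work));
    }

    // Called on the UI thread only; runs everything queued so far and
    // returns how many work items were executed.
    int drain() {
        int ran = 0;
        for (;;) {
            std::function<void()> work;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) break;
                work = std::move(pending_.front());
                pending_.pop();
            }
            work();  // run outside the lock so work items may post again
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};

// Queue a refresh that is skipped if the panel no longer exists when the
// UI thread gets round to running it.
void postRefresh(UiQueue& ui, const std::shared_ptr<ListPanel>& panel,
                 const std::string& text) {
    assert(panel != nullptr);
    std::weak_ptr<ListPanel> weak = panel;
    ui.post([weak, text]() {
        if (std::shared_ptr<ListPanel> alive = weak.lock()) {
            alive->refresh(text);
        }
    });
}
```

The queued work holds only a weak reference, so it neither keeps a closed panel alive nor touches it after it has gone; a late refresh simply becomes a no-op.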

I did disable creation of this GUI update thread on my Panel 'C'
(so it was never created, used or destroyed), but the issue still
manifested (the only difference was that the screen did not update!),
so I don't think that this is a threading issue.

I also tried clearing the contents of the ListView before 'closing'
the panel, but this doesn't help.

Also, as mentioned in my original post, the only bit of
commenting-out(!) that seemed to alleviate the crash was of the
code that clears the ListView contents and then repopulates the
ListView. (The screen update thread only modifies the already
existing contents of the ListView; it doesn't add or remove
items.) This all happens on a button click, and does *not* cause a
delayed action of any kind. This, together with there being no reference
to the app's code in the call stack at the time of the crash (and the fact
that the crash happened in the main GUI thread), leaves me to suspect either
my app's framework or a bug in the .NET ListView, neither of which appears
likely given the evidence I have so far.

Why aren't there any references to my app in the call stack?
Why does an action on a button cause a crash an undefined amount of
time later?
Are there any clues in the call stack that someone can spot?
 
> I've tried to keep the engine (model) separate from the GUI (view) and
> have mostly achieved that. I have only one windows 'form'- i swap out
> different form::panels which represent a different view into the
> engine. some of the panels have a thread which keeps polling the
> engine for changes and uses a delegate posted to the GUIs thread to
> caused a screen refresh.

Is the polling absolutely necessary, or can you change things so that the
Engine fires events when there are changes, and have them handled only if
certain panels are visible? The event model often works better when you are
dealing with separation of GUI and application logic (engine), as the
GUI responds to changes rather than keeping threads open to determine whether
changes have occurred. It is not always possible, I would imagine, but
it is cleaner, as the UI determines whether it needs to handle events or
ignore them, keeping a cleaner "separation of concerns".

If you have "too many changes", this can burn a lot of perf, but I doubt
you would be burning any more than you do by keeping threads alive.
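
As a rough illustration of the change-based approach suggested here, the following standard C++ sketch has the engine notify subscribers when a value changes, and each view decides whether to react based on its own visibility. `Engine`, `PanelView` and their members are hypothetical names invented for this example; a .NET version would more naturally use events and delegates, and would need to marshal the notification onto the GUI thread if the engine runs elsewhere.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical engine (model) that notifies subscribers on change instead
// of being polled.
class Engine {
public:
    using Handler = std::function<void(int)>;

    void subscribe(Handler h) {
        assert(h);  // an empty handler would throw when invoked
        handlers_.push_back(std::move(h));
    }

    // Notify subscribers only when the value actually changes.
    void setValue(int v) {
        if (v == value_) return;
        value_ = v;
        for (auto& h : handlers_) h(value_);
    }

private:
    int value_ = 0;
    std::vector<Handler> handlers_;
};

// Hypothetical view: it receives every notification but redraws only
// while it is the visible panel.
struct PanelView {
    bool visible = false;
    int shownValue = 0;
    int redraws = 0;

    void onEngineChanged(int v) {
        if (!visible) return;  // hidden panels ignore the event
        shownValue = v;
        ++redraws;
    }
};
```

The engine knows nothing about panels; the decision to handle or ignore a change stays in the view, which is the "separation of concerns" point made above. A hidden panel that later becomes visible would still need to fetch the current value once, since it skipped the intermediate events.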

> Why aren't there any references to my app in the call stack?

I am not sure, but more than likely the separate thread is the problem
child, which would not be caught in the call stack of the main thread. I
am not sure how the threads are interacting, but my first guess as to why
you are not seeing the call to the app is that it is in the call stack of
the polling thread, while the exception is in the primary thread.

In short, it could be that the polling thread is locking the ListView
while the primary thread is attempting something else.

There are native methods for walking the call stack of additional threads.
I am not sure, offhand, how this is implemented in .NET, however. And
if I found a solution on this box, it would be for .NET 2.0+. My other box
has Visual Studio 2010, so it is fully on the 4.0 framework. The solution
might or might not be similar for .NET 1.1 (Visual Studio 2003).

> Why does an action on a button cause a crash an undefined amount of
> time later?
>
> Are there any clues in the callstack that someone can spot?

Based on the description, the polling operation is the most likely culprit.
The thread is trying to update. Looking at the call stack, the problem
occurs when copying to a "node" that does not exist at the time. If this
were Visual Studio 2008, you could download the PDBs for the .NET
framework, step through what was happening in the debugger and
use the call stack to move back and forth through the code. But you will
have to capture the other thread's call stack to see the app call, if
what I think is happening is in fact happening.

I am a bit blind here, as I do not have the code base (and I am not
asking you to post it here). Hopefully the ideas I have presented are
helping narrow down the exception or giving some ideas of another way to
set up the application that eliminates the problem.

Peace and Grace,


--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
Thanks for the information. I too would prefer a change-based method
rather than the constant polling, but time is money!

I took a slightly different tack to 'sorting' this issue: I cut down
my large app to just the code that exhibits the issue. Using an
AutoHotkey script to perform the actions that cause the issue, I left
the s/w running and timed the delay until it crashed. The AutoHotkey
script performs the movement between Panels A, B and C as described in
my first post. The script takes about 30-40 seconds to run and
is then repeated 'ad infinitum'.

On .NET 1.1, the app exhibits the crash after circa 45-60 minutes; this
was over a sample of three or four runs. I then took the cut-down
code, built it with .NET 2 (VS 2005) and ran the same test. I left
the code running over the Christmas holiday and the s/w was *still*
running under command of AutoHotkey 18 days later.

From this info, I have concluded that a bug in the .NET 1.1 framework
is causing the crash. (Obviously, I'm doing something in the
code to stimulate the error, but ultimately the crash is not *caused*
by my code.)

Solution: move to a more recent version of .NET!
 