troubleshooting infinite loops

Russ Holsclaw · Jul 22, 2005

We have a program that's used by many customers in the field. It works
pretty reliably, but once in a while, we get a report from someone that
it's going into an infinite loop, using 100% of the CPU, and not producing
results.

We have not been able to reproduce this problem here, so we need to find a
way to collect data on this loop. We don't have a clue what part of this
program might be going into such a loop, so we need to get information from
a user that this is happening to.

Is there a way to tell a user how to terminate the program in such a way
that it produces some sort of snapshot memory dump, or at least indication
of the contents of the instruction pointer register?

TIA

Pavel Lebedinsky [MSFT] · Jul 23, 2005

You can use windbg/cdb to create a memory dump. Download the latest
debuggers from
http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx,
then do this:

c:\debuggers>cdb -pv -p <process id>

0:000> .dump -mt c:\hang.dmp
0:000> q

You can then load the dump into windbg and examine call stacks,
memory, registers etc. Some of the most useful commands for troubleshooting
a 100% CPU problem are ~*k (list all call stacks) and !runaway (show
which threads consumed the most CPU time).

.dump -mt creates a relatively small dump that has just enough information
to get call stacks for all threads in the process. If you need full memory
contents, use .dump -ma (but the dump will be much larger).
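
For a support script, the two interactive steps above can be folded into a single command line. The sketch below only builds that command; it assumes cdb's -c option (run the given debugger commands at startup), which is not mentioned in this thread, and the function name and defaults are hypothetical.

```python
import subprocess


def build_dump_command(pid, dump_path=r"c:\hang.dmp", debugger="cdb"):
    """Return the argument list for a one-shot, noninvasive dump of `pid`."""
    # -pv (noninvasive attach) and -p (process id) come from the post above.
    # -c is an assumption added here: it passes commands to run at startup.
    startup_commands = ".dump -mt {0};q".format(dump_path)
    return [debugger, "-pv", "-p", str(pid), "-c", startup_commands]


if __name__ == "__main__":
    # 1234 is a placeholder process id; print the line a user could paste.
    print(subprocess.list2cmdline(build_dump_command(1234)))
```

Replacing -mt with -ma in the startup string would request the full-memory dump instead.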

Russ Holsclaw · Jul 25, 2005

Pavel Lebedinsky said:
You can use windbg/cdb to create a memory dump. Download latest
debuggers from
http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx,
then do this:

c:\debuggers>cdb -pv -p <process id>

0:000> .dump -mt c:\hang.dmp
0:000> q

You can then load the dump into windbg and examine call stacks,
memory, registers etc. Some of the most useful commands for troubleshooting
a 100% CPU problem are ~*k (list all call stacks) and !runaway (show
which threads consumed the most CPU time).

.dump -mt creates a relatively small dump that has just enough information
to get call stacks for all threads in the process. If you need full memory
contents, use .dump -ma (but the dump will be much larger).

Thanks for the input on this. Upon reading the Windows debugger
instructions, it mentioned that the NTSD.exe program had the same
capabilities, and is supposed to be included with every copy of Windows (at
least all those of the NT flavor). So, acting on that information, I wrote
an email to our customer telling them how to procure a mini-dump of our
program, without having to download anything.

I hope to find out what function is looping in the program based on the
address found in the stack trace, with the help of a .map file produced by
the linker. This is more like how we used to shoot bugs in the "golden age
of the mainframe"! :-)
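
As a rough illustration of that lookup: the function containing an address is the symbol with the highest start address that does not exceed it. The sketch below assumes the symbol start addresses have already been transcribed from the map file into a list; the names and addresses are invented for the example, and parsing the map file itself is left out.

```python
import bisect

# Hypothetical (start_address, symbol) pairs, as they might be copied out of
# a linker .map file. Every name and address here is made up.
SYMBOLS = [
    (0x00401000, "_main"),
    (0x00401080, "_parse_input"),
    (0x00401200, "_compute_totals"),
    (0x00401450, "_write_report"),
]


def find_function(address, symbols=SYMBOLS):
    """Return (name, start) of the symbol with the highest start <= address."""
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, name = ordered[index]
    # Caveat: an address beyond the end of the last function is still
    # attributed to the last symbol, because the table holds no sizes.
    return name, start


if __name__ == "__main__":
    print(find_function(0x0040123C))
```

The address 0x0040123C stands in for a return address read from the stack trace of the looping thread.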

Something else that would be useful here would be to have a set of program
"listing" files that include the machine instructions generated by the
compiler. That would help me further relate the source code to the object
code, without having to send a source-code-laden executable to a customer.
This sort of thing used to be standard output from compilers back in the
aforementioned "golden age", but I don't see such an option in the
"settings" dialog box of the Visual Studio IDE. Am I looking in the wrong
place, or is knowing what compilers produce considered too out-of-fashion
these days?

Russ Holsclaw · Jul 25, 2005

Russ Holsclaw said:
Something else that would be useful here would be to have a set of
program "listing" files that include the machine instructions generated
by the compiler. That would help me further relate the source code to the
object code, without having to send a source-code-laden executable to a
customer. This sort of thing used to be standard output from compilers
back in the aforementioned "golden age", but I don't see such an option
in the "settings" dialog box of the Visual Studio IDE. Am I looking in
the wrong place, or is knowing what compilers produce considered too
out-of-fashion these days?

I figured this out myself. It looks like it's the /FAcs option on the
compiler command line. Apparently, MS didn't see fit to provide a nice
little graphic interface with checkboxes and radio buttons for the full
range of command-line options.

It looks like I'm going to have to study it a bit to figure out how to use
it properly, since many of the procedures seem to be listed as though they
have an origin of 0 for the "assembler location counter" while others seem
to have their relative addresses incremented in a way that I find more
familiar. Maybe there's an explanation of this somewhere ... when I find
more time to study the matter.
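
One way to work with a procedure listed from origin 0, assuming its offsets are relative to the procedure's first byte, is to subtract the function's start address (for example, from the map file) from the address in the stack trace and look up the result in the listing. The listing data below is invented for the example, and the helper name is hypothetical.

```python
import bisect

# Hypothetical listing for one procedure whose location counter starts at 0:
# (offset of an instruction group, source line number). Values are made up.
LISTING = [(0x00, 41), (0x06, 42), (0x1A, 44), (0x3C, 45), (0x52, 47)]


def source_line_for(address, function_start, listing=LISTING):
    """Map an absolute address to (offset, source line) within one procedure."""
    offset = address - function_start
    if offset < 0:
        raise ValueError("address precedes the function start")
    offsets = [off for off, _ in listing]  # listing must be sorted by offset
    index = bisect.bisect_right(offsets, offset) - 1
    return offset, listing[index][1]


if __name__ == "__main__":
    # Placeholder values: a stack-trace address and the function's start.
    print(source_line_for(0x0040123C, 0x00401200))
```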

Pavel Lebedinsky [MSFT] · Jul 25, 2005

Russ Holsclaw said:
Thanks for the input on this. Upon reading the Windows debugger
instructions, it mentioned that the NTSD.exe program had the same
capabilities, and is supposed to be included with every copy of Windows
(at least all those of the NT flavor). So, acting on that information, I
wrote an email to our customer telling them how to procure a mini-dump of
our program, without having to download anything.

The built-in ntsd.exe is a lot more limited than the latest version.
It should be able to generate a dump on XP and later but I'm not sure
if it supports all the options that you might need (like storing thread
times in the dump, for later analysis with !runaway).

In general the WinDbg team strongly recommends using the latest
version instead of the built-in one.

I hope to find out what function is looping in the program based on the
address found in the stack trace, with the help of a .map file produced by
the linker. This is more like how we used to shoot bugs in the "golden age
of the mainframe"!

Did you mean "stone age"?

These days .map files are rarely (if ever) needed. You should just generate
.pdb files for everything you build, and store a copy of source files. This
should be all you need to debug dumps or live crashes:

http://support.microsoft.com/default.aspx?scid=kb;en-us;291585

Russ Holsclaw · Jul 25, 2005

Did you mean "stone age"?

Is it Microsoft policy to insult your customers?

Russ · Jul 26, 2005

Pavel Lebedinsky said:
The built-in ntsd.exe is a lot more limited than the latest version.
It should be able to generate a dump on XP and later but I'm not sure
if it supports all the options that you might need (like storing thread
times in the dump, for later analysis with !runaway).

I see your point. I came home and tried to run ntsd on my Win2k system, and
it's very much lacking in function, including the dump command.

I guess I had gotten the impression that such debugging facilities had been
part of Windows for quite some time, even though I hadn't been aware of
them. Perhaps the reason I wasn't aware was that they simply didn't exist
apart from the debuggers built into the IDE's.

I've often wondered why the old-fashioned tried-and-true method of debugging
via memory-dump had apparently gone out of fashion. Obviously, the size of
memory compared to computers of a few decades ago plays a role, but mostly
in terms of not wanting to print them out on a printer.

Still, I think programmers were given better tools, and documentation on how
to use them, back in what you call the "Stone Age" (Grrr!!)

So, apparently the idea of dumping a running application so you can go
through the entrails is only now getting seriously developed.

Well, the first thing I've got to do tomorrow is to tell the customer I
wrote to that my instructions for using ntsd are in error, since they can't
get a dump. It never occurred to me that such a facility wouldn't have been
built into a system such as Windows right from the start.

So, do I have to tell the customer to install the entire Windows debugging
package in order to get a dump out of a Win2k system, or is there a way they
can get just the minimum support (like cdb.exe)? Or does this thing include a
whole legion of DLL's that have to be installed and registered?

These days .map files are rarely (if ever) needed. You should just generate
.pdb files for everything you build, and store a copy of source files. This
should be all you need to debug dumps or live crashes:

http://support.microsoft.com/default.aspx?scid=kb;en-us;291585

I had often wondered what all those big PDB files were, and what function
they performed. The normal documentation does not make much mention of them,
and other Windows programmers I've worked with who've used C++ did not seem
to know what they were for.

My goal for this particular situation is to try to find out why our
application is looping on this particular machine in the field. I didn't
want to have to build a special "debug" version of the program, partially
because such problems might well only show up with the original executable.
(There might be a buffer overflow or something that would manifest itself
differently if everything in the program got re-arranged.) It wasn't clear
whether the pdb had to be sent to the customer in order to run the debugger
to get a dump. I've heard of situations, from programmers working under me,
wherein programs built in debug-mode didn't fail, but only the "release"
version did. In this case, that's the version that's failing on the
customer's machine.

I certainly never saw anything that said that MAP files and program listings
were obsolete. I guess that's what provoked your "Stone age" remark. I
guess the ability to read hex is now something to be ashamed of? I don't
think so.

Pavel Lebedinsky [MSFT] · Jul 26, 2005

Russ said:
Still, I think programmers were given better tools, and documentation on
how to use them, back in what you call the "Stone Age" (Grrr!!)

The "stone age" remark was in reference to using map files instead of
PDB symbols (more on this below). Sorry if it came out the wrong way.

So, apparently the idea of dumping a running application so you can go
through the entrails is only now getting seriously developed.

Actually, you could always do this on NT based systems. Drwtsn32.exe
can be used to create dump files, and windbg has always had the ability to
read and analyze them. But the latest tools do a much better job at this.

Well, the first thing I've got to do tomorrow is to tell the customer that
I wrote that my instructions for using ntsd are in error, since they can't
get a dump. It never occurred to me that such a facility wouldn't have been
built into a system such as Windows right from the start.

So, do I have to tell the customer to install the entire Windows debugging
package in order to get a dump out of a Win2k system, or is there a way
they get just the minimum support (like cdb.exe), or does this thing include
a whole legion of DLL's that have to be installed and registered?

cdb.exe does require a couple of dlls (at least dbgeng.dll and dbghelp.dll)
but it doesn't need any special installation - you can just copy the files
and it will work. There are no DLLs that need to be registered.

Using the latest cdb/ntsd would be the preferred solution but if the
customer cannot do this, you can also try the built in drwtsn32.exe. If you
run it without any arguments it will tell you where it's going to create
dumps. Then you can dump a process by running "drwtsn32 -p <pid>". The big
catch is that drwtsn32 will kill the target process after creating the dump,
but in this case it might be acceptable since the process is stuck anyway.

The resulting dump can be loaded into windbg/ntsd/cdb but some things like
thread times will not be available.

I had often wondered what all those big PDB files were, and what function
they performed. The normal documentation does not make much mention of
them, and other Windows programmers I've worked with who've used C++ did not
seem to know what they were for.

My goal for this particular situation is to try to find out why our
application is looping in this particular machine in the field. I didn't
want to have to build a special "debug" version of the program, partially
because such problems might well only show up with the original executable.
(There might be a buffer overflow or something that would manifest itself
differently if everything in the program got re-arranged.) It wasn't clear
whether the pdb had to be sent to the customer in order to run the debugger
to get a dump. I've heard of situations, from programmers working under me,
wherein programs built in debug-mode didn't fail, but only the "release"
version did. In this case, that's the version that's failing on the
customer's machine.

The point is that you should build everything, even your release versions,
with .pdb files. If you do it correctly (the KB article I mentioned has the
details) there is no performance penalty at runtime - you don't lose any
optimizations, there's no additional code generated, etc.

All Microsoft products (Windows, Office, Visual Studio etc) are built
with PDB symbols for both debug and retail. Windows symbols are
even accessible over the internet, and debuggers can be configured
to download the right symbol files automatically:

http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

I certainly never saw anything that said that MAP files and program
listings were obsolete. I guess that's what provoked your "Stone age"
remark. I guess the ability to read hex is now something to be ashamed of?
I don't think so.

Assembly level debugging is still a very much needed skill, I agree. But
you don't really need map files to do it. If you have pdb symbols you
can look up functions by address, disassemble them and so on right
from the debugger. A PDB file has a lot more data than a .map file;
in fact you can even generate a map file from a .pdb:

http://msdn.microsoft.com/msdnmag/issues/0400/bugslayer/default.aspx

Of course you can still choose to generate map files as part of your
build process; there's nothing wrong with it. But generating PDB files
in addition to (or instead of) map files will generally make debugging
much easier.
