profiler which does not just sum up used time

J

Joe Withawk

I have a problem with an application which runs smoothly but once every few
seconds shudder.
Having used the .net stopwatch class I now know that a certain loop
generally runs very quickly but sometimes it takes ten times as long.
The loop is not trivial so the easiest way for me to find the reason for the
delay would be to profile it in such a way that I can see the time each call
to it takes and not just the accumulated time.

Generally if a method is called ten times and each call takes 1ms then you
see that that method used 10*1ms time.
I would like to see a list of ten measurements of 1ms each.
With my current problem it would be a list of perhaps 1000 calls each taking
5ms and a few calls taking 50 ms. Then I would look at the long calls and
see what happens.

I do not beleive I have seen a profiler which gives this detail level, but
perhaps someone else has? Or perhaps someone has a suggestion as to what
might be an alternative?
 
M

Marc Gravell

I do not beleive I have seen a profiler which gives this detail level, but
perhaps someone else has? Or perhaps someone has a suggestion as to what
might be an alternative?

Well, if you are simply trying to investigate the current issue, a
crude approach could be to simply add some instrumentation to the
method... put a stopwatch in... if (at the end) you find it has run
over a certain threshhold, then write the ellapsed time and the
arguments to the trace, the event-log, the console or a file. Crude,
but it should work...

Marc
 
J

Joe Withawk

Well, if you are simply trying to investigate the current issue, a
crude approach could be to simply add some instrumentation to the
method... put a stopwatch in... if (at the end) you find it has run
over a certain threshhold, then write the ellapsed time and the
arguments to the trace, the event-log, the console or a file. Crude,
but it should work...

I did more or less just that.
I measured the time , over some interval, for five methods being called in
the loop.
I found that ALL sometimes took longer.
It was not that if one took longer then all others did as well. Often it was
slightly grouped but it looked like there was some external disturbance
which prused the thread and someontes method1 would be executing when it
happened and sometimes method2 would be executing and therefore would be
slower.

What would be a good next step in figuring out what is going on? I tried
processexplorer and did not see anything special starting up or accessing
the disk at the time of shudder.

Perhaps I should try to start a high priority thread in another test program
and let it loop like cracy while measuring how long some action in the loop
takes. If it also once in a while drops in speed, it certainly is an
external problem.

Ok, just did that.
It does not show the same pattern as my real problemapplication, but it does
show something odd. At very regular intervals there is a huge spike in the
measurement. www.greenleaf.dk/time.jpg
The thread is highest priority and simply does the folowing. Perhaps I
expose my ignorance here, but could someone please explain to me what is
going on? Should a high priority thread not be alowed to run before every
background process, when there are two cores?
System.Diagnostics.Stopwatch sw;

long[] time = new long[10000];

int c = 0;

void test()

{

while (true)

{

sw.Reset();

sw.Start();

double y = 0;

for (int i = 0; i < 10000; i++)

y = Math.Sin(Math.Cos(Math.Sqrt(i)));

sw.Stop();

time[c++] = sw.ElapsedTicks;

if (c == 5000) write timing to disk and quit

}

}
 
B

Ben Voigt [C++ MVP]

The thread is highest priority and simply does the folowing. Perhaps I
expose my ignorance here, but could someone please explain to me what is
going on? Should a high priority thread not be alowed to run before every
background process, when there are two cores?

If the highest priority thread waits on any other operation, like disk or
network I/O, it won't run until that other operation completes. Even
something as simple as your extra logging code can introduce periodic pauses
while the buffered I/O library actually writes the data to the disk.
 
W

Willy Denoyette [MVP]

Joe Withawk said:
Well, if you are simply trying to investigate the current issue, a
crude approach could be to simply add some instrumentation to the
method... put a stopwatch in... if (at the end) you find it has run
over a certain threshhold, then write the ellapsed time and the
arguments to the trace, the event-log, the console or a file. Crude,
but it should work...

I did more or less just that.
I measured the time , over some interval, for five methods being called in
the loop.
I found that ALL sometimes took longer.
It was not that if one took longer then all others did as well. Often it
was slightly grouped but it looked like there was some external
disturbance which prused the thread and someontes method1 would be
executing when it happened and sometimes method2 would be executing and
therefore would be slower.

What would be a good next step in figuring out what is going on? I tried
processexplorer and did not see anything special starting up or accessing
the disk at the time of shudder.

Perhaps I should try to start a high priority thread in another test
program and let it loop like cracy while measuring how long some action in
the loop takes. If it also once in a while drops in speed, it certainly is
an external problem.

Ok, just did that.
It does not show the same pattern as my real problemapplication, but it
does show something odd. At very regular intervals there is a huge spike
in the measurement. www.greenleaf.dk/time.jpg
The thread is highest priority and simply does the folowing. Perhaps I
expose my ignorance here, but could someone please explain to me what is
going on? Should a high priority thread not be alowed to run before every
background process, when there are two cores?
System.Diagnostics.Stopwatch sw;

long[] time = new long[10000];

int c = 0;

void test()

{

while (true)

{

sw.Reset();

sw.Start();

double y = 0;

for (int i = 0; i < 10000; i++)

y = Math.Sin(Math.Cos(Math.Sqrt(i)));

sw.Stop();

time[c++] = sw.ElapsedTicks;

if (c == 5000) write timing to disk and quit

}

}



Windows is not a real time OS, there is no guarantee that your thread will
not get pre-empted when it runs, even lower priority threads will get a
chance to run and this is exactly what you are seeing here.
Each thread in the system has a fixed quantum for which it can occupy a CPU
(core), at the end of a threads quantum, the scheduler (running at the
highest priority) will inspect the priority queues to see whether there are
higher priority threads ready to run, if there are such threads, your thread
will get pre-empted. When there are are no such threads, the scheduler will
keep your thread running on the same core. However, if your thread has
consumed ~5 seconds of CPU time he will get pre-empted by a lower priority
thread (and bet there will be a lot of them). When this happens, the
scheduler will boost the priority of the lower priority threads depending on
the kind of thread. UI threads (owning a window handle) will receive a boost
of 3 non UI threads will get a boost of 1 or 2 depending whether it's an IO
thread or not. The result of this is, that it's now possible that a lower
priority thread runs at a higher priority level than your thread and as such
gets a CPU for more than it's normal quantum.
To conclude, on a system with only one or two CPU's (cores) a higher
priority thread will run for at most ~5 second before it will get
pre-empted. The time it will get off a CPU depends on the other threads
activity, but keeping a single core busy for such a long period (~5 seconds)
will probably accumulate that much lower priority threads to become ready,
so that your thread will get pre-empted for at a complete quantum or more.

Willy.
 
B

Ben Voigt [C++ MVP]

Windows is not a real time OS, there is no guarantee that your thread will
not get pre-empted when it runs, even lower priority threads will get a
chance to run and this is exactly what you are seeing here.
Each thread in the system has a fixed quantum for which it can occupy a
CPU (core), at the end of a threads quantum, the scheduler (running at the
highest priority) will inspect the priority queues to see whether there
are higher priority threads ready to run, if there are such threads, your
thread will get pre-empted. When there are are no such threads, the
scheduler will keep your thread running on the same core. However, if your
thread has consumed ~5 seconds of CPU time he will get pre-empted by a
lower priority thread (and bet there will be a lot of them). When this
happens, the scheduler will boost the priority of the lower priority
threads depending on the kind of thread. UI threads (owning a window
handle) will receive a boost of 3 non UI threads will get a boost of 1 or
2 depending whether it's an IO thread or not. The result of this is, that
it's now possible that a lower priority thread runs at a higher priority
level than your thread and as such gets a CPU for more than it's normal
quantum.
To conclude, on a system with only one or two CPU's (cores) a higher
priority thread will run for at most ~5 second before it will get
pre-empted. The time it will get off a CPU depends on the other threads
activity, but keeping a single core busy for such a long period (~5
seconds) will probably accumulate that much lower priority threads to
become ready, so that your thread will get pre-empted for at a complete
quantum or more.

Except that these dynamic priority boosts never go above the highest dynamic
priority, there are fixed high-responsiveness priorities that will not be
preempted unless they voluntarily wait or an even higher priority thread
becomes runnable. Bugs in such threads usually bring the whole system to a
halt.

The non-realtime nature of Windows comes from a combination of Windows' own
high-priority tasks, and the fact that a highest priority thread becoming
runnable still waits for the next scheduler pass. I think drivers entirely
in non-pageable memory and using interrupt handlers can be realtime, I know
application programs cannot.
 
J

Joe Withawk

To conclude, on a system with only one or two CPU's (cores) a higher
priority thread will run for at most ~5 second before it will get
pre-empted. The time it will get off a CPU depends on the other threads
activity, but keeping a single core busy for such a long period (~5
seconds) will probably accumulate that much lower priority threads to
become ready, so that your thread will get pre-empted for at a complete
quantum or more.

Thanks for the details. I was aware that to prevent starvation the threads
would over time build up priority. I did however beleive that their priority
would never go above high.
An application can be set to realtime priority and according to what I have
read, this should prevent preempting entirely, but that still does not seem
to be the case.. Or perhaps it has to do with me not being able to set a
threads priority to realtime. Only the application can be realtime.

How would you suggest that I move on now? I have an app doing rendering and
it occationally gets preempted (or so I beleive, can I test is this is
indeed the cause somehow?) and when that happens I lose a few frames. I can
not live with even a single lost frame unless the rendering is very complex,
which it is generally not. What could be a solution to this?
 
J

Joe Withawk

The non-realtime nature of Windows comes from a combination of Windows'
own high-priority tasks, and the fact that a highest priority thread
becoming runnable still waits for the next scheduler pass. I think
drivers entirely in non-pageable memory and using interrupt handlers can
be realtime, I know application programs cannot.

I do not really need absolute realtime, though it would be nice to have onw
core entirely for my own personal use, but is there a way to create a thread
at a higher priority level than Highest? After all, the win32 api does
expose the realtime level as well, but that is going outside .net.
 
B

Ben Voigt [C++ MVP]

Joe Withawk said:
Thanks for the details. I was aware that to prevent starvation the threads
would over time build up priority. I did however beleive that their
priority would never go above high.
http://msdn2.microsoft.com/en-us/library/ms685100(VS.85).aspx
http://msdn2.microsoft.com/en-us/library/ms684828(VS.85).aspx

An application can be set to realtime priority and according to what I
have read, this should prevent preempting entirely, but that still does
not seem to be the case.. Or perhaps it has to do with me not being able
to set a threads priority to realtime. Only the application can be
realtime.

How would you suggest that I move on now? I have an app doing rendering
and it occationally gets preempted (or so I beleive, can I test is this is
indeed the cause somehow?) and when that happens I lose a few frames. I
can not live with even a single lost frame unless the rendering is very
complex, which it is generally not. What could be a solution to this?

If you have already thoroughly debugged your application and you trust it
not to lockup the entire system, you could set REALTIME_PRIORITY_CLASS for
your entire process.

A better way would be to increase the buffer so it doesn't
overflow/underflow during a short delay.
 
W

Willy Denoyette [MVP]

Ben Voigt said:
Except that these dynamic priority boosts never go above the highest
dynamic priority, there are fixed high-responsiveness priorities that will
not be preempted unless they voluntarily wait or an even higher priority
thread becomes runnable. Bugs in such threads usually bring the whole
system to a halt.

<
I never said that a *user* thread would go above the highest dynamic level,
the kernel himself doesn't follow this rule, several system threads (driver
and non driver) running at dynamic base priority level, have their dynamic
priority set to realtime level, I have seen system threads with their base
level at 8 and their dynamic level at 31.
But my point is that the system prevents starvation by allowing lower
priority threads to get at the CPU, even the lowest priority thread (dynamic
idle) will get a chance to run.
The non-realtime nature of Windows comes from a combination of Windows'
own high-priority tasks, and the fact that a highest priority thread
becoming runnable still waits for the next scheduler pass. I think
drivers entirely in non-pageable memory and using interrupt handlers can
be realtime, I know application programs cannot.

There are non driver threads that run at real-time priority level too.

Willy.
 
P

Peter Duniho

[...]
How would you suggest that I move on now? I have an app doing rendering
and it occationally gets preempted (or so I beleive, can I test is this
is indeed the cause somehow?) and when that happens I lose a few frames.
I can not live with even a single lost frame unless the rendering is
very complex, which it is generally not. What could be a solution to
this?

I don't understand the comment that "I can not live with even a single
lost frame". Either you always want to render frames as the fastest rate
that the computer is capable of doing, or you are rendering frames in
lockstep with some external activity.

In the former case, obviously you do have some wiggle room for a slightly
lower frame rate, since performance on differing hardware is going to
vary. In the latter case, if you aren't running this program on a
computer that can easily keep up with the necessary framerate (and thus,
your rendering thread can afford to defer to other threads now and then),
then you need to find faster hardware.

In either case, the answer is to stop trying to set your thread to a
higher priority.

For rendering applications, it's almost always the case that _smoothness_
is valued more highly than actual frame rate. And as you've seen,
starving threads until they get boosted to your own priority is counter to
getting a program to run smoothly.

Pete
 
W

Willy Denoyette [MVP]

Joe Withawk said:
Thanks for the details. I was aware that to prevent starvation the threads
would over time build up priority. I did however beleive that their
priority would never go above high.
An application can be set to realtime priority and according to what I
have read, this should prevent preempting entirely, but that still does
not seem to be the case.. Or perhaps it has to do with me not being able
to set a threads priority to realtime. Only the application can be
realtime.
Thraed priorty HIGH is a set of levels, ranging from 11 to 15, a high level
thread will have his level set at 13 and can get pre-empted by another HIGH
level thread with his level set at 15. A number of kernel threads are
running at this level, you cannot prevent being prempted by these threads.
Note also that the finalizer thread runs at this same level, having your
threads running at this level may prevent the finalizer to run (well he may
run every 5 seconds at best).

How would you suggest that I move on now? I have an app doing rendering
and it occationally gets preempted (or so I beleive, can I test is this is
indeed the cause somehow?) and when that happens I lose a few frames. I
can not live with even a single lost frame unless the rendering is very
complex, which it is generally not. What could be a solution to this?

What you could try (at the risk of turning the system into an unstable
state) is set the process priority level to real-time (level 24), this will
turn all threads in the process into the real-time range. Keep in mind
however, that kernel and user (less likely) threads may have CPU affinity,
keeping a CPU busy will prevent such threads to run unless they also run at
the real-time level.

Willy.
 
B

Ben Voigt [C++ MVP]

The non-realtime nature of Windows comes from a combination of Windows'
There are non driver threads that run at real-time priority level too.

The symbolic constant may be REALTIME_PRIORITY_CLASS, but it still falls
short of realtime. Threads can't truely be realtime on Windows, only
interrupt handlers.
 
J

Joe Withawk

I don't understand the comment that "I can not live with even a single
lost frame". Either you always want to render frames as the fastest rate
that the computer is capable of doing, or you are rendering frames in
lockstep with some external activity.

If the monitor enters vertical blank and I am not yet done with a new frame
because I have been preempted, then I will have to present it the next time
vblank happens. I have then lost a frame. If I had a large buffer of frames
which would simply be used when I was behind schedule, then it would be
fine, but I dont. Ordinary graphics cards support a maximum of three buffers
and that is aparently not enough.
In the former case, obviously you do have some wiggle room for a slightly
lower frame rate, since performance on differing hardware is going to
vary. In the latter case, if you aren't running this program on a
computer that can easily keep up with the necessary framerate (and thus,
your rendering thread can afford to defer to other threads now and then),
then you need to find faster hardware.

I can render several thousand frames a second, but with a buffer of no more
than 3, I will pause when the buffer is full, if I sync to screen, and if I
is then blocked for a duration of 3 frames, then I will lose frames.

It seems somewhat odd that I am blocked for that long. It is, with current
refreshrate, 50 ms I am locked out sometimes before the buffer underflows
and frames are lost. The hardware is very capable of keeping up but then
windows throws a stick in the wheel.
For rendering applications, it's almost always the case that _smoothness_
is valued more highly than actual frame rate. And as you've seen,
starving threads until they get boosted to your own priority is counter to
getting a program to run smoothly.

You have a point there. What you are saying is that it is better to let my
thread be preempted often for a very short period of time that to keep
running for a while and then get locked entirely out for an extended period?
Yes that is true. When the problem started it seemed natural to increase
priority, but I should try my testing with a low priority actually... given
that not a lot of other things are running.
However... with 2+ cores it would still be nice to reserve one for myself
and let the others fight over the other core.
 
W

Willy Denoyette [MVP]

Ben Voigt said:
The symbolic constant may be REALTIME_PRIORITY_CLASS, but it still falls
short of realtime. Threads can't truely be realtime on Windows, only
interrupt handlers.

I'm not referring at the REALTIME_PRIORITY_CLASS symbolic constant, nor did
I ever say that Windows threads are realtime capable.
I'm referring to actual system behavior where you can see non-driver
real-time threads running at IRQL 1. The OS kernel is able to run a thread
at any priority he like at any IRQL. Device drivers and software interrupt
handlers may run at IRQL 0, 1 or 2, hardware interrupt handlers run at IRQL
2...31. The dispatcher may raise it's level to IRQL 2 (DPC/dispatch) as such
preventing other threads from interrupting thread dispatching, this doesn't
mean that the dispatcher runs a hard real-time thread though.
Note that all of this strongly depends on the OS version and may even vary
from one SP level to another and may also depend on the installed features
and devices.
Willy.
 
W

Willy Denoyette [MVP]

Willy Denoyette said:
I'm not referring at the REALTIME_PRIORITY_CLASS symbolic constant, nor
did I ever say that Windows threads are realtime capable.
I'm referring to actual system behavior where you can see non-driver
real-time threads running at IRQL 1. The OS kernel is able to run a thread
at any priority he like at any IRQL. Device drivers and software interrupt
handlers may run at IRQL 0, 1 or 2, hardware interrupt handlers run at
IRQL 2...31. The dispatcher may raise it's level to IRQL 2 (DPC/dispatch)
as such preventing other threads from interrupting thread dispatching,
this doesn't mean that the dispatcher runs a hard real-time thread though.
Note that all of this strongly depends on the OS version and may even vary
from one SP level to another and may also depend on the installed features
and devices.
Willy.



Oh, forgot to mention that even non kernel code can boost the thread
priority into the real-time range, one notable sample is Media player, which
calls the Multi Media Component Sub System to push his playback thread
priority into the real-time range. However, as a good Windows citizen MMCSS
takes care that these threads get their priorities boosted for only a
limited period of time.

Willy.
 
P

Peter Duniho

[...]
I can render several thousand frames a second, but with a buffer of no
more than 3, I will pause when the buffer is full, if I sync to screen,
and if I is then blocked for a duration of 3 frames, then I will lose
frames.

Taking the dubious claim of "several thousand frames a second" at face
value... :)

So what if you lose a frame? Even at 60fps (the maximum rate for video
synchronized to a typical LCD monitor), losing a single frame now and then
isn't going to be noticeable at all by a human being.

There's not a single human user in the world who would notice the
occasional lost frame (or even dozens of frames) at a frame rate of
"several thousand frames a second".

Why is it so important to you that these frames be displayed at a constant
rate, with no delays? In what context does it actually matter?
It seems somewhat odd that I am blocked for that long.

"For that long"? You haven't defined "several", but even at 1000 fps, 3
frames is only 3 milliseconds. Given that a thread quantum on Windows is
around 50 ms, I'd say you're pretty lucky you're _only_ getting blocked
for 3 (though actually this is not atypical, since most threads never need
to use their entire quantum).

In any case again, in what context does missing 3 milliseconds worth of
visual presentation actually matter?
It is, with current refreshrate, 50 ms I am locked out sometimes before
the buffer underflows and frames are lost. The hardware is very capable
of keeping up but then windows throws a stick in the wheel.

What hardware? What buffer? Why is it so small that 50ms allows it to
underflow? How is this connected to the question of rendering to the
screen? What are you actually trying to do here?

I see a couple of possibilities: one is that your application really
doesn't need to be as real-time as you seem to think it does, in which
case the solution is to stop worrying so much about a few milliseconds
here and there.

The other possibility is that your application is absolutely time-critical
and extremely sensitive to scheduling variations, in which case the
solution is very likely to be to either change your specifications so that
the application is _not_ time-critical and _not_ extremely sensitive to
scheduling variations, or to simply abandon the idea of using Windows at
all.

You can do things with the threads as suggested by Willy and Ben to try to
hog the processor all to yourself, but at that point you've sacrificed any
semblance of a stable Windows installation. Windows wasn't built to be
used that way, and doing so is almost certainly going to lead to trouble.
If you need to use a computer in that way, you need a different OS.
You have a point there. What you are saying is that it is better to let
my thread be preempted often for a very short period of time that to
keep running for a while and then get locked entirely out for an
extended period?

Yes, that's what I'm saying. Windows is actually not that bad at
scheduling threads, if you let it do its job. Generally speaking, threads
that never become unrunnable and which can use as much CPU time as you can
give it are actually much better off running at _lower_ priority rather
than higher.
Yes that is true. When the problem started it seemed natural to increase
priority, but I should try my testing with a low priority actually...
given that not a lot of other things are running.
However... with 2+ cores it would still be nice to reserve one for
myself and let the others fight over the other core.

Well, unfortunately that's just not how Windows is designed. If you
absolutely need that functionality, you need a different OS.

Pete
 
J

Joe Withawk

What you could try (at the risk of turning the system into an unstable
state) is set the process priority level to real-time (level 24), this
will turn all threads in the process into the real-time range. Keep in
mind however, that kernel and user (less likely) threads may have CPU
affinity, keeping a CPU busy will prevent such threads to run unless they
also run at the real-time level.

With application at realtime the pause is approx once every 20 seconds. With
it at non realtime it is approx every 10 seconds.
What does this suggest? There is something which is raised above my priority
in 20 seconds when realtime and in 10 seconds when not realtime?
 
B

Ben Voigt [C++ MVP]

Willy Denoyette said:
I'm not referring at the REALTIME_PRIORITY_CLASS symbolic constant, nor
did I ever say that Windows threads are realtime capable.
I'm referring to actual system behavior where you can see non-driver
real-time threads running at IRQL 1. The OS kernel is able to run a thread

This is because there's no such thing as a "driver thread" or "non-driver
thread". There are only threads executing user mode code, and threads
executing kernel mode code (be it from a 3rd party driver, bundled driver,
or core kernel).
at any priority he like at any IRQL. Device drivers and software interrupt
handlers may run at IRQL 0, 1 or 2, hardware interrupt handlers run at
IRQL 2...31. The dispatcher may raise it's level to IRQL 2 (DPC/dispatch)
as such preventing other threads from interrupting thread dispatching,
this doesn't mean that the dispatcher runs a hard real-time thread though.

Staying at elevated IRQL is an error, so that can't be considered to provide
realtime processing. Only hardware interrupt handlers are realtime.
 
W

Willy Denoyette [MVP]

Joe Withawk said:
With application at realtime the pause is approx once every 20 seconds.
With it at non realtime it is approx every 10 seconds.
What does this suggest? There is something which is raised above my
priority in 20 seconds when realtime and in 10 seconds when not realtime?


Just wonder how you measured these timings and what code you are actually
running when measuring.
As said in another reply, the real-time priority class will put all threads
in the process in the real-time range (at level 24), a couple of system
threads may be running at level 31, the system can even run threads at an
interrupt request level that prevents other threads to run. As I said
before, Windows is not a real-time OS, you have no control over the thread
scheduler other than increasing the relative priority level of one user
thread over another user thread, but you won't be able to stop the scheduler
to prevent pre-emption of your thread, so you can't achieve what you want on
Windows (without the risk of rendering the system unusable), that is,
running a thread on a single core without being interrupted or pre-empted,
you can't even prevent being re-scheduled on another core.

Willy.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top