Performance problem when drawing data received from DSP.

  • Thread starter: Alexander Arlievsky

Alexander Arlievsky

We are developing an application that involves a PCI driver to a DSP card
that generates data at high rates (over 40 interrupts/sec, about 3MB/sec).

We use WinDriver for the driver (no kernel plugin), and we allocate a user
buffer for DMA operations (continuous-mode DMA). We then copy the data from
this buffer to an application buffer (all done in user mode).

Running this driver and checking CPU load we get 15% (on a P4 2.4GHz, 1GB
RAM).

The GUI part of the application draws the received data on screen. Running
just the GUI part (without interrupts from the driver, using simulated data)
we get ~2% CPU load.

Now, here's the problem:

When we combine the two parts of the application (GUI and driver) we get a
CPU load of 40%!

Any idea why 2% (GUI) + 15% (driver) results in 40%?



We checked many performance counters, and noticed that the page fault
counter is extremely high at about 13000 per sec. We tried running the
system without a page file but received the same results.



P.S.
Our application includes C, Managed C++ and C# modules.
--
==============================
Alexander Arlievsky
(e-mail address removed)
Remove prefix, first and third and fifth words after "@"
"The best tools for debugging are brains"
==============================
 
Alexander said:
We use WinDriver for the driver (no kernel plugin), and we allocate
a user buffer for DMA operations (continuous-mode DMA). We then copy
the data from this buffer to an application buffer (all done in user
mode).

Based on my experience with high-performance data acquisition, your driver
solution is quite inefficient. In a past project, we were able to get [...]
card and custom driver.

Any idea why 2% (GUI) + 15% (driver) results in 40%?
P.S.
Our application includes C, Managed C++ and C# modules.

Likely culprits: Increased context switches, increased managed/unmanaged
switches. You might want to instrument your test setups to measure these
parameters to see if they might be the culprit.

-cd
 
Hi,
We plan to implement scatter/gather DMA transfer, which will eliminate one
copy: the card will send data directly to the buffer that is later used for
rendering. For now, though, we have to cope with the existing card, which
has no such support.
Any ideas what could lead to such a large number of page faults? The other
performance counters are within reasonable limits.
 
Alexander said:
Hi,
We plan to implement scatter/gather DMA transfer, which will eliminate
one copy: the card will send data directly to the buffer that is later
used for rendering. For now, though, we have to cope with the existing
card, which has no such support.
Any ideas what could lead to such a large number of page faults? The
other performance counters are within reasonable limits.

A large number of page faults could be due to excessive allocation/freeing
of memory, excessive creation/destruction of threads, excessive recursion,
etc. Are these "hard faults" or "soft faults"? (You can tell the difference
using Performance Monitor.)
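
If allocation churn turns out to be the cause, one common remedy is to allocate the receive buffers once and reuse them for every incoming block. The following is a minimal sketch of that idea, written in Python for brevity; `BlockRing`, `store`, and the slot sizes are hypothetical names and values chosen for illustration, not anything from the application discussed here.

```python
class BlockRing:
    """Fixed pool of equally sized slots, allocated once and then reused."""

    def __init__(self, slot_size: int, slot_count: int) -> None:
        self.slot_size = slot_size
        self.slot_count = slot_count
        # Single up-front allocation; no data buffer is allocated per block.
        self._storage = bytearray(slot_size * slot_count)
        self._view = memoryview(self._storage)
        self._next = 0

    def store(self, block: bytes) -> memoryview:
        """Copy one incoming block into the next slot and return a view of it."""
        if len(block) > self.slot_size:
            raise ValueError("block larger than slot")
        start = self._next * self.slot_size
        slot = self._view[start:start + len(block)]
        slot[:] = block  # copies into the existing storage
        self._next = (self._next + 1) % self.slot_count
        return slot


# Usage: two 8-byte slots; the third block overwrites the first slot.
ring = BlockRing(slot_size=8, slot_count=2)
first = ring.store(b"abcd")
ring.store(b"efgh")
ring.store(b"ijkl")
print(bytes(first))  # the view now shows the data that reused slot 0
```

The slot count has to be large enough that the consumer (here, the drawing code) is finished with a slot before the producer wraps around to it; the sketch does not enforce that.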

-cd
 
Alexander said:
Hi,
We plan to implement scatter/gather DMA transfer, which will eliminate
one copy: the card will send data directly to the buffer that is later
used for rendering. For now, though, we have to cope with the existing
card, which has no such support.

In the application I worked on, we had our custom driver allocate and
page-lock a contiguous range of physical memory, and then map that memory
into the requesting process's working set. That way we avoided all copies
and didn't require any fancy DMA hardware. We also supported DMA directly
into display memory via a DirectDraw surface, with hardware support in our
PCI card for generating the correct pixel layout.

-cd
 