Dave Littell
Greetings,
(Sorry for the massive crossposts, but I need a real answer real
soon, so I'm shotgunning.)
I don't know if this is the right newsgroup for this question,
(there's only about 10,000 or so... :-0), but here goes:
OS: Windows 2000 Professional
Service Pack level: SP3
Problem Summary: Single writer/multiple reader shared memory fails
to maintain coherency, but (usually) recovers.
I have a situation where the writer updates a small area (less than
a physical page size) of shared memory at a relatively high rate
(greater than 50 Hz, less than 1 kHz (and no, I can't be more
specific)). There are multiple readers that are held at bay by a
named event (manual reset, controlled by the writer, initially
reset) during the writer's update and released via the writer's
SetEvent() at the end of the update. The named event remains
signaled until the next update. To protect against the case where a
reader can wake up slightly before the update (due to out-of-order
delivery of trigger events), proceed because the named event is
still signaled, and retrieve stale data, I use a sequence number in
the shared memory that is only ever written by the shared memory
writer (during its update). Each reader keeps a local idea of the
expected sequence number at the next update and polls for that value
to show up in the shared memory sequence number before proceeding.
The readers' polling is gated by the named event so they can't jump
in during the writer's update.
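To make the scheme concrete, here is a minimal sketch of the sequence-number check in portable C. It is not the actual code: the struct layout and the names SharedBlock, next_seq, writer_update and reader_poll are made up for illustration, the event calls (ResetEvent/SetEvent in the writer, the wait in the readers) are left out, and skipping 0 on wraparound is an assumption added here so that 0 can only ever mean "never written".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the shared region; field names are illustrative. */
typedef struct {
    volatile uint32_t seq;      /* 0 only before the writer's first update */
    volatile uint32_t payload;  /* stand-in for the real data */
} SharedBlock;

/* Next sequence number; skips 0 on wraparound (an assumption of this sketch). */
static uint32_t next_seq(uint32_t seq)
{
    uint32_t n = seq + 1u;
    return (n == 0u) ? 1u : n;
}

/* Writer: store the data first, then publish it by advancing the sequence.
   The real writer would reset the event before this and set it afterwards. */
static void writer_update(SharedBlock *blk, uint32_t value)
{
    assert(blk != NULL);
    blk->payload = value;
    blk->seq = next_seq(blk->seq);
}

/* Reader: a single poll. Returns 1 and copies the data out if the expected
   sequence number has shown up; returns 0 (stale or unwritten) otherwise.
   The real reader would wait on the named event before calling this. */
static int reader_poll(const SharedBlock *blk, uint32_t *expected, uint32_t *out)
{
    if (blk->seq != *expected)
        return 0;
    *out = blk->payload;
    *expected = next_seq(*expected);
    return 1;
}
```

Each reader starts with expected = 1 and calls reader_poll() in a loop after the event wait returns, so a reader that wakes early just keeps polling until the writer's new sequence number appears.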
So, the shared memory sequence number has the value 0 *only during
the time before the writer's first update*. During this
initialization period the named event is nonsignaled, so the readers
will all block on the named event and never see the 0.
Here's the problem: After they all run merrily for a while I
suddenly see cases where the readers are getting 0-valued sequence
numbers. This is just not possible after the writer cycles through
its first update. Sometimes a reader finally gets the correct
(non-zero) sequence number and continues along. Occasionally a
reader gets a 0-valued sequence number forever (or at least until I
kill it).
The shared-memory sequence number was initially a 64-bit value
(32-bit machine). Since 64-bit writes to shared memory aren't
guaranteed to be atomic, I changed it to a 32-bit value (for which
shared-memory writes are atomic). Same behavior. I'm running out
of guesses, so...
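For anyone wondering why I suspected the 64-bit counter: on a 32-bit machine a plain 64-bit store can be done as two 32-bit stores, so a reader may see one new half and one old half. The sketch below only models that effect arithmetically; torn_value is a made-up helper for illustration, not anything from my code or from the Win32 API.

```c
#include <stdint.h>

/* Models a torn 64-bit read: the reader sees one 32-bit half from the new
   value and the other half from the old value. If low_written_first is
   nonzero, the low half is already new and the high half is still old;
   otherwise the high half is new and the low half is old. */
static uint64_t torn_value(uint64_t old_val, uint64_t new_val, int low_written_first)
{
    const uint64_t lo_mask = 0x00000000FFFFFFFFULL;
    const uint64_t hi_mask = 0xFFFFFFFF00000000ULL;
    uint64_t hi = (low_written_first ? old_val : new_val) & hi_mask;
    uint64_t lo = (low_written_first ? new_val : old_val) & lo_mask;
    return hi | lo;
}
```

With old_val = 0x00000000FFFFFFFF and new_val = 0x0000000100000000, a torn read with the low half written first gives exactly 0. That only happens when the low half rolls over, which at my update rates takes far longer than the runs where I see the problem, and the 32-bit counter shows the same behavior anyway.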
This sounds like some color of a shared-memory coherency problem to
me (not to mix metaphors or anything ;-). Yah?
I believe I'm seeing some correlation between these 0-valued
sequence numbers and bursts of disk activity while the kids are
running. I've been able to pretty consistently get the problem to
appear if I do something that touches the disk. Note that the
shared memory was set up using:
    CreateFileMapping( INVALID_HANDLE_VALUE,
                       NULL,
                       PAGE_READWRITE | SEC_COMMIT, ... );

and

    MapViewOfFile( ..., FILE_MAP_ALL_ACCESS, ... );
So, my understanding is that all this tells Windows 2000 there's just a page
of physical memory somewhere that the readers and writer can all
see. I believe that the physical page that is the shared memory
shouldn't ever be evicted to the paging file because of a call to
VirtualLock(), so it should never even come close to the disk.
I'm stumped: any ideas?
Thanks very much,
Dave