The Interlocked on the Edge of Forever

  • Thread starter Thread starter Chris Mullins [MVP]
  • Start date Start date
C

Chris Mullins [MVP]

I've got a quick question that's been bugging me for a long, long time:

Let's say I've got a member variable (in a heavily threaded app):
private int _firstTime = 0;

To make changes to this is easy enough:
InterlockedExchange(ref _firstTime, 1);

.... or I can use the InterlockedCompareExchange:
int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);

No problems here.

.... but, if I just want to read the value, I CANNOT do:

int myValue = _firstTime;
or
if (_fistTime==0)

Doing these requires a memory barrier of some type (volatile variable,
monitor, etc) to have any degree of reliability.

Now, reading the value out using InterlockedCompareExchange always seemed a
bit silly to me. I don't want to always write:

int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue,
Int32.MinValue);

I've never found (although not for lack of looking):
int myValue = InterlockedRead(ref _firstTime);

Of the code that I write, reading the value, and branching based on it's
value, is much more common than writing to the value.

I have, in the past, used volitile variables for this - but I've stopped due
to the various issues regarding volatiles (non atomic reads & writes).

Is there a concensus on the best way to do this? It seems like all the
solutions are.... incomplete.

Essentially I'm looking for a volitile varaiable that is atomically read and
written. Unfortuantly, these don't exist in any language I've yet used.
 
Essentially I'm looking for a volitile varaiable that is atomically
read and written. Unfortuantly, these don't exist in any language
I've yet used.

I'm not sure what you're after that a simple volatile *wouldn't* give
you here, given that *any* Int32 variable will be accessed atomically,
whether it's volatile or not. You can't do atomic increments without
something like Interlocked, but simple reads and writes are atomic.
 
[...]
Of the code that I write, reading the value, and branching based on it's
value, is much more common than writing to the value.

I have, in the past, used volitile variables for this - but I've stopped
due to the various issues regarding volatiles (non atomic reads &
writes).

Jon's reply matches what I believed to be true. Can you explain why it is
you believe that a non-atomic read or write is possible for a 32-bit
value? If that's true, it's a pretty significant error in the way I've
approached multi-threaded programming. Which is not out of the question
by any means, but if that's the case I really want to know about it and
know why it's the case.

Pete
 
Chris said:
I've got a quick question that's been bugging me for a long, long time:

Let's say I've got a member variable (in a heavily threaded app):
private int _firstTime = 0;
Doing these requires a memory barrier of some type (volatile variable,
monitor, etc) to have any degree of reliability.
Yes.

I've never found (although not for lack of looking):
int myValue = InterlockedRead(ref _firstTime);

Perhaps Thread.VolatileRead() is what you're looking for?
I have, in the past, used volitile variables for this - but I've stopped due
to the various issues regarding volatiles (non atomic reads & writes).

Can you expand on your objection to volatile variables?
Is there a concensus on the best way to do this? It seems like all the
solutions are.... incomplete.

Essentially I'm looking for a volitile varaiable that is atomically read and
written. Unfortuantly, these don't exist in any language I've yet used.

I'm not aware of any circumstances where normally aligned volatile
variables of the machine word size or smaller are read or written
non-atomically on the CLR. Can you explain more?

-- Barry
 
Chris Mullins said:
I've got a quick question that's been bugging me for a long, long time:

Let's say I've got a member variable (in a heavily threaded app):
private int _firstTime = 0;

To make changes to this is easy enough:
InterlockedExchange(ref _firstTime, 1);

... or I can use the InterlockedCompareExchange:
int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);

No problems here.

... but, if I just want to read the value, I CANNOT do:

int myValue = _firstTime;
or
if (_fistTime==0)
Why?

Doing these requires a memory barrier of some type (volatile variable,
monitor, etc) to have any degree of reliability.
I don't see the need for a memory barrier, unless you need memory
ordering guarantees. Volatile access would make sense since you really want
to read from memory. However, this only affects how the Jitter generates
code.
Now, reading the value out using InterlockedCompareExchange always seemed
a bit silly to me. I don't want to always write:

int myValue = InterlockedCompareExchange(ref _firstTime, Int32.MinValue,
Int32.MinValue);

This looks quite pointless. If you don't need memory ordering guarantees a
volatile read is good enough. I don't find the spec now, but I suspect
InterlockedCompareExchange has probably both acquire and release
semantics.
I've never found (although not for lack of looking):
int myValue = InterlockedRead(ref _firstTime);
This is just a memory fence plus a volatile read. And it's not clear that
you
actually need the former.
Of the code that I write, reading the value, and branching based on it's
value, is much more common than writing to the value.
Again volatile read is what you want.
I have, in the past, used volitile variables for this - but I've stopped
due to the various issues regarding volatiles (non atomic reads & writes).
What makes you believe that either reads or writes are non-atomic?
They certainly are at CLR instruction level.
Is there a concensus on the best way to do this? It seems like all the
solutions are.... incomplete.

For a one-time init thing you will probably need atomic compare
exchange for acquiring the construction lock and release semantics when
signaling successful construction.

If you need to wait for construction you'd probably have three states:
- uninitialized
- under_construction
- constructed

Checking whether the shared object is constructed is done by a volatile
read from the guard variable. If the result is "constructed" you're done.
Just a few cycles - maybe satisified from cache.

If not, use atomic compare exchange to acquire the "under-construction"
lock. If an other thread was faster wait. If not construct object and
use store with release semantics to signal object constructed.

I'm not sure if there is a really efficient Wait method (in this particular
context) in the CLR.

Anyway, the standard case (object already constructed) is very fast.
Essentially I'm looking for a volitile varaiable that is atomically read
and written. Unfortuantly, these don't exist in any language I've yet
used.
I think the CLI standard requires that to a certain extent. That said, the
text is vague. Bottom line I think you can rely on reads or writes being
volatile, but not more complex operations that both read and write,
e.g.:

volatile int i;
do_something_with(i); // ok
i = some_value(); // ok
i++; // bad no guarantees here

But one might argue that this is really a language thing.

-hg
 
| To make changes to this is easy enough:
| InterlockedExchange(ref _firstTime, 1);
|
| ... or I can use the InterlockedCompareExchange:
| int oldValue = InterlockedCompareExchange(ref _firstTime, 1, 0);
|
| No problems here.
|
| ... but, if I just want to read the value, I CANNOT do:
|
| int myValue = _firstTime;
| or
| if (_fistTime==0)
|
| Doing these requires a memory barrier of some type (volatile variable,
| monitor, etc) to have any degree of reliability.

you can. The Interlocked write ensures this across all cpus and cache. Any
thread reading the var after the Interlocked write will see the correct var
so it is a memory barrier without a volatile modifier. All locks are
ultimatley built on interlocked at the lowest level to get this behavior.
So for reads, you can just go "if (var == 2){}". Naturally, this works for
a single var. If you have multiple invariants you need to protect, you
probabaly need a lock.
 
William Stacey said:
| Doing these requires a memory barrier of some type (volatile variable,
| monitor, etc) to have any degree of reliability.

you can. The Interlocked write ensures this across all cpus and cache. Any
thread reading the var after the Interlocked write will see the correct var
so it is a memory barrier without a volatile modifier.

But surely that's only guaranteed if the read *actually* happens after
the Interlocked write occurs. If the variable isn't volatile, there's
nothing to stop the reading thread from having reordered the reads. For
instance:

int x=0, y=0;

....

Thread 1:
Interlocked.Increment (ref x, 1);
Interlocked.Increment (ref y, 1);



Thread 2:
int a = x;
int b = y;

The JIT *could* reorder the reads of x and y in thread 2, such that you
get a=0, b=1, despite the fact that x is incremented before y. That
wouldn't be the case with volatile variables.


At least, that's how I understand it.
 
| But surely that's only guaranteed if the read *actually* happens after
| the Interlocked write occurs. If the variable isn't volatile, there's
| nothing to stop the reading thread from having reordered the reads. For
| instance:

Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a
reader will never see x==0 (until it wraps). So a read on x or y is volatile
using an interlocked. Naturally, in either method, a reader could read x and
y after x was incremented, but before y was incremented because there is no
lock here. But this is a different problem (i.e race). If we need both vars
to be atomic in respect to each other and handled as an invariant "set",
then we need a lock of some type protecting the set. As I said, maybe I am
not following your intent and will slap my head.
--wjs
 
William Stacey said:
| But surely that's only guaranteed if the read *actually* happens after
| the Interlocked write occurs. If the variable isn't volatile, there's
| nothing to stop the reading thread from having reordered the reads. For
| instance:

Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a
reader will never see x==0 (until it wraps). So a read on x or y is volatile
using an interlocked. Naturally, in either method, a reader could read x and
y after x was incremented, but before y was incremented because there is no
lock here.

I still believe that it can read y before either of them are
incremented, and x after both of them are incremented. There's nothing
to stop the JIT from reordering the reads to read y before x, because
they're not volatile. I wouldn't be surprised to learn that the
interlocked operation makes sure that when it *does* try to read x and
y, those values are really the latest ones, but I don't think there's
any guarantee about the order of those reads unless at least one of
them is volatile. (It's nearly midnight, so I can't be bothered to work
out which it should be - y, I think.)

I definitely *wasn't* talking about the atomicity of the two operations
- just the ordering.
 
So are you saying the JIT can reorder these two lines:
int a = x;
int b = y;

Like so:
int b = y;
int a = x;

? tia

--
William Stacey [C# MVP]



| > | But surely that's only guaranteed if the read *actually* happens after
| > | the Interlocked write occurs. If the variable isn't volatile, there's
| > | nothing to stop the reading thread from having reordered the reads.
For
| > | instance:
| >
| > Hi Jon. Not sure I totally follow you here. If Interlocked x happens, a
| > reader will never see x==0 (until it wraps). So a read on x or y is
volatile
| > using an interlocked. Naturally, in either method, a reader could read x
and
| > y after x was incremented, but before y was incremented because there is
no
| > lock here.
|
| I still believe that it can read y before either of them are
| incremented, and x after both of them are incremented. There's nothing
| to stop the JIT from reordering the reads to read y before x, because
| they're not volatile. I wouldn't be surprised to learn that the
| interlocked operation makes sure that when it *does* try to read x and
| y, those values are really the latest ones, but I don't think there's
| any guarantee about the order of those reads unless at least one of
| them is volatile. (It's nearly midnight, so I can't be bothered to work
| out which it should be - y, I think.)
|
| I definitely *wasn't* talking about the atomicity of the two operations
| - just the ordering.
|
| --
| Jon Skeet - <[email protected]>
| http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
 
William said:
So are you saying the JIT can reorder these two lines:
int a = x;
int b = y;

Like so:
int b = y;
int a = x;

Yes, if x and y are not volatile locations. The language of the spec is
that reads and writes to volatile locations are the only observable side
effects of memory access; memory accesses may be arbitrarily reordered
if they aren't volatile. For example, if the value of y has been loaded
to a register because it was used somewhere before the line 'int a =
x;', then the value of y used in the line 'int b = y;' may predate the
actual in-memory value of x, because the value may be used from the
register, rather than freshly loading it from memory. So, volatile can
inhibit these kinds of optimizations (avoiding redundant loads).

There's more to it than that, volatile reads have acquire semantics,
volatile writes release semantics. That has stronger guarantees than
just inhibiting optimization, as it affects e.g. write buffer reordering
in some processors.

-- Barry
 
William Stacey said:
So are you saying the JIT can reorder these two lines:
int a = x;
int b = y;

Like so:
int b = y;
int a = x;

Exactly, so long as there aren't any memory barriers in the way. Memory
barriers (which include operations with volatile locations) stop the
JIT from moving memory operations beyond the barrier.

It's unfortunate that the CLI spec isn't nearly as clear as it might be
about all this, but Vance Morrison wrote a great MSDN article about the
..NET 2.0 model (which is stricter than the CLI spec):
http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
 
Here is some more food for the party:
http://msdn2.microsoft.com/en-us/library/ms686355.aspx

Here he states the compiler will not re-order volatile variable access,
However these operations still *could be re-ordered by the processor. In
fact, he shows correcting this issue with an Interlocked operation. This
article also state that Interlocked functions *ensure appropriat barriers
for memory ordering. So (at least for this example) it would seem
Interlocked provides as good or better symatics as the volatile example (and
also prevents CPU re-order). However, I am not sure any this matters for
this example, as in both cases you still have the race issue. R1 can read
x, then W1 writes x and y, then R1 reads y - or various combinations of
same. So some form of critical section around both x and y would seem to be
the proper behavior, unless we don't care about x or y in respect to each
other (i.e. simple counters that are not treated as a set).

--
William Stacey [C# MVP]
PCR concurrency library: www.codeplex.com/pcr
PSH Scripts Project www.codeplex.com/psobject


| > So are you saying the JIT can reorder these two lines:
| > int a = x;
| > int b = y;
| >
| > Like so:
| > int b = y;
| > int a = x;
|
| Exactly, so long as there aren't any memory barriers in the way. Memory
| barriers (which include operations with volatile locations) stop the
| JIT from moving memory operations beyond the barrier.
|
| It's unfortunate that the CLI spec isn't nearly as clear as it might be
| about all this, but Vance Morrison wrote a great MSDN article about the
| .NET 2.0 model (which is stricter than the CLI spec):
| http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
|
| --
| Jon Skeet - <[email protected]>
| http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
 
William said:
Here is some more food for the party:
http://msdn2.microsoft.com/en-us/library/ms686355.aspx

Here he states the compiler will not re-order volatile variable access,
However these operations still *could be re-ordered by the processor.

Volatile read has acquire semantics. Loads won't move before an acquire,
similarly, writes won't move after a release. These should be
implemented by CPU instructions, as explained in the article you linked
to, and aren't restricted to just compiler optimizations. This means
that volatile read on .NET is sufficient, because the CPU won't reorder
such that the second load moves before the first load, because of the
acquire semantics (aka read fence).
In
fact, he shows correcting this issue with an Interlocked operation. This
article also state that Interlocked functions *ensure appropriat barriers
for memory ordering.

The have full fence semantics (except the Interlocked*Acquire and
Interlocked*Release).
So (at least for this example) it would seem
Interlocked provides as good or better symatics as the volatile example (and
also prevents CPU re-order).

Volatile also prevents CPU re-order, but asymmetrically: read causes
acquire, write causes release. The write side release means that e.g.
"publishing" a new instance to a static volatile variable won't have a
problem such that some fields in the instance aren't fully retired
before other threads could read the instance reference - writes won't
move ahead of the release / write fence.

-- Barry
 
Barry Kelly said:
There's more to it than that, volatile reads have acquire semantics,
volatile writes release semantics. That has stronger guarantees than
just inhibiting optimization, as it affects e.g. write buffer reordering
in some processors.

I see that in CLI spec, but have you actually verified it? It is not
what I would expect and not what I have seen.

Oddly, I see that the Jitter even removes dead loads altogher.

Anyway, given the rationale in the CLI spec (hardware
register access), I don't see how this could possibly work in
a reasonably cheap way for the P4 memory model.

-hg
 
William Stacey said:
Here is some more food for the party:
http://msdn2.microsoft.com/en-us/library/ms686355.aspx

Here he states the compiler will not re-order volatile variable access,
However these operations still *could be re-ordered by the processor.

Hmm. That sounds like it's a broken implementation then. Basically, if
the .NET CLR executes code in a way which violates the spec, the
implementation is flawed. If the processor could reorder things, the
CLR should make sure that that is invisible to the user, beyond what is
possible within the CLI spec.

I'd be interested to hear Joe Duffy's opinion on that part of the
article.
In fact, he shows correcting this issue with an Interlocked operation. This
article also state that Interlocked functions *ensure appropriat barriers
for memory ordering. So (at least for this example) it would seem
Interlocked provides as good or better symatics as the volatile example (and
also prevents CPU re-order).

No - because only the thread which *calls* the Interlocked method knows
that interlocked is involved, whereas with volatile both the reading
thread *and* the writing thread know to use memory barriers. Of course,
if you use Interlocked in both threads, to both read *and* write the
value, then everything will be okay.
However, I am not sure any this matters for
this example, as in both cases you still have the race issue. R1 can read
x, then W1 writes x and y, then R1 reads y - or various combinations of
same. So some form of critical section around both x and y would seem to be
the proper behavior, unless we don't care about x or y in respect to each
other (i.e. simple counters that are not treated as a set).

If we care about atomicity, there needs to be some kind of locking.
If we only care that the change to y is seen after the change to x
(i.e. you can't see x=0, y=1) then volatile will do the job but I don't
believe that changing the variable with Interlocked and then reading it
directly in another thread is guaranteed to work.
 
| Hmm. That sounds like it's a broken implementation then. Basically, if
| the .NET CLR executes code in a way which violates the spec, the
| implementation is flawed. If the processor could reorder things, the
| CLR should make sure that that is invisible to the user, beyond what is
| possible within the CLI spec.
|
| I'd be interested to hear Joe Duffy's opinion on that part of the
| article.

I would like to see Joe comment on that as well. If that article is
correct, it sounds like "volatile" may not work as people expect it to be
working.

| No - because only the thread which *calls* the Interlocked method knows
| that interlocked is involved, whereas with volatile both the reading
| thread *and* the writing thread know to use memory barriers. Of course,
| if you use Interlocked in both threads, to both read *and* write the
| value, then everything will be okay.

The reader does not need to know as it gets the read fence automatically (at
the hardware layer) because the Interlocked buts up a *full memory barrier
fence so reads can not be reordered at the instruction level. vista has
some other versions with seperate read aquire/write aquire methods, but the
ones in the framework use the full fence. I am not sure how volatile is
implemented under the covers, but it would not suprise me if it also uses an
interlocked operation.

| If we care about atomicity, there needs to be some kind of locking.
| If we only care that the change to y is seen after the change to x
| (i.e. you can't see x=0, y=1) then volatile will do the job but I don't
| believe that changing the variable with Interlocked and then reading it
| directly in another thread is guaranteed to work.

hmm. We have seem to cover the same ground. Everything I have seen (more
then that link) says it does. I would be interested if you find something
to contrary as that would be good info. Also, most (if not all) locks are
based on Interlocked in their guts, so if it does not work, then everything
in the history of the world is broke. Ok, that is bit drastic... tia
 
Thanks Barry. My assertion has been that Interlocked works here, not that
volatile does not. One thing outstanding, however, is that little "blurb"
about the cpu being able to reordering reads even with volatile. That is a
bit unsettling.

--
William Stacey [C# MVP]


| William Stacey [C# MVP] wrote:
|
| > Here is some more food for the party:
| > http://msdn2.microsoft.com/en-us/library/ms686355.aspx
| >
| > Here he states the compiler will not re-order volatile variable access,
| > However these operations still *could be re-ordered by the processor.
|
| Volatile read has acquire semantics. Loads won't move before an acquire,
| similarly, writes won't move after a release. These should be
| implemented by CPU instructions, as explained in the article you linked
| to, and aren't restricted to just compiler optimizations. This means
| that volatile read on .NET is sufficient, because the CPU won't reorder
| such that the second load moves before the first load, because of the
| acquire semantics (aka read fence).
|
| > In
| > fact, he shows correcting this issue with an Interlocked operation.
This
| > article also state that Interlocked functions *ensure appropriat
barriers
| > for memory ordering.
|
| The have full fence semantics (except the Interlocked*Acquire and
| Interlocked*Release).
|
| > So (at least for this example) it would seem
| > Interlocked provides as good or better symatics as the volatile example
(and
| > also prevents CPU re-order).
|
| Volatile also prevents CPU re-order, but asymmetrically: read causes
| acquire, write causes release. The write side release means that e.g.
| "publishing" a new instance to a static volatile variable won't have a
| problem such that some fields in the instance aren't fully retired
| before other threads could read the instance reference - writes won't
| move ahead of the release / write fence.
|
| -- Barry
|
| --
| http://barrkel.blogspot.com/
 
After re-reading, I think I found it. He was talking about VS2005
volatile -vs- VS2003 volatile - they seem to have implemented them better in
2005 using proper acquire and release semantics. That is info I never saw
before. The volatile code he shows works correctly with vs2005 compiler,
but may not with 2003 compiler, hence the alternative Interlocked example
given. Makes more sense now.

--
William Stacey [C# MVP]


| Thanks Barry. My assertion has been that Interlocked works here, not that
| volatile does not. One thing outstanding, however, is that little "blurb"
| about the cpu being able to reordering reads even with volatile. That is
a
| bit unsettling.
|
| --
| William Stacey [C# MVP]
|
|
| || William Stacey [C# MVP] wrote:
||
|| > Here is some more food for the party:
|| > http://msdn2.microsoft.com/en-us/library/ms686355.aspx
|| >
|| > Here he states the compiler will not re-order volatile variable access,
|| > However these operations still *could be re-ordered by the processor.
||
|| Volatile read has acquire semantics. Loads won't move before an acquire,
|| similarly, writes won't move after a release. These should be
|| implemented by CPU instructions, as explained in the article you linked
|| to, and aren't restricted to just compiler optimizations. This means
|| that volatile read on .NET is sufficient, because the CPU won't reorder
|| such that the second load moves before the first load, because of the
|| acquire semantics (aka read fence).
||
|| > In
|| > fact, he shows correcting this issue with an Interlocked operation.
| This
|| > article also state that Interlocked functions *ensure appropriat
| barriers
|| > for memory ordering.
||
|| The have full fence semantics (except the Interlocked*Acquire and
|| Interlocked*Release).
||
|| > So (at least for this example) it would seem
|| > Interlocked provides as good or better symatics as the volatile example
| (and
|| > also prevents CPU re-order).
||
|| Volatile also prevents CPU re-order, but asymmetrically: read causes
|| acquire, write causes release. The write side release means that e.g.
|| "publishing" a new instance to a static volatile variable won't have a
|| problem such that some fields in the instance aren't fully retired
|| before other threads could read the instance reference - writes won't
|| move ahead of the release / write fence.
||
|| -- Barry
||
|| --
|| http://barrkel.blogspot.com/
|
|
 
|| I'd be interested to hear Joe Duffy's opinion on that part of the
|| article.

I posted correction in thread above. It seems the issue with volatile was
with vs2003, not vs2005.

--wjs
 
Back
Top