C#, Threads, Events, and DataGrids/DataSets

  • Thread starter: Guest
Volatile has a meaning both for the compiler and for the runtime. The runtime
meaning is release/acquire, whereas the compiler's meaning is "don't drop load
to register out of cycle body". However aggressive an optimizing compiler is,
it would never try an optimization that is proven to be unacceptable by
compiler optimization theory. A non-inlined method call makes the cycle body
non-trivial, because it adds too many factors that prohibit the assumption
that the variable isn't modified somewhere in the cycle body (including but
not limited to the possibility of the runtime weaving in some sort of call
processing).
When I said that I can give a proof for both the x86 and .NET memory models, I
meant I can prove it separately for each of them (and for the Itanium and
Athlon 64 memory models too, btw).
The only place where using an instance field on the class that we are
discussing could present a problem is during the first access to the 'this'
pointer after constructing a new class instance. But at this point the
instance isn't shared, and this is never a problem for a single thread. At the
point when the thread runs (Thread.Start()), there is a guarantee that there
were quite a few memory barriers in the middle (when the OS starts a thread
there will be a lot of LOCKs, and any LOCK means a complete memory barrier
with processor caches being synchronized). So, all processor caches are
guaranteed to be synchronized for 'this' and stopProcessing. After that,
"write - to - read" order doesn't matter for that usage of stopProcessing,
since its assignment/read from the memory location will be atomic (bool is
promoted to native integer and aligned to the native integer boundary). And
the non-trivial cycle body assures that the JIT would never drop loading the
field from the memory location to the register out of the cycle body... I can
even draw a Petri net diagram with a proof of this...

-Valery.

See my blog at:
http://www.harper.no/valery
 
Valery Pryamikov said:
Volatile has a meaning both for the compiler and for the runtime. The runtime
meaning is release/acquire, whereas the compiler's meaning is "don't drop load
to register out of cycle body".

Which compiler are you talking about here? The C# compiler doesn't deal
with registers at all. The only compiler which really deals with
registers is the JIT compiler, and in that sense it *is* the runtime,
in that after the JIT has worked its magic, it's really just x86 (or
whatever) code.

Where in either the C# or CLI specification does it say anything about
"don't drop load to register out of cycle body"? That's the part I
haven't seen.
However aggressive an optimizing compiler is, it would never
try an optimization that is proven to be unacceptable by compiler optimization
theory.

And where is that explicitly guaranteed in the specification?
A non-inlined method call makes the cycle body non-trivial, because it adds
too many factors that prohibit the assumption that the variable isn't modified
somewhere in the cycle body (including but not limited to the possibility of
the runtime weaving in some sort of call processing).
When I said that I can give a proof for both the x86 and .NET memory models, I
meant I can prove it separately for each of them (and for the Itanium and
Athlon 64 memory models too, btw).

Good - I'm only interested in the .NET memory model though, so don't
worry about the other ones unless you wish to for posterity.
The only place where using an instance field on the class that we are
discussing could present a problem is during the first access to the 'this'
pointer after constructing a new class instance. But at this point the
instance isn't shared, and this is never a problem for a single thread. At the
point when the thread runs (Thread.Start()), there is a guarantee that there
were quite a few memory barriers in the middle (when the OS starts a thread
there will be a lot of LOCKs, and any LOCK means a complete memory barrier
with processor caches being synchronized). So, all processor caches are
guaranteed to be synchronized for 'this' and stopProcessing. After that,
"write - to - read" order doesn't matter for that usage of stopProcessing,
since its assignment/read from the memory location will be atomic (bool is
promoted to native integer and aligned to the native integer boundary). And
the non-trivial cycle body assures that the JIT would never drop loading the
field from the memory location to the register out of the cycle body... I can
even draw a Petri net diagram with a proof of this...

Where in the CLI specification does it say that the JIT would never
drop loading field from memory location to the register out of the
cycle body? (Or other local cache memory - it doesn't have to be a
register.) It may well be accepted wisdom in other areas that that
doesn't happen, but I don't see where that's guaranteed.

Chris Brumme's blog is interesting on this topic. He gives the same
kind of idea of what an extremely weak memory model is:

<quote>
At the other extreme, we have a world where CPUs operate almost
entirely out of private cache. If another CPU ever sees anything my
CPU is doing, it's a total accident of timing.
</quote>

He also writes:

<quote>
In my opinion, we screwed up when we specified the ECMA memory model.
That model is unreasonable because:
* All stores to shared memory really require a volatile prefix.
[...]
</quote>

I disagree with him in terms of how hard it is to write code to this
model though - basically, if you make *all* access to data available to
multiple threads locked (with access to any single item of data only
available through the same lock) then you'll be safe. Now, the problem
there is in terms of performance - but for *most* people, I don't
believe the overhead of that kind of locking is going to be
significant. For some people doing incredibly fiddly stuff, using locks
may be overkill and memory barriers would be preferable - but they're
harder to work with (IMO) and so I recommend the "safe but slightly
slower" approach usually.
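
For illustration, the "lock everything" approach might look roughly like the
sketch below. The class and member names (LockedWorker, padlock, RequestStop)
are made up for this example and do not come from the code under discussion;
Thread.Sleep(1) is only a stand-in for real work.

```csharp
using System;
using System.Threading;

// Hypothetical worker: every access to shared state goes through one lock.
public sealed class LockedWorker
{
    private readonly object padlock = new object();
    private bool stopProcessing;   // only read or written while holding padlock
    private int iterations;        // likewise guarded by padlock

    public bool StopRequested
    {
        get { lock (padlock) { return stopProcessing; } }
    }

    public int Iterations
    {
        get { lock (padlock) { return iterations; } }
    }

    // Called from another thread to ask the loop to finish.
    public void RequestStop()
    {
        lock (padlock) { stopProcessing = true; }
    }

    // Thread entry point: re-checks the flag under the lock on every pass.
    public void Run()
    {
        while (!StopRequested)
        {
            lock (padlock) { iterations++; }
            Thread.Sleep(1);   // stand-in for real work
        }
    }
}
```

Usage would be along the lines of: create a LockedWorker, start a Thread on
its Run method, call RequestStop() from the controlling thread, then Join.
Acquiring and releasing the lock gives the ordering and visibility guarantees,
so no volatile is needed on the field.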
 
<snip>

I've just emailed Vance Morrison at Microsoft about this - he's helped
me out with a previous memory model question. Without external expert
help, I suspect we're not going to make any progress here. Valery -
drop me a line if you want a copy of the email.
 
This will work on any memory model listed in
ftp://gatekeeper.dec.com/pub/DEC/WRL/research-reports/WRL-TR-95.7.pdf.
Optimizing compiler theory has a lot of discussion about acceptable types
of optimization.
You can find some really good papers about it on
http://citeseer.nj.nec.com/cs.
Rearranging memory accesses by optimizing compilers is something that has been
beaten to death, and it applies to all existing optimizing compilers
(including just-in-time compilers).

I don't have any more time to spend on this conversation, sorry.
-Valery.



 
Jon Skeet said:
I've just emailed Vance Morrison at Microsoft about this - he's helped
me out with a previous memory model question. Without external expert
help, I suspect we're not going to make any progress here. Valery -
drop me a line if you want a copy of the email.

Vance has replied with a great response. It's here below. I haven't
included the emails I sent to Vance to start with, which are somewhat
related (in particular when he talks about "point 2" of Valery's
argument, which refers to what kind of situation the JIT compiler can
cache values) but I don't *think* it's crucial. If Valery disagrees he
can certainly post the mails - no problem as far as I'm concerned, but
I don't like posting other people's words without getting their consent
first, and I suspect Valery's getting sick of emails from me today :)

Here's Vance's reply (with a couple of typos cleaned up):


Jon asked me to weigh in on the memory model issue below.

First, I agree with the argument that Jon attributes to Valery below:
given points (1) and (2), it follows that on any rational platform the
code below will 'work'. I say 'rational' because technically speaking
there is enough wiggle room in the spec to cause grief. Note that the
spec does not say anything about how long it takes for one processor to
see the writes of another. Thus if one processor wrote to 'stopRunning'
and then spun, there is nothing in the spec that forces the write to
ever be flushed to main memory and thus be seen by a thread running on
another processor. Thus you can in theory get a deadlock. This is
clearly a corner case, but I think I was called in as a spec lawyer, so
I am being picky.

More seriously, however, is the issue that assumption (2) below (that
the body is 'non-trivial' and thus compilers are not allowed to cache
'stopRunning') is not really true from a spec perspective. The runtime
is allowed at any point to treat built in functions like
'Console.WriteLine' as intrinsic (that is the runtime owns the
implementation, so the JIT compiler can know special things about it).
Thus you can imagine a compiler that knows that 'Console.WriteLine' does
not modify any visible global variables, and thus 'knows' that
stopRunning can be safely enregistered. Of course this is not true in
practice, but could be true if 'Console.WriteLine' was instead, say,
Math.Sin(). Inlining can also cause the same effect: if we inlined
'Console.WriteLine' and ToString() to the point that there are no
function calls in ANY path through the loop, then it is possible for
the loop to spin forever.
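
To make the enregistering concern concrete, here is a single-threaded sketch
that writes out by hand what such a transformation would amount to. Nothing
here is real JIT output; HoistingDemo, RunAsWritten and RunAsIfHoisted are
hypothetical names, and the iteration cap exists only so that the demo
terminates.

```csharp
using System;

// Hypothetical demo: compares a loop that re-reads the field on every pass
// with a hand-written version of the same loop after the read is hoisted.
public sealed class HoistingDemo
{
    private bool stopRunning;   // deliberately NOT volatile

    public void RequestStop() { stopRunning = true; }

    // The loop as the programmer wrote it: the field is read each iteration.
    public int RunAsWritten(Action body, int cap)
    {
        int n = 0;
        while (!stopRunning && n < cap)
        {
            body();
            n++;
        }
        return n;
    }

    // What the loop would look like if a compiler decided that 'body' cannot
    // change stopRunning and read the field once into a local (a register).
    public int RunAsIfHoisted(Action body, int cap)
    {
        bool cached = stopRunning;   // single read, before the loop
        int n = 0;
        while (!cached && n < cap)
        {
            body();
            n++;
        }
        return n;
    }
}
```

If the body sets the flag on its third call, RunAsWritten returns after three
iterations while RunAsIfHoisted keeps going until the cap. The rewrite is
therefore only valid when the compiler can show that nothing in the body
writes the field, which is why the argument hinges on whether a call in the
body is opaque to the JIT.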

Even more seriously, however, is that we don't want spec quibbling to
get in the way of doing the right thing. Variables that are shared
across threads without additional synchronization need to be accessed
as volatile variables (either by declaring them volatile or by using the
System.Threading.Thread.Volatile* methods). Why is this the right thing
to do?

1) Doing so EXACTLY describes the intention of the program (that a
memory cell will be accessed across threads without synchronization). It is
a big red flag to both the compiler and more importantly people reading
the code that something cross-thread is happening here (and in
particular the loop is not infinite).

2) Because we have declared our intention to the world correctly, the
world can 'play nice' with our code. We don't need to have long
discussions about the subtleties of memory model and cache coherence.
We can live in a much simpler world where code is not surprising.

3) Note that the assumptions built into the analysis above relied on
details that are fragile. If 'Console.WriteLine' were pulled out, the
program becomes incorrect. Why build fragile code when you can build
robust code by changing the code in a trivial way?
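
A minimal sketch of the "trivial change", assuming a worker shaped like the
one being discussed (the class name VolatileWorker and the method names are
invented here; only the stopRunning field name is taken from the mail):

```csharp
using System;
using System.Threading;

// Hypothetical worker whose stop flag is declared volatile, so every
// iteration of the loop performs a real read of the field.
public sealed class VolatileWorker
{
    private volatile bool stopRunning;

    public bool StopRequested { get { return stopRunning; } }

    // Called from the controlling thread.
    public void RequestStop() { stopRunning = true; }

    // Thread entry point. The loop stays correct even if the body is
    // emptied out or fully inlined, because the read cannot be hoisted.
    public void Run()
    {
        while (!stopRunning)
        {
            Thread.Sleep(1);   // stand-in for real work
        }
    }
}
```

The only difference from the non-volatile version is the one keyword on the
field declaration, which also documents for readers that the field is touched
by more than one thread.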

OK that is enough on the particular issue.

Note that unless you are doing something advanced (e.g. building low
level synchronization primitives for a multi-processor scenario),
spinning in a loop (even if there are SLEEPs) is generally a poor
solution. The Windows team has already banned such things within
Microsoft because they cause the processor to spin even when the
machine is idle from a user perspective. For laptops, this is an issue
(even if you poll only once a second, if you have 100 apps running
doing this, you are consuming non-trivial power for no good reason).
You are also keeping memory pages hot that could be swapped out and
used for better purposes. You should be waiting on events.
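
As a rough sketch of what "waiting on events" could look like, the worker
below blocks in WaitHandle.WaitAny until it is either given work or told to
stop, so it uses no CPU while idle. EventWorker, SignalWork and RequestStop
are hypothetical names chosen for this example.

```csharp
using System;
using System.Threading;

// Hypothetical worker that sleeps inside the OS until an event is signalled
// instead of polling a flag.
public sealed class EventWorker
{
    private readonly ManualResetEvent stopEvent = new ManualResetEvent(false);
    private readonly AutoResetEvent workEvent = new AutoResetEvent(false);
    private int processed;

    // Interlocked read so the controlling thread sees an up-to-date count.
    public int Processed
    {
        get { return Interlocked.CompareExchange(ref processed, 0, 0); }
    }

    public void SignalWork() { workEvent.Set(); }
    public void RequestStop() { stopEvent.Set(); }

    // Thread entry point.
    public void Run()
    {
        WaitHandle[] handles = { stopEvent, workEvent };
        while (true)
        {
            // Blocks until one handle is signalled; returns its array index.
            int signalled = WaitHandle.WaitAny(handles);
            if (signalled == 0)
            {
                return;   // stop requested
            }
            Interlocked.Increment(ref processed);   // stand-in for real work
        }
    }
}
```

Putting stopEvent first in the array means a stop request wins when both
events are signalled at once, since WaitAny reports the lowest signalled
index.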

Finally (I will end this e-mail eventually), when you play tricks to
get away from doing explicit thread synchronization, you are playing
with fire. It CAN be done, but only in special cases. The example below
only works because you never set 'stopRunning' to 'false' once it is
true (thus its value is 'monotonic': it only 'increases'). Moreover you
don't care that it gets set exactly once, or who 'wins' any races. This
is what allows you to get away without any interlocked operations (but
you still need volatile).
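
By way of contrast, here is a sketch of the case where you DO care who wins
the race to set the flag, which is where an interlocked operation becomes
necessary. OneShotFlag and TrySet are hypothetical names; the int field stands
in for the bool because Interlocked.CompareExchange has no bool overload.

```csharp
using System;
using System.Threading;

// Hypothetical one-shot flag: many threads may try to set it, but exactly
// one of them is told that it was the one that did.
public sealed class OneShotFlag
{
    private int state;   // 0 = not set, 1 = set; never goes back to 0

    public bool IsSet
    {
        // A compare-exchange that changes nothing, used as a fenced read.
        get { return Interlocked.CompareExchange(ref state, 0, 0) == 1; }
    }

    // Atomically moves 0 -> 1. Returns true only for the caller whose
    // call performed the transition.
    public bool TrySet()
    {
        return Interlocked.CompareExchange(ref state, 1, 0) == 0;
    }
}
```

The first TrySet() call returns true and every later call returns false, no
matter how the calling threads are interleaved. The plain volatile flag cannot
give that answer, because a separate read followed by a write is not atomic.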

The vast majority of code does not need this kind of 'lock free'
performance. Don't do it unless you have the need (synchronized methods
are easy and much simpler to reason about). Getting concurrency right
in practice requires a diligence that is HARD. When you have bugs, they
are VERY hard to find. Keeping things as simple as possible from a
concurrency perspective is a really good idea.
 
Vance made an excellent point in his mail. Even so, his answer indirectly
confirms that my conclusions were correct (see [1] below), but I agree with
his point that reliance on that type of code analysis makes code difficult to
support and therefore fragile.

I also have to make a public apology to Jon for being intolerant and rude in a
couple of my responses in this thread.



- Valery.

P.S:

[1] If you read Vance's response you can note that he was talking about
Console.Write, which could be made intrinsic and therefore eliminate the
non-triviality of the cycle's body, whereas I was talking about a delegate
call, which will always guarantee it. After I sent a short mail to Vance
mentioning the delegate call in the cycle body, he agreed that this indeed
should work on any CLI implementation that complies with the ECMA spec.
However, I totally agree with his point that it is a rather fragile
assumption - if that call to the delegate is deleted or commented out some
time later, then volatile will be required.



 
Valery Pryamikov said:
Vance made an excellent point in his mail. Even so, his answer indirectly
confirms that my conclusions were correct (see [1] below), but I agree with
his point that reliance on that type of code analysis makes code difficult to
support and therefore fragile.

I'm still not entirely convinced they were all correct even with the
delegate, although I *was* definitely wrong about it being able to
cache it in a register (without some truly weird smarts going on). If
it caches the value anywhere, it's got to make sure that the thread
uses that cache everywhere it deals with the value, in order for the
access within the thread itself to remain consistent. That's
technically possible (I believe), but of course highly, highly
improbable. As I said to Valery in an email, the kind of system which
might show that would be a distributed CLR which used an entire
computer's memory as cache, with a central networked backing store as
"main memory".

My conclusions:

1) On any architecture we're ever likely to see .NET on, the code would
have worked fine. In "bizarro world" with a CLR which stretches the
specification to its limits, it may or may not work - there may still
be some doubt both ways, depending on whether or not I've convinced you
:) Of course, in such a world you're likely to quickly come across a
whole load of other code which is also badly synchronized... no doubt
including a lot of mine!

2) The above doesn't mean it's a good way to write code, if only
because anything which takes two experts and an interested observer to
decide on whether or not it's correct is a really bad idea :)
I also have to make a public apology to Jon for being intolerant and rude in a
couple of my responses in this thread.

Nah - just passionate. If it would make you feel better, I could find
dozens of posts where I've been flat out nasty! I think you probably
raised my adrenaline level, but not my blood pressure, which is always
a sign of healthy debate! Apologies in return for anything I said which
annoyed you. I'll look forward to our next debate. If you'd like to
claim that objects are passed by reference, I'm sure we could really
get going :)
 
If you'd like to
claim that objects are passed by reference, I'm sure we could really
get going :)

No kidding? <G> We ran around the block on that before :-) Cheers!
 