Tuesday, July 31, 2007

volatile and threading

Until recently I hadn't much experience writing multi-threading programs in C++ so when I tried to I found that I'm really confused how multi-threading programs mix with volatile variables. So I did a little research and quick summary is: this topic is confusing. It looks like if you put locks around global variables shared between threads you shouldn't care about volatile flag. Definitely under POSIX threads and most likely when using other threading libraries as well. If you don't and rely on atomic operations it seems that you have to use volatile flag for shared global variables but concerning portability it is a grey area.

Longer story is below:

Suppose we have a piece of code which waits for a certain external condition to happen. The code could look like

bool gEvent = false;

void waitLoop() {
while (!gEvent) {
sleep(1);
}
...
}
Let's assume that this is a single threaded program and the external condition we are waiting for is a Unix signal. The signal handler is very simple - it simply sets gEvent to true:
void wakeUp() {
gEvent = true;
}
The problem with the code above is that compiler would optimize out check of the condition inside waitLoop() incorrectly assuming from local analysis of the code that gEvent never changes. The fix is to declare gEvent with volatile modifier which basically tells compiler that the variable can be changed at any time and that is unsafe to perform any optimization based on the analysis of local code:
volatile bool gEvent = false;
Let's take another example. The code is same but this time it is a mutli-threaded program where one thread waits for another. So waitLoop() runs inside one thread and wakeUp() eventually called from another. Is the code still correct? Probably yes if we keep volatile flag and if operations which read or write gEvent variable can be considered as atomic. The later assumptions seems to be correct for most (all?) platforms.

But what if we cannot treat operations which read or write gEvent variable as atomic? For example it might be an instance of a more complex type; for example an instance of class which contains other information then just a information whenever event have happened or not:
struct EventInfo {
EventInfo(bool happened = false, const string& source = "")
: fHappened(happened), fSource(source)
{}
bool fHappened;
string fSource;
}

volatile EventInfo gEventInfo;

void waitLoop() {
while (!fEventInfo.fHappened) {
sleep(1);
}
const string& eventSource = fEventInfo.fSource;
...
}

void wakeUp() {
gEventInfo = EventInfo(true, "wakeUp");
}
This code is still ok for single-threaded program where wakeUp() is a signal handler but is unsafe for multi-threaded program where wakeUp() runs in a separate thread as operations on gEventInfo cannot be treated as atomic anymore.

So how do we fix it? We should surround places where code reads or writes gEventInfo with locks to make sure only one thread accesses gEventInfo at a time. I'll use boost thread library in the example.
boost::mutex gMutex;

void waitLoop() {
string eventSource;

for (bool eventHappened = false; !eventHappened; ) {
{
boost::mutex::scoped_lock lock(gMutex);
eventHappened = fEventInfo.fHappened;
eventSource = fEventInfo.fSource;
}
sleep(1);
}
...
}

void wakeUp() {
boost::mutex::scoped_lock lock(gMutex);

gEventInfo = EventInfo(true, "wakeUp");
}
Comparing this code with earlier examples it looks like we still need to declare gEventInfo variable as volatile but it turns out we don't really need to. Quote from Thread Cannot be Implemented as a Library [PDF]:
In practice, C and C++ implementations that support
Pthreads generally proceed as follows:
  1. Functions such as pthread_mutex_lock() that are guaranteed by the standard to “synchronize memory” include hardware instructions (“memory barriers”) that prevent hardware reordering of memory operations around the call.
  2. To prevent the compiler from moving memory operations around calls to functions such as pthread_mutex_lock(), they are essentially treated as calls to opaque functions, about which the compiler has no information. The compiler effectively assumes that pthread_mutex_lock() may read or write any global variable. Thus a memory reference cannot simply be moved across the call. This approach also ensures that transitive calls, e.g. a call to a function f() which then calls pthread_mutex_lock(), are handled in the same way more or less appropriately, i.e. memory operations are not moved across the call to f() either, whether or not the entire user program is being analyzed at once.
So at least if you using POSIX threads (boost::threads under Linux uses them) your code is probably safe without use of volatile as long as you use locks around global variables shared between several threads. Good question whenever this example code is portable to other platforms; after all boost::threads supports threading libraries other then POSIX which may have other rules for mutexes and locks. I haven't researched this yet as for now I don't really care about other platforms.

Some interesting links on this topic:
  • A Memory model for C++: FAQ - mentions shortly reasons why volatile keyword is insufficient to ensure synchronization between threads and has links on papers for further reading.
  • http://www.artima.com/cppsource/threads_meeting.html - Not much to read there but I love this quote: "Not all the dragons were so easily defeated, unfortunately. Among the issues guaranteed to waste at least 20 minutes of group time with little or nothing to show ... What does volatile mean?" (this in context of multi-threaded programs). If C++ experts cannot agree on this ...
  • Another person gets confused over use of volatile and threads. Interesting discussion on comp.programming.threads.

7 comments:

austin said...

I picked up the note on volatile in this instance long ago when I first read "Programming Applications for Microsoft Windows" by J. Richter.
Obviously synchronizing like that is an incredibly bad practice anyway (I did it a time or two myself before I read Richter's book) and it should be somewhat self evident, so you're asking to get bitten in the ass one way or another if you trust such a route. :) And locks are ugly in themselves although they solve the problem to their own extent (although you still may get an ass-biting if you're not careful.)

Ilya Martynov said...

austin, could you elaborate "Obviously synchronizing like that is an incredibly bad practice anyway"? The code in my blog post is made up but it does resemble some simple examples of threaded programs one can find elsewhere on web so I'm curious what exactly you think is a "bad practice"?

Edd said...

You should really be using boost::condition for this kind of thing.

Your code as it stands looks like it will deadlock (assuming your scoped_locks are supposed to be constructed with gMutex as an argument).

Using volatile (only) for synchronisation is a bad idea because even though it stops the compiler re-ordering operations, it does not stop the processor from doing so; the barrier operations injected by the pthreads calls really are needed.

Ilya Martynov said...

assuming your scoped_locks are supposed to be constructed with gMutex as an argument

Yep, I've just corrected this. Thanks.

Your code as it stands looks like it will deadlock.

Right, I didn't realize this. I've fixed this too.

You should really be using boost::condition for this kind of thing.

Agree. After I fixed the deadlock the code looks less elegant then it would if I were using boost::condition.

Using volatile (only) for synchronisation is a bad

This is what some of papers on this topic say. Yet, there are examples online which show basically the same code as in my blog post. See for example
http://www.ddj.com/cpp/184403766
. It is written by C++ guru Andrei Alexandrescu who I'd suppose knows what he is talking about. Anyway I agree unless you know what you are doing (in this case I don't) it is safer to just use standard thread synchronization primitives like mutexes.

austin said...

@ilya: Global state with a primitive ad-hoc sync mechanism like this is not a good idea, even if you can avoid the compiler-generated-deadlock using volatile, I'm still not even sure if that gives you any sort of guarantees at a lower level (as the instruction pipeline will do out-of-order execution as it sees fit.) I suppose calling it 'bad practice' is simply my opinion.

Along with this, using a mutex to lock has the advantage that your system will put the thread to sleep rather than idly wasting CPU cycles and taking up time from the scheduler, and it will awake it appropriately when the kernel object related to the mutex is 'open.'

I've seen code like it too (like I said, I did it too a long time ago,) and I will stand on my own if necessary when I say it is an incredibly poor method to acheive what is so graciously provided as a system construct.

Also, I forgot to mention in my first comment: good post. :)

Edd said...

You may also find this video of Hans Boehm interesting:

http://irbseminars.intel-research.net/

(first one on the page).

He discusses the ins and outs of designing the C++ memory model for the next revision of the standard and illustrates some of the scarier optimisations that are valid in the single threaded world but that break multi-threaded code. The volatile issue is also touched upon briefly, IIRC.

I too, should have said: good post :)

Edd

Andrew said...

@Ilya: With all my respect to Andrei I should say that article which you cited above http://www.ddj.com/cpp/184403766 is somewhat out of touch with reality.

Good post!