Porting threads to Windows: critical sections are very slow
I'm porting some code to Windows and found threading to be extremely slow.
The task takes 300 seconds on Windows (two 8-core Xeon E5-2670 CPUs at
2.6 GHz, 16 cores in total) and 3.5 seconds on Linux (one 4-core Xeon
E5-1607 at 3 GHz). I'm using VS2012 Express.
I've got 32 threads, each calling EnterCriticalSection(), popping an 80-byte
job off a std::stack, calling LeaveCriticalSection() and then doing some
work (250k jobs in total).
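For reference, the pattern described above could look roughly like the sketch below. It is only an illustration, not the actual code from my project: the names Job, Lock, JobStack and drain are made up for this example, and the 80-byte payload is a stand-in for the real job data. The lock wraps CRITICAL_SECTION on Windows and a pthread mutex elsewhere, and it sticks to pre-C++11 constructs.

```cpp
#include <cstddef>
#include <stack>

#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical 80-byte job record; the real job layout is not shown here.
struct Job {
    unsigned char payload[80];
};

// Thin lock wrapper: CRITICAL_SECTION on Windows, pthread mutex elsewhere.
class Lock {
public:
#ifdef _WIN32
    Lock()  { InitializeCriticalSection(&cs_); }
    ~Lock() { DeleteCriticalSection(&cs_); }
    void enter() { EnterCriticalSection(&cs_); }
    void leave() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
#else
    Lock()  { pthread_mutex_init(&m_, NULL); }
    ~Lock() { pthread_mutex_destroy(&m_); }
    void enter() { pthread_mutex_lock(&m_); }
    void leave() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t m_;
#endif
    Lock(const Lock&);             // non-copyable (pre-C++11 idiom)
    Lock& operator=(const Lock&);
};

// Shared job stack guarded by a single lock.
class JobStack {
public:
    void push(const Job& job) {
        lock_.enter();
        jobs_.push(job);
        lock_.leave();
    }
    // Copies the top job into 'out' and pops it; returns false when empty.
    bool tryPop(Job& out) {
        lock_.enter();
        bool ok = !jobs_.empty();
        if (ok) {
            out = jobs_.top();
            jobs_.pop();
        }
        lock_.leave();
        return ok;
    }
private:
    Lock lock_;
    std::stack<Job> jobs_;
};

// Loop each worker thread would run; 'work' stands in for the real job.
inline std::size_t drain(JobStack& stack, void (*work)(const Job&)) {
    std::size_t done = 0;
    Job job;
    while (stack.tryPop(job)) {
        work(job);   // the job itself runs outside the lock
        ++done;
    }
    return done;
}
```

Each of the 32 threads would call drain() on the same JobStack, so the lock is held only for the copy and the pop.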
Before and after every critical section call I print the thread ID and
the current time. Typical figures for a single thread:
Waiting to acquire the lock takes ~160 ms.
Popping the job off the stack takes ~3 ms.
Calling LeaveCriticalSection() takes ~3 ms.
The job itself takes ~1 ms.
(The figures are roughly the same for Debug and Release; Debug takes a
little longer. I'd love to be able to properly profile the code :P)
Commenting out the job call makes the whole process take 2 seconds (still
more than the full run on Linux).
I've tried both QueryPerformanceCounter and timeGetTime for the timing;
both give approximately the same result.
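The timing itself could be sketched roughly as follows. This is an illustration rather than my exact code: nowMs and PhaseTimes are hypothetical names, and the non-Windows branch assumes a POSIX system with clock_gettime. The sketch accumulates per-thread totals in memory, to be reported after the threads finish, so that the logging does not sit inside the timed regions.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Returns a monotonic timestamp in milliseconds.
inline double nowMs() {
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   // ticks per second
    QueryPerformanceCounter(&count);    // current tick count
    return 1000.0 * (double)count.QuadPart / (double)freq.QuadPart;
#else
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1000.0 * (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e6;
#endif
}

// Hypothetical per-thread accumulator, one instance per worker thread,
// so no synchronization is needed while recording.
struct PhaseTimes {
    double waitMs;   // time spent acquiring the lock
    double popMs;    // time spent copying and popping the job
    double leaveMs;  // time spent releasing the lock
    double jobMs;    // time spent running the job
    PhaseTimes() : waitMs(0.0), popMs(0.0), leaveMs(0.0), jobMs(0.0) {}
};
```

A worker would take a timestamp before and after each phase, for example `double t0 = nowMs(); EnterCriticalSection(&cs); times.waitMs += nowMs() - t0;`, and print its PhaseTimes once at the end.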
AFAIK the job never makes any sync calls, but I can't explain the slowdown
unless it does.
I have no idea why copying from a stack and calling pop takes so long.
Another very confusing thing is why the call to LeaveCriticalSection()
takes so long.
Can anyone speculate on why it's running so slowly?
I wouldn't have thought the difference in processors would give a 100x
performance difference, but could it be related to having dual CPUs
(having to sync between separate CPUs rather than between cores on one
chip)?
By the way, I'm aware of std::thread, but I want my library code to work
with pre-C++11 compilers.