As part of my new position in Oxford, I have a new computer: an 8-core Intel Xeon E5-2665 CPU with 16 GB of memory. That is enough cores to test how well my library scales in a multi-threaded environment.
The results are really good. With 8 threads or fewer, the scaling is not quite as perfect as it theoretically should be, but hyper-threading improves the speed a little further, for a final result of 96% of the theoretical speed-up. Between 8 and 16 threads, however, the results are a bit unstable, probably because threads migrate between cores. The good scaling is clearly a consequence of the fact that no synchronization is needed between the threads. The OpenMP overhead is also minimal because OpenMP is used at a very high level in the code.
|Threads||wall clock time (ms)||Refl.atoms.s^-1||Speedup|
~150k reflections, 1000 atoms
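
For readers who want to reproduce the table columns from their own timings, here is a minimal sketch of how the throughput, the speed-up and the fraction of the theoretical speed-up can be derived from wall clock times. The function names and the timings in the example are hypothetical and are not my measurements. The `ideal` parameter is also an assumption: it is the speed-up you consider perfect, for instance the number of threads or the number of physical cores.

```python
def throughput(n_reflections, n_atoms, wall_ms):
    """Reflection-atom pairs processed per second (Refl.atoms.s^-1)."""
    return n_reflections * n_atoms / (wall_ms / 1000.0)


def speedup(t1_ms, tn_ms):
    """Speed-up of a run taking tn_ms relative to the 1-thread run (t1_ms)."""
    return t1_ms / tn_ms


def efficiency(t1_ms, tn_ms, ideal):
    """Fraction of the ideal speed-up actually reached (1.0 = perfect scaling)."""
    return speedup(t1_ms, tn_ms) / ideal


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    t1_ms, t4_ms = 1000.0, 400.0
    print("throughput:", throughput(150_000, 1000, t4_ms))
    print("speed-up:  ", speedup(t1_ms, t4_ms))
    print("efficiency:", efficiency(t1_ms, t4_ms, ideal=4))
```

An efficiency close to 1.0 means the run scales almost perfectly, and a figure such as 96% of the theoretical speed-up corresponds to an efficiency of 0.96 for the chosen `ideal`.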