Parallel structure factor calculation

As part of my new position in Oxford, I have a new computer: an 8-core Intel Xeon E5-2665 CPU with 16 GB of memory. That is enough cores to test whether my library scales well in a multi-threaded environment.

The result is very good. With 8 threads or fewer, the scaling is not quite as perfect as theory predicts, but hyper-threading improves the speed a little further, reaching a final speed-up of 7.7 with 16 threads, i.e. 96% of the theoretical 8× speed-up for 8 physical cores. This good scaling is a consequence of the fact that no synchronization is needed between the threads. The OpenMP overhead is also minimal because the parallelization is applied at a very high level in the code. However, between 8 and 16 threads the results are somewhat unstable, probably because threads migrate between cores.

Threads   Wall-clock time (ms)   Throughput (refl.·atoms/s)   Speedup
1         2300                   1.4×10^8                     1
2         1300                   2.5×10^8                     1.8
3         849                    3.8×10^8                     2.7
4         635                    5.0×10^8                     3.6
5         536                    6.0×10^8                     4.3
6         457                    7.0×10^8                     5.0
7         406                    7.9×10^8                     5.7
8         359                    9.0×10^8                     6.4
9         374                    8.5×10^8                     6.1
10        336                    9.6×10^8                     6.8
11        370                    8.7×10^8                     6.2
16        305                    1.0×10^9                     7.7

Test structure: ~150k reflections, 1000 atoms.
