Crystals is using packed storage to store symmetric matrices. Packed storage is using a linear array to store only the unique data (Netlib packed storage). There is a huge penalty on the calculation by doing so in some cases. In the 70s the space saved was precious but not anymore. At least in small molecule crystallography.
The function I am going to describe is to compute the Cholesky factorization of a real symmetric positive definite matrix A. Three versions exist:
- spotrf using normal storage
- spptrf using packed storage as in crystals
- spftrf also using packed storage but a different kind (Rectangular Full Packed (RFP)), more efficient.
The storage is not the only penalty, some functions might not be as optimised as others. Below is a benchmark on a 4000×4000 symmetric matrix using different libraries.
Lapack
[pascal@vinci build]$ gfortran a.f90 -lblas -llapack -o out
[pascal@vinci build]$ ./out
spotrf (normal): 13438 ms
spotrf (packed fast): 16549 ms
spptrf (packed slow): 21068 ms
openblas
[pascal@vinci build]$ gfortran a.f90 libopenblas.a -lpthread -o out
[pascal@vinci build]$ ./out
spotrf (normal): 495 ms
spotrf (packed fast): 543 ms
spptrf (packed slow): 21114 ms
mkl
[pascal@vinci build]$ ifort a.f90 -mkl -o out
[pascal@vinci build]$ ./out
spotrf (normal): 369 ms
spotrf (packed fast): 456 ms
spptrf (packed slow): 773 ms
acml (not multi-thread)
[pascal@vinci build]$ gfortran a.f90 libacml.a -o out
[pascal@vinci build]$ ./out
spotrf (normal): 1341 ms
spotrf (packed fast): 1323 ms
spptrf (packed slow): 21317 ms
Conclusion
The full matrix is always the fastest. Openblas and acml do not optimise one function hence the huge penalty. the 4000×4000 matrix is 64MB so not that big and probably not worth the trouble to use fancy storage.
Source code
program truc implicit none real, dimension(:,:), allocatable :: nmatrix, temp real, dimension(:), allocatable :: packed integer, parameter :: n=4000 integer info integer :: starttime integer, dimension(8) :: measuredtime allocate(nmatrix(n,n)) allocate(temp(n,n)) allocate(packed(n*(n+1)/2)) call random_number(temp) call sgemm ('N', 'T', n, n, n, 1.0, temp, n, temp, n, 0.0, nmatrix, n) temp=nmatrix call date_and_time(VALUES=measuredtime) starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8) call spotrf ('L', n, temp, n, INFO) call date_and_time(VALUES=measuredtime) print *, 'spotrf (normal): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms' print *, '' temp=nmatrix call strttf ('N', 'L', n, temp, n, packed, INFO) call date_and_time(VALUES=measuredtime) starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8) call spftrf ('N', 'L', n, packed, INFO) call date_and_time(VALUES=measuredtime) print *, 'spotrf (packed fast): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms' print *, '' temp=nmatrix call strttp ('L', n, temp, n, packed, INFO) call date_and_time(VALUES=measuredtime) starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8) call spptrf ('L', n, packed, INFO) call date_and_time(VALUES=measuredtime) print *, 'spptrf (packed slow): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms' print *, '' end program