Lapack and packed storage

Crystals is using packed storage to store symmetric matrices. Packed storage is using a linear array to store only the unique data (Netlib packed storage). There is a huge penalty on the calculation by doing so in some cases. In the 70s the space saved was precious but not anymore. At least in small molecule crystallography.
The function I am going to describe is to compute the Cholesky factorization of a real symmetric positive definite matrix A. Three versions exist:

  1. spotrf using normal storage
  2. spptrf using packed storage as in crystals
  3. spftrf also using packed storage but a different kind (Rectangular Full Packed (RFP)), more efficient.

The storage is not the only penalty, some functions might not be as optimised as others. Below is a benchmark on a 4000×4000 symmetric matrix using different libraries.

Lapack

[pascal@vinci build]$ gfortran a.f90 -lblas -llapack -o out
[pascal@vinci build]$ ./out
spotrf (normal): 13438 ms
spotrf (packed fast): 16549 ms
spptrf (packed slow): 21068 ms

openblas

[pascal@vinci build]$ gfortran a.f90 libopenblas.a -lpthread -o out
[pascal@vinci build]$ ./out
spotrf (normal): 495 ms
spotrf (packed fast): 543 ms
spptrf (packed slow): 21114 ms

mkl

[pascal@vinci build]$ ifort a.f90 -mkl -o out
[pascal@vinci build]$ ./out
spotrf (normal): 369 ms
spotrf (packed fast): 456 ms
spptrf (packed slow): 773 ms

acml (not multi-thread)

[pascal@vinci build]$ gfortran a.f90 libacml.a -o out
[pascal@vinci build]$ ./out
spotrf (normal): 1341 ms
spotrf (packed fast): 1323 ms
spptrf (packed slow): 21317 ms

Conclusion

The full matrix is always the fastest. Openblas and acml do not optimise one function hence the huge penalty. the 4000×4000 matrix is 64MB so not that big and probably not worth the trouble to use fancy storage.

Source code

program truc
implicit none
 
real, dimension(:,:), allocatable :: nmatrix, temp
real, dimension(:), allocatable :: packed
integer, parameter :: n=4000
integer info
integer :: starttime
integer, dimension(8) :: measuredtime
 
allocate(nmatrix(n,n))
allocate(temp(n,n))
allocate(packed(n*(n+1)/2))
 
call random_number(temp)
call sgemm ('N', 'T', n, n, n, 1.0, temp, n, temp, n, 0.0, nmatrix, n)
 
temp=nmatrix
call date_and_time(VALUES=measuredtime)
starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)
call spotrf ('L', n, temp, n, INFO)
call date_and_time(VALUES=measuredtime)
print *, 'spotrf (normal): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms'
print *, ''
 
temp=nmatrix
call strttf ('N', 'L', n, temp, n, packed, INFO)
call date_and_time(VALUES=measuredtime)
starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)
call spftrf ('N', 'L', n, packed, INFO)
call date_and_time(VALUES=measuredtime)
print *, 'spotrf (packed fast): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms'
print *, ''
 
temp=nmatrix
call strttp ('L', n, temp, n, packed, INFO)
call date_and_time(VALUES=measuredtime)
starttime=((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)
call spptrf ('L', n, packed, INFO)
call date_and_time(VALUES=measuredtime)
print *, 'spptrf (packed slow): ', ((measuredtime(5)*3600+measuredtime(6)*60)+measuredtime(7))*1000+measuredtime(8)-starttime, 'ms'
print *, ''
 
end program

Leave a Reply