Fortran weakness: byte-wise I/O

In the Crm² lab in Nancy (France), we have a prototype of a hybrid pixel detector from the company Imxpad. The detector produces raw ASCII files that I need to convert into formats suitable for integration software. At the moment we are working with Crysalis and the Eval suite.

The esperanto format used in Crysalis consists of a header followed by the frame, written in binary as signed 32-bit integers. In Fortran this translates into:

 open(newunit=filehandled, form="unformatted", access="stream", file='filename')
 write(filehandled) esperanto
 close(filehandled)

This is the most efficient way to write an array in Fortran: the whole array goes out in a single write statement, so the per-statement overhead is paid only once.
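A minimal, self-contained sketch of that single-statement write, assuming the frame is held as `integer(int32)`. The module and routine names (`frame_writer`, `write_frame`) and the `status="replace"` option are illustrative choices, and the esperanto header itself is left out:

```fortran
! Sketch: write a whole 2D frame of signed 32-bit integers in one statement.
! Hypothetical names; the real esperanto header is not reproduced here.
module frame_writer
  use iso_fortran_env, only: int32
  implicit none
contains
  subroutine write_frame(filename, frame)
    character(len=*), intent(in) :: filename
    integer(int32), intent(in) :: frame(:,:)
    integer :: unit
    ! stream access: raw bytes, no record markers around the data
    open(newunit=unit, file=filename, form="unformatted", access="stream", &
         status="replace", action="write")
    write(unit) frame   ! one write statement for the whole array
    close(unit)
  end subroutine write_frame
end module frame_writer
```

Because stream access adds no record markers, a 3×2 frame written this way produces a file of exactly 24 bytes.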

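The CBF format used in Eval is a bit more complicated, as Eval requires byte-offset compression of the (signed integer) data: each pixel is stored as its difference from the previous pixel, on 1, 3, 7 or 15 bytes depending on the size of that difference. As a result, the data have to be written byte by byte: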

! needs: use iso_fortran_env, only: int8, int16, int32
! deltai must be a 64-bit integer so that the last branch writes 8 bytes
! the 2- and 4-byte writes use the native byte order: this assumes a little-endian CPU (x86)
open(newunit=scratchfile, form="unformatted", access="stream", file='filename')
basepixel=0
byteswritten=0
! origin is at the bottom
do j = ubound(cbf,2), 1, -1
    do k = 1, ubound(cbf,1)
        deltai=cbf(k,j)-basepixel
        if(deltai>=-127 .and. deltai<=127) then
            write(scratchfile) int(deltai,int8)
            byteswritten=byteswritten+1
        else if(deltai>=-32767 .and. deltai<=32767) then
            write(scratchfile) char(128)  ! 0x80
            write(scratchfile) int(deltai,int16)
            byteswritten=byteswritten+3
        else if(deltai>=-2147483647 .and. deltai<=2147483647) then
            ! 0x8000 split into 2 characters, bytes swapped by hand for little-endian
            write(scratchfile) char(128)//char(0)//char(128) ! 0x80 and 0x8000 little-endian
            write(scratchfile) int(deltai,int32)
            byteswritten=byteswritten+7
        else
            ! 0x80, 0x8000 and 0x80000000 little-endian
            write(scratchfile) char(128)//char(0)//char(128)//char(0)//char(0)//char(0)//char(128)
            write(scratchfile) deltai
            byteswritten=byteswritten+15
        end if
        basepixel=cbf(k,j)  ! the next difference is taken from this pixel
    end do
end do
close(scratchfile)
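The byte counts in the loop above (1, 3, 7 or 15 bytes per pixel) follow directly from the ranges tested on `deltai`. A small sketch that reproduces only that size rule; `encoded_size` is a hypothetical helper, not part of Eval or CBFlib:

```fortran
! Sketch: number of bytes the byte-offset scheme uses for one pixel difference.
! The thresholds mirror the if/else chain of the loop above.
module byte_offset_size
  use iso_fortran_env, only: int64
  implicit none
contains
  pure integer function encoded_size(delta)
    integer(int64), intent(in) :: delta   ! pixel minus previous pixel
    if (delta >= -127_int64 .and. delta <= 127_int64) then
      encoded_size = 1     ! one signed byte
    else if (delta >= -32767_int64 .and. delta <= 32767_int64) then
      encoded_size = 3     ! 0x80 marker + 16-bit value
    else if (delta >= -2147483647_int64 .and. delta <= 2147483647_int64) then
      encoded_size = 7     ! 0x80, 0x8000 markers + 32-bit value
    else
      encoded_size = 15    ! 0x80, 0x8000, 0x80000000 markers + 64-bit value
    end if
  end function encoded_size
end module byte_offset_size
```

For example, a row 100, 105, 40000, 39990 gives differences 100, 5, 39895 and -10, which take 1 + 1 + 7 + 1 = 10 bytes instead of the 16 bytes of raw 32-bit integers.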

Comparing the two versions, the conversion to esperanto takes 78 ms and the CBF version 130 ms. When the number of threads is increased using OpenMP (several images processed at the same time), the CBF version does not scale at all (I guess it would scale with 4 separate processes instead), while the esperanto version scales well. On 4 threads, the conversion to esperanto takes 23 ms per image and the CBF one 110 ms. There is a lot of information on the Internet about the slowness of Fortran stream I/O. The tests were run on an Intel quad-core Q9505 with a Samsung SSD 840 PRO Series solid-state drive.

The overhead of each write statement cannot be avoided, and I did not want to write a C function instead, so I used a character string as a buffer to hold the data before writing them to the file. Fortran has no byte or binary type, which means that signed integers of different sizes have to be converted into characters. This is done with bitwise shifts (ishft) and the bitwise 'and' operator (iand).
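The shift-and-mask conversion can be factored into one helper. This is a sketch, assuming a 64-bit input value; `put_le` is a hypothetical name, it does not check that the buffer is long enough, and it stores the least significant byte first, which is the little-endian order the CBF stream expects:

```fortran
! Sketch: store the low nbytes bytes of an integer into a character buffer,
! least significant byte first (little-endian), using only ishft and iand.
module le_bytes
  use iso_fortran_env, only: int64
  implicit none
contains
  subroutine put_le(buffer, pos, value, nbytes)
    character(len=*), intent(inout) :: buffer
    integer, intent(in) :: pos          ! index of the first byte to write
    integer(int64), intent(in) :: value
    integer, intent(in) :: nbytes       ! 1, 2, 4 or 8
    integer :: i
    do i = 0, nbytes - 1
      ! logical shift brings byte i to the bottom, the mask keeps its 8 bits
      buffer(pos+i:pos+i) = char(int(iand(ishft(value, -8*i), 255_int64)))
    end do
  end subroutine put_le
end module le_bytes
```

Assuming `deltai` is `integer(int64)`, a call such as `call put_le(longbuffer, byteswritten+2, deltai, 2)` would replace the two byte assignments of the 3-byte branch.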

The unbuffered loop above was then rewritten as:

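! longbuffer: character(len=buffer_length) string used as the output buffer
! deltai must be a 64-bit integer for the shifts by 32 to 56 bits in the last branch
basepixel=0
byteswritten=0
totalbyteswritten=0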
! origin is at the bottom
do j = ubound(cbf,2), 1, -1
    do k = 1, ubound(cbf,1)
        ! flush when fewer than 15 bytes (the largest encoded pixel) are left
        if(byteswritten>buffer_length-15) then
            write(scratchfile) longbuffer(1:byteswritten)
            totalbyteswritten=totalbyteswritten+byteswritten
            byteswritten=0
        end if
 
        deltai=cbf(k,j)-basepixel
        if(deltai>=-127 .and. deltai<=127) then
            longbuffer(byteswritten+1:byteswritten+1) = char(iand(deltai, z'FF'))
            byteswritten=byteswritten+1
            basepixel=cbf(k,j)
        else if(deltai>=-32767 .and. deltai<=32767) then
            longbuffer(byteswritten+1:byteswritten+1) = char(128) ! 0x80
            longbuffer(byteswritten+2:byteswritten+2) = char(iand(deltai, z'FF'))
            longbuffer(byteswritten+3:byteswritten+3) = char(iand(ishft(deltai, -8), z'FF'))
            byteswritten=byteswritten+3
            basepixel=cbf(k,j)
        else if(deltai>=-2147483647 .and. deltai<=2147483647) then
            ! 0x8000 split into 2 characters, bytes swapped by hand for little-endian
            longbuffer(byteswritten+1:byteswritten+3) = char(128)//char(0)//char(128) ! 0x80 and 0x8000 little-endian
            longbuffer(byteswritten+4:byteswritten+4) = char(iand(deltai, z'FF'))
            longbuffer(byteswritten+5:byteswritten+5) = char(iand(ishft(deltai, -8), z'FF'))
            longbuffer(byteswritten+6:byteswritten+6) = char(iand(ishft(deltai, -16), z'FF'))
            longbuffer(byteswritten+7:byteswritten+7) = char(iand(ishft(deltai, -24), z'FF'))
            byteswritten=byteswritten+7
            basepixel=cbf(k,j)
        else
            ! 0x80, 0x8000 and 0x80000000 little_endian
            longbuffer(byteswritten+1:byteswritten+7) = char(128)//char(0)//&
            &   char(128)//char(0)//char(0)//char(0)//char(128)
            longbuffer(byteswritten+8:byteswritten+8) = char(iand(deltai, z'FF'))
            longbuffer(byteswritten+9:byteswritten+9) = char(iand(ishft(deltai, -8), z'FF'))
            longbuffer(byteswritten+10:byteswritten+10) = char(iand(ishft(deltai, -16), z'FF'))
            longbuffer(byteswritten+11:byteswritten+11) = char(iand(ishft(deltai, -24), z'FF'))
            longbuffer(byteswritten+12:byteswritten+12) = char(iand(ishft(deltai, -32), z'FF'))
            longbuffer(byteswritten+13:byteswritten+13) = char(iand(ishft(deltai, -40), z'FF'))
            longbuffer(byteswritten+14:byteswritten+14) = char(iand(ishft(deltai, -48), z'FF'))
            longbuffer(byteswritten+15:byteswritten+15) = char(iand(ishft(deltai, -56), z'FF'))
            byteswritten=byteswritten+15
            basepixel=cbf(k,j)
        end if
    end do
end do
if(byteswritten>0) then
    write(scratchfile) longbuffer(1:byteswritten)
    totalbyteswritten=totalbyteswritten+byteswritten
    byteswritten=0
end if
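To check that the buffered writer produces a valid stream, the bytes can be decoded again. A minimal decoder sketch, assuming the input holds exactly `npix` encoded pixels; the module and routine names (`byte_offset_decode`, `get_le`, `decode`) are hypothetical and the input is not validated:

```fortran
! Sketch of a byte-offset decoder, to check the output of the writer above.
module byte_offset_decode
  use iso_fortran_env, only: int64
  implicit none
contains
  ! n little-endian bytes starting at buf(pos:pos), read as a signed integer
  pure function get_le(buf, pos, n) result(v)
    character(len=*), intent(in) :: buf
    integer, intent(in) :: pos, n          ! n = 1, 2, 4 or 8
    integer(int64) :: v
    integer :: i
    v = 0
    do i = n - 1, 0, -1                    ! most significant byte first
      v = ior(ishft(v, 8), int(ichar(buf(pos+i:pos+i)), int64))
    end do
    ! sign-extend values narrower than 64 bits
    if (n < 8) then
      if (btest(v, 8*n - 1)) v = v - ishft(1_int64, 8*n)
    end if
  end function get_le

  subroutine decode(buf, npix, pixels)
    character(len=*), intent(in) :: buf
    integer, intent(in) :: npix
    integer(int64), intent(out) :: pixels(npix)
    integer :: pos, k
    integer(int64) :: base, delta
    pos = 1
    base = 0
    do k = 1, npix
      delta = get_le(buf, pos, 1)
      pos = pos + 1
      if (delta == -128) then                   ! 0x80: a 16-bit value follows
        delta = get_le(buf, pos, 2)
        pos = pos + 2
        if (delta == -32768) then               ! 0x8000: a 32-bit value follows
          delta = get_le(buf, pos, 4)
          pos = pos + 4
          if (delta == -2147483648_int64) then  ! 0x80000000: a 64-bit value follows
            delta = get_le(buf, pos, 8)
            pos = pos + 8
          end if
        end if
      end if
      base = base + delta                       ! undo the difference
      pixels(k) = base
    end do
  end subroutine decode
end module byte_offset_decode
```

For an image small enough to fit in the buffer, decoding `longbuffer(1:byteswritten)` and comparing the result with `cbf` read in the same bottom-up order is a quick way to validate changes to the encoder.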

The buffered version (longbuffer is 16384 bytes) is much faster, as fast as the esperanto conversion: it now takes 72 ms on one thread and 22 ms on 4 threads, which is 5 times faster than before on 4 threads (110 ms). The remaining bottleneck is reading the ASCII data, and nothing can be done about it short of having the detector store its data in binary rather than ASCII.

There is a C library for manipulating CBF files, CBFlib (also on SourceForge). I did not use it because I do not like linking to external libraries that are not available in the official repositories of common Linux distributions, especially when only a few lines of code are needed to write my own function.
