Problem
Ever wanted to read binary files written by a FORTRAN program with a C/C++ program? It is not an unusual or unreasonable request, but FORTRAN does some strange things. Consider the following FORTRAN code, where "a" is a 3D array of 4 byte floating point values.
      open(60,file=filename,status='unknown',form='unformatted')
      write(60) nx,ny,nz
      do k = 1,nz
         do j = 1,ny
            write(60) (a(i,j,k),i=1,nx)
         enddo
      enddo
      close(60)
What you will end up with is not a file that is (4 * nx) * ny * nz + 12 bytes long as it would be for the equivalent in most (if not all) other languages! Instead it will be nz * ny * (4 * nx + 8) + 20 bytes long. Why?
Reason
Each time a FORTRAN write is issued, a "record" is written. The record consists of a 4 byte header, then the data, then a trailer that matches the header. The header and trailer each contain the number of bytes written in the data section. So the following
write(60) nx,ny,nz
gets written to disk as follows, where nx, ny and nz are each 4 bytes and the other numbers are 2 byte integers written in decimal, so each pair of them makes up one 4 byte header or trailer
0 12 nx ny nz 0 12
The total length written is 20 bytes. Similarly, the line
write(60) (a(i,j,k),i=1,nx)
gets written as follows assuming nx is 1024 and "a" is real*4
0 4096 a(1,j,k) a(2,j,k) .... a(1024,j,k) 0 4096
The total length is 4104 bytes. Fortunately, once this is understood, it is trivial to read the correct values in C/C++.
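The record layout described above can be read with a small helper along the following lines. This is only a sketch: the function name read_fortran_record is hypothetical, and it assumes 4 byte markers stored in the byte order of the machine doing the reading.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read one FORTRAN unformatted record into buf.
 * Assumption: 4 byte header/trailer in the reader's own byte order.
 * Returns the number of data bytes read, or -1 on any error. */
long read_fortran_record(FILE *fp, void *buf, size_t bufsize)
{
    uint32_t head, tail;

    assert(fp != NULL && buf != NULL);

    /* Header: number of data bytes that follow */
    if (fread(&head, sizeof head, 1, fp) != 1) return -1;
    /* Refuse records that would overflow the caller's buffer */
    if (head > bufsize) return -1;
    /* Data section */
    if (head > 0 && fread(buf, 1, head, fp) != head) return -1;
    /* Trailer: must repeat the header */
    if (fread(&tail, sizeof tail, 1, fp) != 1) return -1;
    if (tail != head) return -1;

    return (long)head;
}
```

For the file written by the FORTRAN code above, one call with a buffer of three 4 byte integers would return nx, ny and nz, and each of the following nz * ny calls would return one row of nx floats.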
A consequence that is a bit shocking for many programmers is that the file created with the above code is about 1/3 the size of one created with the following code.
      open(60,file=filename,status='unknown',form='unformatted')
      write(60) nx,ny,nz
      do k = 1,nz
         do j = 1,ny
            do i = 1,nx
               write(60) a(i,j,k)
            enddo
         enddo
      enddo
      close(60)
In this case each element of a is written in one record and consumes 12 bytes for a total file size of nx * ny * nz * 12 + 20.
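The two file sizes can be checked with a little arithmetic, sketched here in C. The function names are made up for illustration, and 4 byte headers and trailers are assumed, as in the discussion above.

```c
#include <assert.h>

/* Bytes on disk for one record: the data plus a 4 byte header and trailer. */
long long record_size(long long data_bytes)
{
    assert(data_bytes >= 0);
    return data_bytes + 8;
}

/* One record for nx,ny,nz, then one record per row of nx 4 byte floats. */
long long size_row_records(long long nx, long long ny, long long nz)
{
    return record_size(12) + nz * ny * record_size(4 * nx);
}

/* One record for nx,ny,nz, then one record per 4 byte element. */
long long size_element_records(long long nx, long long ny, long long nz)
{
    return record_size(12) + nx * ny * nz * record_size(4);
}
```

For nx = ny = nz = 100 these give 4080020 and 12000020 bytes respectively, which is where the factor of about 3 comes from.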
Note
This doesn't affect FORTRAN programs that might read these files, because the FORTRAN "read" commands know how to handle these unformatted files.
The discussion here does not address the transfer of binary files between machines of different endianness. In that case, after a short, int, float or double is read, its bytes must be reversed. Fortunately this is relatively straightforward; the following macros cover the 2 and 4 byte cases.
/* Reverse the bytes of a 2 or 4 byte value (x should be unsigned) */
#define SWAP_2(x) ( (((x) & 0xff) << 8) | ((unsigned short)(x) >> 8) )
#define SWAP_4(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | \
                    (((x) >> 8) & 0x0000ff00) | ((x) >> 24) )
/* Reverse the bytes of a variable in place (assumes 2 byte short, 4 byte unsigned and float) */
#define FIX_SHORT(x) (*(unsigned short *)&(x) = SWAP_2(*(unsigned short *)&(x)))
#define FIX_LONG(x) (*(unsigned *)&(x) = SWAP_4(*(unsigned *)&(x)))
#define FIX_FLOAT(x) FIX_LONG(x)
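The macros reinterpret a variable through pointer casts. As an alternative sketch (not part of the original macros), the same 4 byte swap can be written with functions that copy the bytes through memcpy. The names swap_u32 and fix_float are hypothetical, and a 4 byte float is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32 bit value. */
uint32_t swap_u32(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00ff0000u) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* Reverse the bytes of a float in place.
 * Assumption: float occupies 4 bytes on this machine. */
void fix_float(float *f)
{
    uint32_t bits;

    assert(sizeof(float) == sizeof bits);
    memcpy(&bits, f, sizeof bits);   /* float -> raw bits */
    bits = swap_u32(bits);
    memcpy(f, &bits, sizeof bits);   /* raw bits -> float */
}
```

Applying fix_float twice to the same variable restores its original value, which makes for an easy sanity check.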
It appears that the endianness of the 4 byte header and trailer reflects the endianness of the machine doing the writing. Of course, if you know the format of the data being written then you can simply skip over the header/trailer bytes, but if you need to decode the file or do error checking then you need to know the endianness of both the machine where the file was written and the machine where it is being read.
And lastly, the above does not address the possibility (fairly rare these days) that the files may be transferred between two machines with different internal representations of floating point numbers. If that is the case then you're really in trouble and should probably revert to transferring the data in a readable ASCII format.
Update (Jan 2008): It would appear that on 64 bit machines the two integers that make up the header (and likewise the trailer) are each written as 4 bytes instead of 2 bytes, so each header and trailer occupies 8 bytes.
If the file does not already exist and you control the FORTRAN code that writes it, the above can be avoided by opening the file with the access='stream' option. This option was introduced reasonably recently (in Fortran 2003) explicitly to overcome this issue.