This document briefly describes the byte swapping required when a binary file created on a DOS/Windows machine is to be read on a computer which has its bytes ordered the other way.
There are various datatypes which may be read; the simplest is characters, where no byte swapping is required. The next simplest is an unsigned short integer represented by 2 bytes. If the two bytes are read sequentially and the integer was written by a big endian machine, then its value is 256*byte1+byte2. If the integer was written by a little endian machine such as a DOS/Windows computer, then its value is 256*byte2+byte1.
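As a minimal sketch of the arithmetic just described, the two helper functions below (the names are made up for this illustration) combine two bytes in the order they were read from the file. Because the value is built arithmetically, the result does not depend on the byte order of the machine doing the reading.

```c
/*
   Combine two bytes, given in the order they appear in the file,
   written by a big endian machine: first byte is most significant
*/
unsigned short UShortFromBigEndian(const unsigned char *b)
{
   return (unsigned short)(256 * b[0] + b[1]);
}

/*
   As above but for a file written by a little endian machine:
   first byte is least significant
*/
unsigned short UShortFromLittleEndian(const unsigned char *b)
{
   return (unsigned short)(256 * b[1] + b[0]);
}
```

For example, the byte pair 0x12, 0x34 gives 0x1234 (4660) if the file is big endian and 0x3412 (13330) if it is little endian.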
While this approach can be used for unsigned shorts, ints, and longs, and can be easily modified for signed versions of the same, it is rather difficult for real numbers (floats and double precision numbers). Fortunately the standard IEEE numerical format is used almost exclusively nowadays, so the bytes making up a particular number can simply be swapped around appropriately in memory. This does assume that the particular numerical type is the same size on both machines, the machine that wrote the file and the machine reading the file. The usual sizes are 2 bytes for short integers, 4 bytes for integers, 4 bytes for floats and 8 bytes for doubles. Long integers were traditionally 4 bytes but are 8 bytes on many 64 bit systems, so their size should be checked rather than assumed.
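The modification for signed values might look like the following sketch. It assumes the file stores negative numbers in two's complement form, which is the case for essentially all current hardware; the function name is hypothetical.

```c
/*
   Combine two bytes read sequentially from a little endian file
   into a signed 16 bit value, assuming two's complement storage
*/
int ShortFromLittleEndian(const unsigned char *b)
{
   long v = 256L * b[1] + b[0];   /* unsigned value, 0 to 65535 */

   if (v >= 32768L)               /* top bit set means negative */
      v -= 65536L;

   return (int)v;
}
```

The bytes 0xfe, 0xff therefore decode to -2, while any value below 32768 is unchanged from the unsigned case.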
In summary, to read 2 byte integers (signed or unsigned) one reads the 2 bytes as normal, e.g. using fread(), and then swaps the 2 bytes in memory. It turns out that for 4 byte integers, floats and doubles the requirement is likewise to reverse the bytes as they appear in memory. See the source below for more details.
Source code
Some routines illustrating the methods required to do the byte swapping for various numerical types.
#include <stdio.h>

#define TRUE  1
#define FALSE 0

/*
   Read a short integer, swapping the bytes
   Returns TRUE on success, FALSE if the read fails
*/
int ReadShortInt(FILE *fptr,short int *n)
{
   unsigned char *cptr,tmp;

   if (fread(n,2,1,fptr) != 1)
      return(FALSE);
   cptr = (unsigned char *)n;
   tmp = cptr[0];
   cptr[0] = cptr[1];
   cptr[1] = tmp;

   return(TRUE);
}
/*
   Read an integer, swapping the bytes
*/
int ReadInt(FILE *fptr,int *n)
{
   unsigned char *cptr,tmp;

   if (fread(n,4,1,fptr) != 1)
      return(FALSE);
   cptr = (unsigned char *)n;
   tmp = cptr[0];
   cptr[0] = cptr[3];
   cptr[3] = tmp;
   tmp = cptr[1];
   cptr[1] = cptr[2];
   cptr[2] = tmp;

   return(TRUE);
}
/*
   Read a floating point number
   Assume IEEE format
*/
int ReadFloat(FILE *fptr,float *n)
{
   unsigned char *cptr,tmp;

   if (fread(n,4,1,fptr) != 1)
      return(FALSE);
   cptr = (unsigned char *)n;
   tmp = cptr[0];
   cptr[0] = cptr[3];
   cptr[3] = tmp;
   tmp = cptr[1];
   cptr[1] = cptr[2];
   cptr[2] = tmp;

   return(TRUE);
}
/*
   Read a double precision number
   Assume IEEE
*/
int ReadDouble(FILE *fptr,double *n)
{
   unsigned char *cptr,tmp;

   if (fread(n,8,1,fptr) != 1)
      return(FALSE);
   cptr = (unsigned char *)n;
   tmp = cptr[0];
   cptr[0] = cptr[7];
   cptr[7] = tmp;
   tmp = cptr[1];
   cptr[1] = cptr[6];
   cptr[6] = tmp;
   tmp = cptr[2];
   cptr[2] = cptr[5];
   cptr[5] = tmp;
   tmp = cptr[3];
   cptr[3] = cptr[4];
   cptr[4] = tmp;

   return(TRUE);
}
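Since every routine above does the same thing, namely reverse the bytes of the value in memory, the swapping could also be written once for any size. The following is one possible sketch; ReverseBytes() is a name chosen for this illustration and is not part of the original routines.

```c
#include <stddef.h>

/*
   Reverse, in place, the n bytes starting at ptr
   Works for any type size, for example 2, 4 or 8 bytes
*/
void ReverseBytes(void *ptr,size_t n)
{
   unsigned char *cptr = (unsigned char *)ptr,tmp;
   size_t i;

   /* Swap the outermost pair, then work inwards */
   for (i=0;i<n/2;i++) {
      tmp = cptr[i];
      cptr[i] = cptr[n-1-i];
      cptr[n-1-i] = tmp;
   }
}
```

With such a helper, reading any of the types reduces to a call to fread() followed by ReverseBytes(n,sizeof(*n)). Reversing twice restores the original value, so the same function serves for writing as well.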
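Macros
An alternative for all but doubles is to use these cute macros, so that the swapping is done inline. Note that each macro evaluates its argument more than once, so arguments with side effects (such as x++) should be avoided.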
#define SWAP_2(x) ( (((x) & 0xff) << 8) | ((unsigned short)(x) >> 8) )
#define SWAP_4(x) ( ((x) << 24) | \
                    (((x) << 8) & 0x00ff0000) | \
                    (((x) >> 8) & 0x0000ff00) | \
                    ((x) >> 24) )
#define FIX_SHORT(x) (*(unsigned short *)&(x) = SWAP_2(*(unsigned short *)&(x)))
#define FIX_INT(x) (*(unsigned int *)&(x) = SWAP_4(*(unsigned int *)&(x)))
#define FIX_FLOAT(x) FIX_INT(x)
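The macros stop short of doubles. One way to cover them in a similar spirit is sketched below, assuming a C99 compiler that provides the 64 bit type uint64_t in <stdint.h> and that a double is 8 bytes. Swap8() and FixDouble() are names invented for this example; memcpy() is used to move the bits between the double and the integer so that no pointer casts between unrelated types are needed.

```c
#include <stdint.h>
#include <string.h>

/*
   Reverse the byte order of a 64 bit unsigned integer
   by swapping halves, then 16 bit pairs, then single bytes
*/
uint64_t Swap8(uint64_t x)
{
   x = ((x & 0x00000000ffffffffULL) << 32) | (x >> 32);
   x = ((x & 0x0000ffff0000ffffULL) << 16) |
       ((x >> 16) & 0x0000ffff0000ffffULL);
   x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
       ((x >> 8) & 0x00ff00ff00ff00ffULL);

   return x;
}

/*
   Byte swap a double in place
   Assumes sizeof(double) is 8
*/
void FixDouble(double *d)
{
   uint64_t bits;

   memcpy(&bits,d,sizeof(bits));
   bits = Swap8(bits);
   memcpy(d,&bits,sizeof(bits));
}
```

A typical use would be to fread() the 8 bytes into a double and then call FixDouble() on it before the value is used in any arithmetic.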
Strategies for developers
There are three basic strategies for software developers when choosing how to create endian independent data files and associated software.
Decide that the file format will be one particular endian. In this case software running on machines of the same endian does nothing special, while software running on other machines byte swaps everything on reading and writing. This is common for file formats and software designed with an implicit endian assumption which get ported at a later date to other machines.
Store the endian-ness of the file in the file itself. The software writes the binary file in the natural endian of the underlying hardware but pays attention to the endian-ness when reading binary files. Files of both endians need to be handled; since the software knows its own endian-ness, it can do the right thing.
The poorer cousin of the previous approach is not to store the endian-ness and for software to always write in its natural endian. This leads to two possible file types, and the user is expected to know which endian a file is and to choose the appropriate one when specifying which file to read. This is obviously the least attractive approach.
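The second strategy, storing the endian-ness in the file, might be sketched as follows. The one byte marker and its values ENDIAN_LITTLE and ENDIAN_BIG are assumptions made for this example rather than any established file format, and the function names are likewise hypothetical. The machine's own endian-ness is found at run time by looking at the first byte of a known 2 byte integer.

```c
#include <stdio.h>

#define ENDIAN_LITTLE 0   /* hypothetical marker values */
#define ENDIAN_BIG    1

/*
   Determine the endian-ness of the machine running the software
*/
int HostEndian(void)
{
   unsigned short one = 1;

   /* On a little endian machine the low byte is stored first */
   if (*(unsigned char *)&one == 1)
      return(ENDIAN_LITTLE);
   return(ENDIAN_BIG);
}

/*
   Write a one byte marker recording the writer's endian-ness
   Returns 1 on success, 0 on failure
*/
int WriteEndianMarker(FILE *fptr)
{
   unsigned char marker = (unsigned char)HostEndian();

   return(fwrite(&marker,1,1,fptr) == 1);
}

/*
   Read the marker and set *swap to 1 if the data that follows
   needs byte swapping on this machine, 0 otherwise
   Returns 1 on success, 0 on a failed read or unknown marker
*/
int ReadEndianMarker(FILE *fptr,int *swap)
{
   unsigned char marker;

   if (fread(&marker,1,1,fptr) != 1)
      return(0);
   if (marker != ENDIAN_LITTLE && marker != ENDIAN_BIG)
      return(0);
   *swap = (marker != HostEndian());

   return(1);
}
```

The writer calls WriteEndianMarker() once before writing its data in native order. The reader calls ReadEndianMarker() first and byte swaps every subsequent multi-byte value only if swap comes back as 1.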