MDL Syntax

Cornell University Program of Computer Graphics

There are two levels to the syntax of mdl files: the conceptual level and the file level. The conceptual level is what the application program sees through the I/O library, and is the same for text and binary files; the file level is what's actually stored in the file, and is different for text and binary files.

At the conceptual level, a file consists of four types of data: 4-byte float, 4-byte int, 8-character keyword, and variable-length string. A file is a sequence of chunks. A chunk consists of a keyword and a sequence of data items, each of which can be a float, an int, a string, or a chunk.

At the file level, a text file contains the identifying keyword mdlflA20 followed by a sequence of chunks. A chunk is a keyword, followed by a sequence of data items, followed by the keyword end. Floats and ints are represented in scanf/printf format, with the restriction that a float has to have a decimal point. Keywords are represented as sequences of up to eight alphanumeric characters, and strings are delimited by double quotes; all tokens must be separated by white space. Comments, which are for human consumption only, are delimited by # and end-of-line or by [ and ].
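The text-level rules above can be sketched as a small tokenizer. This is only an illustration, not the I/O library itself: the names tokenize and classify are hypothetical, and it assumes that strings cannot contain a double quote and that a comment character inside a string is ordinary text, neither of which the description above settles.

```python
def classify(word):
    """Classify a bare (unquoted) token as an int, a float, or a keyword."""
    try:
        return ("int", int(word))
    except ValueError:
        pass
    if "." in word:  # a float has to have a decimal point
        try:
            return ("float", float(word))
        except ValueError:
            pass
    if word.isalnum() and len(word) <= 8:
        return ("keyword", word)
    raise ValueError(f"bad token: {word!r}")


def tokenize(text):
    """Return a list of (kind, value) pairs, skipping both comment forms."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c == "#":  # comment runs to end of line
            j = text.find("\n", i)
            i = n if j < 0 else j + 1
        elif c == "[":  # comment runs to the closing bracket
            j = text.find("]", i)
            if j < 0:
                raise ValueError("unterminated [ comment")
            i = j + 1
        elif c == '"':  # string delimited by double quotes
            j = text.find('"', i + 1)
            if j < 0:
                raise ValueError("unterminated string")
            tokens.append(("string", text[i + 1:j]))
            i = j + 1
        else:  # bare token runs to the next white space
            j = i
            while j < n and not text[j].isspace():
                j += 1
            tokens.append(classify(text[i:j]))
            i = j
    return tokens
```

Note that a token such as 1e5 has no decimal point, so this sketch treats it as a keyword rather than a float, in line with the restriction stated above.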

At the file level, a binary file contains the identifying keyword mdlflB20 followed by a sequence of chunks. A chunk is a keyword, followed by a word count, followed by a sequence of data items. The word count gives the number of words in the chunk, including all contained chunks, but not including the keyword and the count itself. Ints and floats are represented in their native form as 4-byte words. A keyword is stored as its alphanumeric characters, padded with spaces to fill eight bytes. Strings are null-terminated and then padded with further nulls so that they end on a 4-byte word boundary.
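As a rough illustration of the binary layout, the following sketch encodes keywords, strings, and chunks. The function names are hypothetical, and the little-endian byte order is an assumption made only so the example is deterministic; the format itself says "native form".

```python
import struct

WORD = 4  # bytes per word


def pack_keyword(name):
    """Keyword: up to eight characters, space-padded to exactly 8 bytes."""
    raw = name.encode("ascii")
    if len(raw) > 8:
        raise ValueError("keyword longer than 8 characters")
    return raw.ljust(8, b" ")


def pack_string(text):
    """String: null-terminated, then null-padded to a whole number of words."""
    raw = text.encode("ascii") + b"\0"
    return raw + b"\0" * (-len(raw) % WORD)


def pack_item(item):
    """Encode one data item; bytes are taken to be an already-packed sub-chunk."""
    if isinstance(item, bytes):
        return item
    if isinstance(item, int):
        return struct.pack("<i", item)  # assumed little-endian 4-byte int
    if isinstance(item, float):
        return struct.pack("<f", item)  # assumed little-endian 4-byte float
    if isinstance(item, str):
        return pack_string(item)
    raise TypeError(f"unsupported item: {item!r}")


def pack_chunk(keyword, items):
    """Keyword, then word count of the body, then the body itself."""
    body = b"".join(pack_item(x) for x in items)
    return pack_keyword(keyword) + struct.pack("<i", len(body) // WORD) + body
```

For instance, pack_chunk("lmbrtn", [pack_chunk("rgb", [0.2, 0.2, 0.8])]) yields an outer word count of 6: two words for the inner keyword, one for the inner count, and three floats.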

Here is an example file, represented in the text format.


  mdlflA20
  [ A diffuse blue sphere ]
  sphr "racquetball"
     lmbrtn rgb 0.2 0.2 0.8 end end
     0.0 0.0 0.0 0.03  # center, radius
  end

Here is the same file in the binary format (underscores are spaces).
    2 words: mdlf, lB20
    2 words: sphr, ____
    1 word: 16
    3 words: racq, uetb, all\0
    2 words: lmbr, tn__
    1 word: 6
    2 words: rgb_, ____
    1 word: 3
    3 words: 0.2, 0.2, 0.8
    4 words: 0.0, 0.0, 0.0, 0.03

This nested structure with sizes at all levels allows programs to efficiently skip chunks that they don't recognize or are not interested in.
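The skipping behavior might look like the sketch below, which walks the top-level chunks of a binary stream and seeks past any chunk whose keyword is not wanted. The function name is hypothetical, and little-endian word counts are an assumption, since the format stores them in native form.

```python
import io
import struct


def read_wanted_chunks(stream, wanted):
    """Return (keyword, raw body) for wanted top-level chunks; skip the rest."""
    if stream.read(8) != b"mdlflB20":
        raise ValueError("not a binary mdl file")
    found = []
    while True:
        header = stream.read(12)  # 8-byte keyword plus 4-byte word count
        if len(header) < 12:
            break  # end of file
        keyword = header[:8].decode("ascii").rstrip(" ")
        (count,) = struct.unpack("<i", header[8:])  # assumed little-endian
        if keyword in wanted:
            found.append((keyword, stream.read(count * 4)))
        else:
            # The count covers everything inside the chunk, nested chunks
            # included, so a single seek skips the whole thing.
            stream.seek(count * 4, io.SEEK_CUR)
    return found
```

The reader never needs to know the internal layout of a chunk it skips; the word count alone is enough to find the next keyword.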

Since there is no way to tell the type of a piece of data (in a binary file) by looking at it, a program must be able to decide what to read next based on what came before. By convention, types depend only on the keyword of the chunk the data is in, so it would be immoral (though not illegal) to use a chunk where the value of the first item determines the type of the second. We represent the sequence of types that should appear in a chunk with a string of characters, with the following meanings:

  • i - an integer
  • f - a float
  • s - a string
  • C - a sub-chunk
  • (string)* - a sequence that can repeat zero or more times

For example, the type "siifC" means that the chunk always has 5 items in it: a string, two ints, a float, and a chunk. The type "(f)*" means that the chunk can contain any number of floats, including zero. Because the type of the next item must always be uniquely determined by its position in the chunk, there can be only one repeating section, and it has to be at the end of the chunk. The type "s(i)*f" is no good because we don't know whether the third item should be an int or a float, using only the chunk keyword as a hint.
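The rule that at most one repeating section may appear, and only at the end, can be checked mechanically. The sketch below is one possible way to do it; parse_type and matches are hypothetical names, and the sequence of items actually found in a chunk is assumed to be written as a string of the same letters (i, f, s, C).

```python
import re

# Fixed prefix of type letters, optionally followed by one repeating group.
_SPEC = re.compile(r"^([ifsC]*)(?:\(([ifsC]+)\)\*)?$")


def parse_type(spec):
    """Split a type string into (fixed part, repeating part)."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid type string: {spec!r}")
    return m.group(1), m.group(2) or ""


def matches(spec, kinds):
    """Check whether a sequence of item kinds fits the type string."""
    fixed, repeat = parse_type(spec)
    if kinds[:len(fixed)] != fixed:
        return False
    rest = kinds[len(fixed):]
    if not repeat:
        return rest == ""
    # The tail must be zero or more whole copies of the repeating part.
    if len(rest) % len(repeat) != 0:
        return False
    return rest == repeat * (len(rest) // len(repeat))
```

With this sketch, parse_type("s(i)*f") raises ValueError, because the repeating section is not at the end of the chunk.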