DWF format
From Autodesk
Edited by Paul Bourke

The DWF format is intended only for the efficient viewing of drawings, similar to an electronic plot. It is not intended for the interchange of higher-level data between applications, especially as most DWF data is post-tessellation. For example, a CAD application generates a .dwf file based on a drawing in the application's native format. This .dwf file is then transmitted and displayed by a much simpler viewing application, such as an Internet Web browser. Because it lacks non-visual data, a .dwf file is not intended to be read back into the original CAD application, although it can refer to another file in the application's native format through a DWF link or an embed operation. A .dwf file is organized into three main sections, as shown in figure 1.
Figure 1. DWF file organization

The data in the header and the trailer is encoded as readable ASCII text. The file data block consists of operation codes (opcodes) and the argument data used by those opcodes (operands), as in table 1.
There are two types of opcode/operand pairs: readable ASCII text and coded binary. All DWF operations have a readable ASCII opcode/operand form, and most operations also have a coded binary form. By choosing the appropriate opcode form, you can create a file that is human-readable, one that is more efficient to process and store, or, more commonly, a mixture of both.

An application reading a .dwf file may not understand every opcode, especially when the reading application predates the application that created the file. For this reason, DWF is designed to allow a file reader to skip most opcodes. To skip an opcode, the reader must know the length of its operand. DWF has three categories of opcodes:

1. Single-byte opcodes, which must be recognized for efficiency reasons and thus cannot be skipped. A reader application need not implement these opcodes, but it must be able to compute their operand length, which requires recognizing the opcode. If a DWF reading application does not recognize a single-byte opcode, it cannot read the rest of the file.
2. Extended ASCII opcodes (human-readable), which have delimited and nestable operands. By following a few simple rules, a reader application can safely skip such an opcode/operand pair without understanding the operation or its contents.
3. Extended binary opcodes, which state their operand length so that a reader application can easily skip past the unknown operation and its data.

To improve the readability of .dwf files, an opcode may be preceded by white-space, defined as any number of ASCII spaces, tabs, carriage returns, or line feeds.

This section describes each block of the .dwf file format in detail. The file header has two basic functions:
The header, shown in table 2, is 12 bytes that can be interpreted as a (possibly unterminated) ASCII string, for example "(DWF V00.30)". The first six bytes are the constant "(DWF V", which identifies the file as a .dwf file. Note that these characters are upper case. This constant is followed by a 5-byte version number in the format shown in table 3. In the example, the twelfth byte is the closing parenthesis.
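The header layout above can be checked with a few lines of code. This is a minimal sketch: parse_header is a hypothetical name, and it assumes the 5-byte version field is two major digits, a period, and two minor digits, an assumption based on the examples "(DWF V00.30)" and "(DWF V01.00)" in this document rather than on table 3.

```python
HEADER_LEN = 12
MAGIC = b"(DWF V"  # first six bytes, upper case


def parse_header(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from a header such as b"(DWF V00.30)".

    Assumption: the version field (bytes 7-11) looks like "MM.mm".
    """
    if len(data) < HEADER_LEN or not data.startswith(MAGIC):
        raise ValueError("not a .dwf file")
    version = data[6:11].decode("ascii")      # e.g. "00.30"
    major, sep, minor = version.partition(".")
    if sep != "." or not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"malformed version field: {version!r}")
    return int(major), int(minor)
```

For example, parse_header(b"(DWF V00.30)") returns (0, 30), and a lower-case "(dwf v" prefix is rejected.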
The application generating a .dwf file should specify the lowest version number that a reading application would need to support in order to use the data properly. Generally, a reader application should not attempt to read a file with a higher major revision value than the one it was designed for. If only the minor revision value is higher, the file may contain opcodes unknown to the reader, which can and should be skipped.

The file data block starts at the 13th byte of a .dwf file and is a series of opcode/operand pairs, as in table 4.
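The version rules and the start of the file data block can be sketched together. This is an illustration under stated assumptions: version_decision and classify_opcode are hypothetical names, the returned strings are labels chosen for this sketch, and the three categories come from the opcode classification described earlier (extended ASCII starts with "(", extended binary with "{", anything else is a single-byte opcode).

```python
WHITESPACE = b" \t\r\n"  # spaces, tabs, carriage returns, line feeds
DATA_START = 12          # the file data block begins at the 13th byte


def version_decision(file_ver: tuple[int, int], reader_ver: tuple[int, int]) -> str:
    """How a reader built for reader_ver should treat a file declaring file_ver."""
    file_major, file_minor = file_ver
    reader_major, reader_minor = reader_ver
    if file_major > reader_major:
        return "reject"  # newer major revision: do not attempt to read
    if file_major == reader_major and file_minor > reader_minor:
        return "read, skipping unknown opcodes"
    return "read"


def classify_opcode(data: bytes, pos: int = DATA_START) -> tuple[int, str]:
    """Skip white-space, then report where the next opcode starts and its category."""
    while pos < len(data) and data[pos] in WHITESPACE:
        pos += 1
    if pos >= len(data):
        return pos, "end of data"
    if data[pos] == ord("("):
        return pos, "extended ASCII"
    if data[pos] == ord("{"):
        return pos, "extended binary"
    # Must be recognized: its operand length cannot be learned any other way.
    return pos, "single-byte"
```

A reader would call version_decision once after parsing the header, then classify_opcode repeatedly as it walks the data block.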
An opcode may be preceded by any number of white-space characters (ASCII spaces, tabs, carriage returns, or line feeds). Opcodes are a single byte in length except for two special cases, extended ASCII and extended binary. This allows for over 200 operations. Note: some of the 256 possible byte values are not legal for opcode use.

These single-byte opcodes may have operands that are either readable ASCII or coded binary. Some common operations have separate opcodes for the ASCII and binary operand forms. Generally, if an operand is formatted as readable ASCII, its single-byte opcode is also a readable ASCII character, which allows the file to be edited with a normal text editor. The following example shows a line drawing operation using the L single-byte opcode followed by a readable ASCII operand:

L 500,20300 90100,48000

The same line could also be represented with a binary-coded operand, as in the following example, which uses the l opcode. Here each of the letters X, Y, x, and y stands for one byte of binary operand data:

lXXXXYYYYxxxxyyyy

Except for the two special types, extended ASCII and extended binary, a file reader must know how to compute the operand length of every opcode. The ASCII representations for the following cannot be used as opcodes:
In the case of an opcode with a binary coded operand, the binary data is stored in little-endian byte order (least significant byte first).

The single-byte opcode "(" (open parenthesis) indicates an extended and possibly nested readable ASCII opcode. The open parenthesis is followed by a multiple-byte string token opcode, then white-space, then zero or more operands, and finally a closing parenthesis, ")", as the terminator:

(Origin 240 120)

Extended ASCII opcodes may be nested:

(Owner (FirstName Brian)(LastName Mathews))
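The nesting structure just described can be read into a tree with a short recursive parser. This is a minimal sketch, not a complete reader: parse_extended_ascii is a hypothetical name, and it deliberately ignores quoted strings and nested binary data, so it only suits simple cases like the two examples above.

```python
def parse_extended_ascii(text: str, pos: int = 0) -> tuple[list, int]:
    """Parse the extended ASCII opcode starting at text[pos].

    Returns (tree, next_pos): tree is [opcode, operand, ...], with each nested
    opcode as a nested list. Quoted strings are NOT handled in this sketch.
    """
    if text[pos:pos + 1] != "(":
        raise ValueError("extended ASCII opcodes start with '('")
    pos += 1
    items: list = []
    token = ""
    while pos < len(text):
        ch = text[pos]
        if ch in "()" or ch.isspace():
            if token:                        # a bare word ends here
                items.append(token)
                token = ""
            if ch == "(":                    # nested opcode: recurse into it
                child, pos = parse_extended_ascii(text, pos)
                items.append(child)
                continue
            if ch == ")":                    # terminator of this opcode
                return items, pos + 1
        else:
            token += ch
        pos += 1
    raise ValueError("missing closing parenthesis")
```

With this sketch, "(Owner (FirstName Brian)(LastName Mathews))" becomes ["Owner", ["FirstName", "Brian"], ["LastName", "Mathews"]].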
Extended ASCII opcodes may contain literal strings surrounded by single quote characters, ', as in:

(Account (Person 'Brian Mathews ;-)') (Company 'Autodesk \'ADSK\''))

Note: inside a quoted string, the backslash character, \, may be used to treat a subsequent ' or \ character as literal data.

The single-byte opcode "{" (open curly brace) indicates an extended and possibly nested binary section of data. Immediately following the open curly brace is a 4-byte integer giving the length, in bytes, of the binary data. The length field is followed by a 2-byte extended binary opcode, which allows for over 65,000 operations. Finally, the binary stream is terminated with a closing curly brace, "}". For example, the binary data for a raster of pixels can be represented as

{cccceexxxxxxxxx}

where cccc is the length of the binary data, ee is the opcode for a raster, and xxxxxxxxx is the raster data. The extended binary opcode and the terminating "}" are counted as part of the binary data stream. Thus the value of cccc in this example is 12 (two e's, nine x's, and one "}"), encoded as a little-endian binary value.

Skipping extended ASCII and extended binary opcodes

Skipping extended ASCII opcodes

If a reader application does not recognize an extended ASCII opcode, it should keep scanning the file, matching open parenthesis characters, "(", with closing parenthesis characters, ")", until the terminating closing parenthesis is found. If a single quote character, ', is found, scanning should continue until the matching single quote is found, ignoring any "(" or ")" characters inside. Note: while parentheses may be nested, single quote marks are not, as a pair of them always encloses a single literal string. A backslash character, \, indicates that the following character is literal and must not be treated as the end of the string. Thus, the following would pass the operand This is\was a 'happy' face :-) comment! to the Comment opcode:

(Comment 'This is\\was a \'happy\' face :-) comment!')
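The skipping rules can be sketched as two functions. Both names are hypothetical, and the sketch assumes the 4-byte length is unsigned; it does rely on the count covering the 2-byte opcode, the payload, and the closing "}", as in the {cccceexxxxxxxxx} example. A zero count is treated as unskippable, following the rule for binary data runs of unknown length. When the ASCII scanner meets a "{" outside a quoted string, it jumps ahead using the byte count rather than scanning for "}", because binary data may itself contain parentheses or quotes.

```python
import struct


def skip_extended_ascii(data: bytes, pos: int) -> int:
    """Return the index just past the extended ASCII opcode at data[pos]."""
    if data[pos:pos + 1] != b"(":
        raise ValueError("extended ASCII opcodes start with '('")
    depth, quoted, i = 0, False, pos
    while i < len(data):
        ch = data[i:i + 1]
        if quoted:
            if ch == b"\\":
                i += 1                          # next byte is literal data
            elif ch == b"'":
                quoted = False
        elif ch == b"'":
            quoted = True
        elif ch == b"{":
            i = skip_extended_binary(data, i)   # jump using the byte count
            continue
        elif ch == b"(":
            depth += 1
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated extended ASCII opcode")


def skip_extended_binary(data: bytes, pos: int) -> int:
    """Return the index just past the extended binary opcode at data[pos].

    Assumed layout: '{', a 4-byte little-endian unsigned count, then `count`
    bytes, the last of which is the closing '}'.
    """
    if data[pos:pos + 1] != b"{":
        raise ValueError("extended binary opcodes start with '{'")
    (count,) = struct.unpack_from("<I", data, pos + 1)
    if count == 0:
        raise ValueError("length unknown (count is 0): opcode cannot be skipped")
    end = pos + 5 + count                       # '{' + 4 count bytes + data
    if end > len(data) or data[end - 1:end] != b"}":
        raise ValueError("truncated or malformed extended binary opcode")
    return end
```

For the raster example, b"{" + struct.pack("<I", 12) + b"ee" + b"x" * 9 + b"}" is 17 bytes long, and skip_extended_binary returns 17 for it.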
Skipping extended binary opcodes

It is possible, although not recommended, for an extended ASCII opcode to contain nested extended binary data, as in

(Embedded_DWG (FileName house.dwg) {ccccXXXXXXXXXX})

where cccc represents a 4-byte little-endian integer giving the length of the binary data: 11 in this example, for the ten bytes shown as "XXXXXXXXXX" plus the terminating curly brace, "}". To skip any binary object, whether opcode or operand data, the reader must use the four-byte count cccc rather than search for the "}" character, which may also occur inside the binary data. This method also lets a reader skip even a nested set of binary streams, because a parent stream's cccc count includes its sub-objects' data.

If the four-byte count cccc is zero, the DWF writing application was unable to compute the length of the binary data. Such an opcode cannot be skipped, so the reading application must either know how to parse the opcode or fail to read the remainder of the .dwf file. DWF writing applications should therefore avoid this practice whenever possible.

Most of the coordinates specified in .dwf files are logical coordinates, as opposed to screen or device coordinates. Logical coordinates use the positive range of 32-bit signed integers (31 bits of precision), with a legal range from 0 to 2,147,483,647 (2^31 - 1). Normally, a DWF writing application should scale the geometric primitives of the illustration being stored so that a large portion of this 31-bit range is used. This allows a DWF reading application to scale the illustration to the desired display, or a user to zoom in on the drawing, with enough precision to render fine details.

Integer Versus Floating-Point Values

Thirty-two-bit integer values are used because they allow for more precision and greater computing speed than 32-bit floating-point values.
Of a 32-bit floating-point number's bits, 9 store the sign and exponent, leaving 23 stored fraction bits, or 24 bits of effective precision once the implicit leading bit is counted (not to be confused with a floating-point number's large range). If a map were drawn with DWF's 31-bit integer coordinates, over 21,000 kilometers (more than 12,000 miles) of distance could be uniquely resolved to 1-centimeter increments. If 32-bit floating-point coordinates were used, only about 167 kilometers (about 100 miles) could be resolved to this level of detail. AutoCAD, by contrast, addresses this issue with 64-bit double-precision floating-point coordinates, which have 52 stored fraction bits (53 bits of effective precision) and an enormous range. For the more limited purpose of representing an electronic plot, DWF's 31 bits of precision are more than adequate.

Depending on the opcode in use, these logical coordinate values may be encoded in a .dwf file either as absolute coordinates or as relative coordinates. Whereas absolute logical coordinates may only range from 0 to 2,147,483,647 (31-bit unsigned), relative coordinates may range from -2,147,483,647 to +2,147,483,647 (32-bit signed). A relative coordinate is formed by subtracting the previous absolute coordinate in the file from the current absolute coordinate.

Relative coordinates increase the effectiveness of DWF's data compression algorithm, which looks for repeating patterns of data. Common drawings contain objects, represented by sequences of lines, circles, and so forth, that occur multiple times, such as the four tires in an illustration of a car. With absolute coordinates, the lines and circles that make up an object would have different coordinates in each instance of that object, because the instances sit at different positions. With relative coordinates, however, only the first coordinate in the sequence differs from one instance of the object to the next.
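The effect of relative coordinates on repeated objects can be demonstrated directly. This is a minimal sketch, assuming the caller supplies the previous absolute point through a hypothetical prev parameter (default (0, 0)); to_relative and to_absolute are illustrative names, and the square outline stands in for a repeated object such as a tire.

```python
Point = tuple[int, int]


def to_relative(points: list[Point], prev: Point = (0, 0)) -> list[Point]:
    """Replace each absolute point with its offset from the previous point."""
    deltas = []
    px, py = prev
    for x, y in points:
        deltas.append((x - px, y - py))
        px, py = x, y
    return deltas


def to_absolute(deltas: list[Point], prev: Point = (0, 0)) -> list[Point]:
    """Undo to_relative by accumulating the offsets."""
    points = []
    px, py = prev
    for dx, dy in deltas:
        px, py = px + dx, py + dy
        points.append((px, py))
    return points


# Two copies of the same square outline at different positions.
square = [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]
first = [(x + 1000, y + 1000) for x, y in square]
second = [(x + 5000, y + 2000) for x, y in square]
rel_first = to_relative(first)
rel_second = to_relative(second, prev=first[-1])
print(rel_first[1:] == rel_second[1:])  # True: only the leading offset differs
```

The repeated tail of offsets is exactly the kind of pattern a general-purpose compressor can exploit, and to_absolute recovers the original points without loss.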
Because the remainder of the coordinate sequence is independent of the object instance, the data compression algorithm will find longer and more frequent sequences of repeating data.

For many applications, the extreme level of detail allowed by DWF's 31-bit logical coordinates is unnecessary, and it may be undesirable because of the larger file size needed to store such large values. For this reason, many drawing operations also allow relative coordinates to be stored as 16-bit signed integers. When a DWF reading application is given either a 32-bit or a 16-bit relative coordinate, it converts the value to a full 31-bit absolute logical coordinate before use. This is a lossless form of compression: the full 31 bits of precision are preserved even though only a 16-bit value is stored, provided the offset from the previous coordinate fits in 16 bits.

When assigning an opcode to an operation, apply the following principles:
Execution of the File Data Block

To preserve proper drawing order, the opcodes in a .dwf file should be executed in the order in which they are read.

The DWF trailer is simply a special opcode indicating the end of the DWF data sequence; it is normally at the end of the file. An application may store non-DWF data after the .dwf file termination opcode.
Following is an example of a .dwf file that uses readable ASCII opcodes exclusively. The same example could be represented more efficiently using binary coding, but that is difficult to show in a printed document.

(DWF V01.00)
(SourceFilename house plan.dwg)
)
(Comment changing the color from the default)
(Layer 2 Heating)
C12
v
(Comment The following line wont be visible.)
(URL)
(EndOfDWF)
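A reader for a file like this one walks the data block in order, stops at the (EndOfDWF) trailer, and leaves any bytes after it alone. The sketch below is illustrative only: read_operations and _skip_parens are hypothetical names, and it assumes the only single-byte opcodes are C (followed directly by a decimal color index) and v (no operand), an assumption inferred from this example rather than from the opcode reference. The test input omits the unmatched ) from the listing, which this sketch would reject as an unknown opcode.

```python
WHITESPACE = b" \t\r\n"


def _skip_parens(data: bytes, i: int) -> int:
    """Index just past the extended ASCII opcode at data[i] (quotes honored)."""
    depth, quoted = 0, False
    while i < len(data):
        ch = data[i:i + 1]
        if quoted:
            if ch == b"\\":
                i += 1                       # escaped byte is literal
            elif ch == b"'":
                quoted = False
        elif ch == b"'":
            quoted = True
        elif ch == b"(":
            depth += 1
        elif ch == b")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated extended ASCII opcode")


def read_operations(dwf: bytes) -> tuple[list[str], bytes]:
    """Return (operations in file order, bytes after the trailer)."""
    if not dwf.startswith(b"(DWF V"):
        raise ValueError("not a .dwf file")
    ops: list[str] = []
    i = 12                                   # data block starts at byte 13
    while i < len(dwf):
        if dwf[i] in WHITESPACE:
            i += 1
            continue
        start = i
        opcode = dwf[i:i + 1]
        if opcode == b"(":
            i = _skip_parens(dwf, i)
        elif opcode == b"C":                 # assumed: decimal color index
            i += 1
            while i < len(dwf) and dwf[i:i + 1].isdigit():
                i += 1
        elif opcode == b"v":                 # assumed: no operand
            i += 1
        else:
            # An unrecognized single-byte opcode makes the rest unreadable.
            raise ValueError(f"unknown single-byte opcode {opcode!r}")
        ops.append(dwf[start:i].decode("ascii"))
        if ops[-1] == "(EndOfDWF)":
            return ops, dwf[i:]              # anything after this is non-DWF data
    raise ValueError("missing (EndOfDWF) trailer")
```

The returned list preserves file order, so a renderer could execute the operations one by one as they appear.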
Opcodes Listed by Format

This section lists opcodes in single-byte, extended ASCII, and extended binary formats. It is a convenient quick reference to the standard opcode definitions in chapter 5.