
How Well Did We Meet Our Design Goals?

As described in Chapter 1, using a standard format has certain advantages and disadvantages.

The proposed format delivers the advantages in these ways:

Portability: When written using XDR binary representations, a df dataset is completely portable across an electronic network. When a native binary format is used instead, the TEST record at the beginning of the dataset tells just how the data were written, which is the first and most important step in converting them to be read on a different machine. (In fact, it is possible in principle to write generic programs to do the conversion automatically.) Furthermore, necessary metadata, such as dimensional information and identification of physical quantities and units, are written to the datasets as coded numbers, making the datasets portable across human languages as well.
Understandability: All the information needed to accompany the data, such as descriptions of its dimensions, its "family history," special processing notes, and general comments of any sort, has a specified place within each df dataset. Thus, the metadata are straightforward to find and interpret.
Reusability: Because the df format is flexible and adaptable to a wide variety of needs, one avoids having to use dozens of radically different formats. And while the absence of a software library of general df I/O subroutines hinders reusability of software at present, in the long term this difficulty should go away as such libraries are developed.
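
The role of the TEST record in the Portability item can be illustrated with a minimal Python sketch. It is not the df specification: the probe value, the 4-byte layout, and the function names write_probe, detect_byte_order, and read_floats are all assumptions made for illustration. The only facts relied on are that XDR stores integers in big-endian order and that a reader can recover a writer's byte order by decoding a known value.

```python
import struct

# Hypothetical probe value. The real TEST record layout is defined by the
# df specification and is NOT reproduced here.
PROBE = 0x01020304


def write_probe(byte_order):
    """Pack the probe as a 4-byte unsigned integer.

    byte_order is '>' (big-endian, the order XDR uses) or '<' (little-endian).
    """
    return struct.pack(byte_order + "I", PROBE)


def detect_byte_order(raw):
    """Return the struct prefix ('>' or '<') that decodes the probe correctly."""
    for prefix in (">", "<"):
        if struct.unpack(prefix + "I", raw)[0] == PROBE:
            return prefix
    raise ValueError("unrecognized integer layout")


def read_floats(raw, count, byte_order):
    """Decode `count` 4-byte IEEE floats written with the given byte order."""
    return list(struct.unpack(f"{byte_order}{count}f", raw))
```

A reader would decode the probe first and then use the detected byte order for every later record. A real native format may differ in more than byte order (for example in floating-point representation), which this sketch does not attempt to cover.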

The df format also minimizes the disadvantages of standard formats:

Inflexibility: The df format is quite flexible in being able to store many different kinds and forms of physical science data in whatever order or numeric representation the user desires, with processing descriptions and flags specifiable over any arbitrary subset of the data. While its most obvious application is in storing multidimensional rectangular gridded data fields, it can also store non-uniform grids, scattered point data, and other data objects. With the use of supplemental dimensional information, one should be able to store even non-rectangular collections of node-like data, such as trees or directed graphs (the DESCSUP record could hold the connectivity information).
Overhead: The df format lets the user choose how much overhead is involved in reading the dataset. That is, aside from a few records at the beginning of a data object, the amount of storage space required for a dataset can be determined by the user, who can choose the numeric type used to store the data, as well as whether any packing or compression schemes are applied to the data. In addition, the user may choose which binary number representation is to be used: either the portable XDR representation or a faster native binary format.
Complexity: On this point, the df format is clearly lacking. For a simple, straightforward data array, a df dataset can be fairly simple. The full format specification, however, is complicated and relies on concepts (such as mapping of dimension indices to data array indices) which many scientists will find hard to follow. The situation could be improved by the development of a standard library of subroutines to read and write the format.
Accessibility: Instead of depending on a support group to develop all the software needed to use the df format, the user has the format specification itself and is thus free to implement it on any platform and in any language desired. While this does not eliminate the need for a library of ready-to-use software, it does free the user from dependence on such software's existence.
Conformance: It is possible to write a program which can scan a dataset and ensure that it does indeed conform to the standard.
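
The storage trade-off described under Overhead can be made concrete with a short Python sketch. It compares the size of the same values stored as 8-byte reals, as 4-byte reals, and as 16-bit integers with a scale and offset. The scale-and-offset scheme, the function names pack_scaled and unpack_scaled, and the sample values are assumptions chosen for illustration; they are not claimed to be the packing scheme that df defines.

```python
import struct


def pack_scaled(values, scale, offset):
    """Store each value as a big-endian signed 16-bit code: round((v - offset) / scale)."""
    codes = [round((v - offset) / scale) for v in values]
    return struct.pack(f">{len(codes)}h", *codes)


def unpack_scaled(raw, scale, offset):
    """Recover approximate values from the 16-bit codes."""
    count = len(raw) // 2
    codes = struct.unpack(f">{count}h", raw)
    return [c * scale + offset for c in codes]


# Illustrative sample data (not taken from any real dataset).
temps = [271.3, 285.0, 299.9, 310.2]

as_double = struct.pack(f">{len(temps)}d", *temps)       # 8 bytes per value
as_float = struct.pack(f">{len(temps)}f", *temps)        # 4 bytes per value
as_packed = pack_scaled(temps, scale=0.1, offset=273.0)  # 2 bytes per value
```

Each step down in numeric type halves the storage here, at the cost of precision: the packed form reproduces the values only to within half the chosen scale, and values outside the 16-bit range cannot be stored at all.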


Eric Nash 2003-09-25