As described in Chapter 1, using a standard
format has certain advantages and disadvantages.
The proposed format realizes those advantages
in these ways:
- Portability
- When written using XDR binary
representations, a df dataset is completely
portable across an electronic network.
When a native binary format is used instead, the TEST record
at the beginning of the dataset tells just how the data were
written, which is the first and most important step in
converting them to be read on a
different machine. (In fact,
it is possible in principle to write generic programs to do
the conversion automatically.)
Furthermore, necessary
metadata, such as dimensional information
and identification of physical quantities and units, are
written to the datasets as coded numbers, making the
datasets portable across human
languages as well.
- Understandability
- All the
information needed to accompany the data, such as descriptions of its
dimensions, its ``family history,'' special
processing notes, and general comments
of any sort, has a specified place within each
df dataset. Thus, the
metadata are straightforward to find and interpret.
- Reusability
- Because the df format
is flexible and adaptable to a wide variety of needs, one
avoids having to use dozens of radically different
formats. And while the absence of a software
library of general df I/O subroutines
hinders reusability of software at present,
in the long term this difficulty should go away
as such libraries are developed.
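As an illustration of the portability point above, the following is a minimal sketch of how a reader might use a record of known values to work out how a dataset was written and then convert the data to the XDR representation (big-endian IEEE 754). The actual contents of the TEST record are defined elsewhere in the specification and are not reproduced here; the values KNOWN_INT and KNOWN_FLOAT, the 12-byte layout, and all function names are assumptions made only for this example.

```python
import struct

# Hypothetical known values (assumption): the real TEST record contents
# are defined by the df specification, not by this sketch.
KNOWN_INT = 0x01020304
KNOWN_FLOAT = 1.5


def write_test_record(byte_order: str) -> bytes:
    """Write the known values using '<' (little-endian) or '>' (big-endian)."""
    return struct.pack(byte_order + "id", KNOWN_INT, KNOWN_FLOAT)


def detect_byte_order(record: bytes) -> str:
    """Return the byte order under which the record decodes to the known values."""
    for order in ("<", ">"):
        i, f = struct.unpack(order + "id", record)
        if i == KNOWN_INT and f == KNOWN_FLOAT:
            return order
    raise ValueError("unrecognized binary representation")


def to_xdr_doubles(payload: bytes, order: str) -> bytes:
    """Re-encode a run of 8-byte floats as XDR doubles (big-endian IEEE 754)."""
    n = len(payload) // 8
    values = struct.unpack(f"{order}{n}d", payload)
    return struct.pack(f">{n}d", *values)


if __name__ == "__main__":
    record = write_test_record("<")          # pretend a little-endian machine wrote it
    order = detect_byte_order(record)        # the reader discovers this from the record
    native = struct.pack("<3d", 1.0, -2.5, 1e10)
    portable = to_xdr_doubles(native, order)
    print(order, portable.hex())
```

The sketch handles only byte order for IEEE floating-point values; a native format that differs in word size or floating-point representation would need a correspondingly richer test record and converter.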
The df format also minimizes the
disadvantages of standard formats:
- Inflexibility
- The df format is quite
flexible in being able to store many different kinds and
forms of physical science data in whatever order or numeric
representation the user desires, with processing
descriptions and flags specifiable over any arbitrary
subset of the data. While its most obvious
application is in storing multidimensional rectangular gridded
data fields, it can also store non-uniform grids, scattered point
data, and other data objects. With the use of
supplemental dimensional
information, one should be able to store even non-rectangular collections of
node-like data, such as trees or
directed graphs (the DESCSUP record could hold the
connectivity information).
- Overhead
- The df format provides the
user with the ability to choose how much overhead is involved
in reading the dataset. That is, aside from a few records at
the beginning of a data object, the amount of
storage space required for a dataset can be determined by the
user, who can choose the numeric type used to store the data, as
well as whether any packing or compression
schemes are applied to the data. In addition, the user may choose which
binary number representation is to be used: either the
portable XDR or the faster native binary format.
- Complexity
- On this point, the df
format is clearly lacking. For a simple, straightforward data array, a
df dataset can be fairly simple. The full
format specification, however, is complicated and relies on
concepts (such as mapping of dimension indices
to data array indices) which many scientists will find hard to
follow. The situation could be improved by the development of a standard
library of subroutines to read and write the
format.
- Accessibility
- Instead of depending on a support group to develop all
the software needed to use the df format, the
user has the format specification itself and is thus free to implement it on
any platform and in any language desired. While this does not
eliminate the need for a library of ready-to-use
software, it does free the user from dependence on such
software's existence.
- Conformance
- It is possible to write a
program which can scan a
dataset and ensure that it does indeed
conform to the standard.
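The storage trade-off described under Overhead can be made concrete with a rough sketch. The df format's own packing and compression schemes are not described in this section, so the example below uses stand-ins: linear scale-and-offset packing into 16-bit unsigned integers, and zlib compression. The function name storage_sizes and the dictionary keys are hypothetical and serve only to compare byte counts for the same data under different choices.

```python
import struct
import zlib


def storage_sizes(values):
    """Return the byte count of `values` under several storage choices.

    Stand-in schemes (assumptions, not the df specification):
    16-bit scale-and-offset packing and zlib compression.
    """
    if not values:
        raise ValueError("need at least one value")
    n = len(values)

    # Full-precision and single-precision big-endian IEEE floats.
    as_float64 = struct.pack(f">{n}d", *values)
    as_float32 = struct.pack(f">{n}f", *values)

    # Linear packing: map [lo, hi] onto the integers 0..65535.
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 65535 or 1.0   # avoid dividing by zero for constant data
    codes = [min(65535, round((v - lo) / scale)) for v in values]
    as_packed = struct.pack(f">{n}H", *codes)

    return {
        "float64": len(as_float64),
        "float32": len(as_float32),
        "int16_packed": len(as_packed),
        "float64_zlib": len(zlib.compress(as_float64)),
    }


if __name__ == "__main__":
    data = [0.25 * i for i in range(1000)]
    for name, size in storage_sizes(data).items():
        print(f"{name:>14}: {size} bytes")
```

Each step down in size costs something: single precision and 16-bit packing lose precision, and compression adds processing time on both writing and reading. The point of the format is that this choice is left to the user.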
Eric Nash
2003-09-25