As of this writing, the df format is being used to store
datasets in the Atmospheric Chemistry and Dynamics Branch at
NASA Goddard Space Flight Center in Greenbelt, Maryland. A wide variety of
datasets have been written in the format:
vertical profile soundings, three-dimensional global
grids of meteorological fields, global maps of
total ozone, and data taken along aircraft flights. The
format has been in use for about a year, quite successfully.
In the near term, the important thing is to widen the scope of
datasets written in the format, covering a
larger variety of research groups and data structures.
This should reveal any bugs or ambiguities in the format
specification, and provide an opportunity to include any
important missing features the authors have not thought of.
Future plans for the df format include:
- Make provision for using special file record structures. Because the
records containing metadata are written to a
dataset along with the data, dealing with random or keyed
access records in a file is difficult. On systems in
which files can be addressed by byte offsets within a file,
random access is no problem, but for other systems it
can be troublesome. Three solutions suggest themselves: (a) treating the
metadata records the same as data records (e.g., by padding to
a fixed record length); (b) creating a new POINTDAT record which points to a
separate file in which the data are kept; and (c) reminding the user that this
format does not specify how data are physically stored, but
merely how the data are to be presented to the
readers--the data and
metadata, then, could conceivably be split into two distinct
files which (through a layer of software) would appear to the
reader subroutines as a single dataset.
Whichever method is eventually chosen, care must be taken to maintain the
highest degree of independence from any particular operating
system.
- Be able to write records within an object in random
order. Currently, the various records must be present
in the dataset in a fixed and specified
order. Lifting this restriction will probably
require creating a flag from one of the reserved words in the
TEST record, as well as creating a new
``Length-of-next-record'' record to facilitate skipping around in
a dataset.
- Determine a relationship to other formats. The
two standard formats most widely used at present are the Hierarchical Data
Format (HDF) [NCSA Software Tools Group, 1989], created by the National Center for
Supercomputing Applications (NCSA) at the University of
Illinois at Urbana-Champaign, and netCDF [Rew, 1990], created by the Unidata
Program Center of the University Corporation for Atmospheric Research (UCAR).
The HDF format relies on
centrally-defined tags which indicate data
objects within a dataset;
a df dataset could be assigned its own tag and be
encapsulated in an HDF dataset as an HDF data
object. The netCDF
format actually specifies a software
interface rather than a dataset format. The
model it uses to manipulate data is that of a rectilinear lattice of data
points, which is a subset of the data
structures dealt with by the df format.
Thus, a software interface can in principle be written to deal
with df format datasets using the netCDF
library calls.
- Determine a representation for trees and arbitrary
graph structures. The most likely method is to enter the data associated with
nodes in the data records with a Level 1 or
Level 2 dimension of ``index,'' with the
connectivity information held in a
DESCSUP record for that dimension.
Alternatively, an Auxgroup may be defined to specify
connectivity information between data points.
A third possibility is to create a new record type specifically to represent
connections between data points.
- Create standard I/O libraries.
- Create a suite of standard software tools for inspection
of df datasets.
- Establish an appropriate procedure for assigning site IDs
and registering other codes whose definitions are
requested by various sites.
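
Solution (a) in the first item above, padding metadata records to the same
fixed length as data records, can be illustrated with a minimal sketch. It is
not part of the df specification: the record length of 32 bytes, the space
padding, the function names, and the ``META''/``DATA'' record text are all
hypothetical, chosen only to show how a fixed record length lets record
number n be located by arithmetic alone.

```python
import io

# Hypothetical values for illustration only; the df specification
# does not define a record length or a padding character here.
RECORD_LENGTH = 32
PAD = b" "


def pad_record(payload: bytes, length: int = RECORD_LENGTH) -> bytes:
    """Return payload right-padded with PAD to exactly `length` bytes."""
    if len(payload) > length:
        raise ValueError("payload longer than the fixed record length")
    return payload.ljust(length, PAD)


def write_records(stream, payloads, length: int = RECORD_LENGTH) -> None:
    """Write metadata and data records alike as fixed-length records."""
    for payload in payloads:
        stream.write(pad_record(payload, length))


def read_record(stream, index: int, length: int = RECORD_LENGTH) -> bytes:
    """Read record number `index` (0-based) by seeking to index * length."""
    stream.seek(index * length)
    raw = stream.read(length)
    if len(raw) != length:
        raise IndexError("record index past end of dataset")
    # Trailing padding is stripped; a payload that legitimately ends in
    # PAD bytes would lose them, a limitation of this simple sketch.
    return raw.rstrip(PAD)


# An in-memory stand-in for a dataset file.
dataset = io.BytesIO()
write_records(dataset, [
    b"META title=ozone",   # made-up metadata record
    b"DATA 301.5 298.2",   # made-up data records
    b"DATA 305.1 299.9",
])
third = read_record(dataset, 2)
```

Because every record occupies the same number of bytes, a reader never has to
scan the metadata to find a given data record. The cost is wasted space
whenever metadata records are much shorter than data records, which is one
reason solutions (b) and (c) may still be preferable.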
Eric Nash
2003-09-25