
Future Enhancements

As of this writing, the df format is being used to store datasets in the Atmospheric Chemistry and Dynamics Branch at NASA Goddard Space Flight Center in Greenbelt, Maryland. A wide variety of datasets have been written in the format: vertical profile soundings, three-dimensional global grids of meteorological fields, global maps of total ozone, and data taken along aircraft flights. The format has been in use for about a year, with good results.

In the near term, the priority is to widen the range of datasets written in the format, covering more research groups and data structures. This should reveal any bugs or ambiguities in the format specification and provide an opportunity to add important features the authors have overlooked.

Future plans for the df format include:

Make provision for using special file record structures. Because the records containing metadata are written to a dataset along with the data, dealing with random-access or keyed-access records in a file is difficult. On systems in which files can be addressed by byte offset, random access is no problem, but on other systems it can be troublesome. Three solutions suggest themselves: (a) treating the metadata records the same as data records (e.g., by padding to a fixed record length); (b) creating a new POINTDAT record which points to a separate file in which the data are kept; and (c) reminding the user that this format does not specify how data are physically stored, but merely how the data are to be presented to the readers. Under (c), the data and metadata could conceivably be split into two distinct files which (through a layer of software) would appear to the reader subroutines as a single dataset. Whichever method is eventually chosen, care must be taken to maintain the highest degree of independence from any particular operating system.
Be able to write records within an object in random order. Currently, the various records must be present in the dataset in a fixed and specified order. This will probably require creating a flag from one of the reserved words in the TEST record, as well as creating a new "Length-of-next-record" record to facilitate skipping around in a dataset.
Determine a relationship to other formats. The two most widely used standard formats currently are the Hierarchical Data Format (HDF) [NCSA Software Tools Group, 1989], created by The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, and netCDF [Rew, 1990], created by the Unidata Program Center of the University Corporation for Atmospheric Research (UCAR). The HDF format relies on centrally-defined tags which indicate data objects within a dataset; a df dataset could be assigned its own tag and be encapsulated in an HDF dataset as an HDF data object. The netCDF format actually specifies a software interface rather than a dataset format. The model it uses to manipulate data is that of a rectilinear lattice of data points, which is a subset of the data structures dealt with by the df format. Thus, a software interface can in principle be written to deal with df format datasets using the netCDF library calls.
Determine a representation for trees and arbitrary graph structures. The most likely method is to enter the data associated with nodes in the data records with a Level 1 or Level 2 dimension of "index," with the connectivity information being records in a DESCSUP record for that dimension. Alternatively, an Auxgroup may be defined to specify connectivity information between data points. A third possibility is to create a new record type specifically to represent connections between data points.
Create standard I/O libraries.
Create a suite of standard software tools for inspection of df datasets.
Establish an appropriate procedure for assigning site IDs and registering other codes whose definitions are requested by various sites.
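
The "Length-of-next-record" idea mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not part of the df specification: the 4-byte big-endian length prefix, the byte-string payloads, and the function names (write_record, skip_record, read_record, read_nth_record) are all hypothetical. It shows only the general technique of prefixing each record with its length so that a reader can jump over records it does not need.

```python
import io
import struct

# Hypothetical layout (NOT defined by the df specification): every record
# is preceded by a 4-byte big-endian unsigned integer giving its length.
LENGTH_PREFIX = struct.Struct(">I")


def write_record(stream, payload: bytes) -> None:
    """Write the length prefix followed by the record payload."""
    stream.write(LENGTH_PREFIX.pack(len(payload)))
    stream.write(payload)


def read_record(stream):
    """Read one record; return None at a clean end of stream."""
    header = stream.read(LENGTH_PREFIX.size)
    if len(header) < LENGTH_PREFIX.size:
        return None
    (length,) = LENGTH_PREFIX.unpack(header)
    payload = stream.read(length)
    if len(payload) != length:
        raise ValueError("truncated record")
    return payload


def skip_record(stream) -> bool:
    """Skip one record without reading its payload; False at end of stream."""
    header = stream.read(LENGTH_PREFIX.size)
    if len(header) < LENGTH_PREFIX.size:
        return False
    (length,) = LENGTH_PREFIX.unpack(header)
    # Assumes a byte-addressable stream; on a record-oriented system the
    # payload would have to be read and discarded instead.
    stream.seek(length, io.SEEK_CUR)
    return True


def read_nth_record(stream, n: int) -> bytes:
    """Return record number n (0-based), skipping the records before it."""
    stream.seek(0)
    for _ in range(n):
        if not skip_record(stream):
            raise IndexError("record index out of range")
    payload = read_record(stream)
    if payload is None:
        raise IndexError("record index out of range")
    return payload
```

For example, after writing three records to an in-memory io.BytesIO stream with write_record, read_nth_record(stream, 2) returns the third payload while reading only the length prefixes of the first two. The seek call ties this sketch to byte-addressable files, which is exactly the operating-system dependence that the first item in the list cautions against, so a portable implementation would need a fallback.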


Eric Nash 2003-09-25