Understandability

Secondly, a standard format must in some sense be understandable, or self-describing. To begin with, a file should always identify itself as conforming to the standard: the user must always have a simple way to test whether a given file was written in the format. One should also be able to determine the binary representation used in the file, that is, whether it was written in a given machine's native binary data representation or in a portable representation such as XDR. For example, if a site using a Unix workstation obtains a dataset from an IBM mainframe, the Unix site should be able to determine whether the IBM file is in the standard format, as well as what binary data representation was used (the IBM site might have obtained the dataset from a Cray site, after all).
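
As a rough illustration of self-identification, the sketch below writes and checks a small fixed header. Everything in it is an assumption made for the example, not part of any existing standard: the magic string `STDF`, the 8-byte layout, the version byte, and the representation codes are all hypothetical. The header uses only single bytes, so it can be read before the byte order of the data is known.

```python
import struct

# Hypothetical values, chosen only for this illustration.
MAGIC = b"STDF"  # marks the file as conforming to the format
REPRESENTATIONS = {
    0: "xdr",             # portable representation
    1: "native-unix",     # native representation of the writing machine
    2: "native-ibm",
    3: "native-cray",
}

def write_header(representation_code: int, version: int = 1) -> bytes:
    """Build an 8-byte header: magic, version, representation code, 2 pad bytes."""
    if representation_code not in REPRESENTATIONS:
        raise ValueError("unknown representation code")
    return struct.pack(">4sBBxx", MAGIC, version, representation_code)

def read_header(data: bytes):
    """Return (version, representation name), or None if the file is not in the format."""
    if len(data) < 8 or data[:4] != MAGIC:
        return None
    _, version, code = struct.unpack(">4sBBxx", data[:8])
    return version, REPRESENTATIONS.get(code, "unknown")
```

With such a header, the Unix site in the example above would call `read_header` on the first eight bytes of the IBM file and learn both whether the file is in the format and which representation its data uses.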

A standard format should also be self-documenting in the more usual sense: metadata (data about the data) should be contained within the file. This documentation must include some form of identification of the data in the file as well as its units, its dimensional structure, and any special processing notes or comments of which users of the data should take note. (Some of these notes will apply only to a subset of the data, so provision must be made for specifying such subsets.) Additionally, bad or missing data points in a regular field must somehow be flagged.
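
The metadata requirements above might look like the following in memory. This is only a sketch under stated assumptions: the field names (`units`, `dimensions`, `missing_value`, `notes`), the flag value `-9999.0`, and the data values are invented for illustration, and a note's subset is modeled simply as a half-open index range along one named dimension.

```python
import math

# A hypothetical self-documenting variable; all names and values are illustrative.
variable = {
    "name": "temperature",
    "units": "K",
    "dimensions": ("time", "level"),
    "missing_value": -9999.0,  # flag for bad or missing points
    "notes": [
        {"text": "Values smoothed before archiving.", "subset": None},
        # Applies only to level index 2 (half-open range 2..3).
        {"text": "Top level is extrapolated.", "subset": ("level", 2, 3)},
    ],
    "data": [
        [250.1, 260.4, -9999.0],
        [251.0, -9999.0, 271.3],
    ],
}

def valid_mean(var):
    """Average the field, skipping points that carry the missing-value flag."""
    flag = var["missing_value"]
    values = [v for row in var["data"] for v in row if v != flag]
    return sum(values) / len(values) if values else math.nan

def notes_for(var, dim, index):
    """Return note texts that apply to the given index along one dimension.

    Notes restricted along a different dimension are skipped.
    """
    selected = []
    for note in var["notes"]:
        subset = note["subset"]
        if subset is None:
            selected.append(note["text"])
        elif subset[0] == dim and subset[1] <= index < subset[2]:
            selected.append(note["text"])
    return selected
```

Here `valid_mean(variable)` averages only the four unflagged points, and `notes_for(variable, "level", 2)` returns both notes while `notes_for(variable, "level", 0)` returns only the one that applies to the whole field.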

In addition, it would be most useful to have an audit trail mechanism to maintain a sort of "family tree" detailing the lineage of a dataset which has been derived from other datasets. Thus, for example, the output from model runs whose initializations were obtained from various data files can be identified clearly.
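
One way to picture such an audit trail is as a record, kept with each dataset, of the datasets it was derived from. The sketch below assumes a hypothetical `derived_from` field and made-up dataset names; the in-memory `catalog` dictionary stands in for lineage records that a real format would store inside the files themselves.

```python
# Hypothetical lineage records; names and operations are illustrative only.
catalog = {
    "obs_a":  {"derived_from": [], "operation": "original observations"},
    "obs_b":  {"derived_from": [], "operation": "original observations"},
    "init_1": {"derived_from": ["obs_a", "obs_b"], "operation": "merged and gridded"},
    "run_1":  {"derived_from": ["init_1"], "operation": "model run"},
}

def lineage(name, records):
    """Return every ancestor of a dataset, nearest first, each listed once."""
    ancestors = []
    queue = list(records[name]["derived_from"])
    while queue:
        parent = queue.pop(0)
        if parent in ancestors:
            continue  # already recorded via another path
        ancestors.append(parent)
        queue.extend(records[parent]["derived_from"])
    return ancestors
```

For the model run in this example, `lineage("run_1", catalog)` walks back through the initialization file to the two observation datasets it was built from.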

Eric Nash 2003-09-25