Audit Trail



The audit trail is important for tracing the history of a dataset, and it is especially useful when one is faced with an unknown dataset. It allows a user to discover the family history of the data: which datasets were used in creating the present data, from which sites these ancestor datasets originated, and which tasks at those sites produced them.

Such a family tree is passed down from dataset to dataset, generation to generation, by extracting it from each component dataset, joining the extracted trees together, and putting the result into the new dataset. Because data from any dataset may be used to create some other dataset (e.g., to initialize a model), the audit trail must be included in every dataset if it is to be useful. Care must therefore be taken to keep the data structure compact, including only the information that is absolutely necessary. Including all relevant information (file names, program names and version numbers, run-time parameters) would cost more in increased file size than would be gained in usefulness. The df audit trail is therefore made up of nodes, one per ancestor dataset, each composed of four long integers: a site ID, a task ID, a date stamp, and a pointer to another audit node. This information is limited, but it does allow a user to contact the sites for further information beyond what is given in the dataset's COMMENT, INFOSPEC, and PROCSPEC records.
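As an illustration only, the four-integer node described above might be modelled as follows. This is a minimal sketch, not the df library's own definition: the class name AuditNode, the field names, and the storage layout (four big-endian 32-bit signed integers) are all assumptions made for the example, since the text says only "four long integers".

```python
import struct
from dataclasses import dataclass

# Assumption: each "long integer" is a big-endian 32-bit signed value.
# The real df byte order and width may differ.
NODE_FORMAT = ">4i"


@dataclass
class AuditNode:
    """One audit node: a hypothetical model of the four integers in the text."""
    site_id: int     # site where the ancestor dataset was generated
    task_id: int     # site-dependent code for the task that produced it
    date_stamp: int  # day number of generation
    pointer: int     # link to another audit node

    def pack(self) -> bytes:
        """Serialize the node to its fixed-size binary form."""
        return struct.pack(NODE_FORMAT, self.site_id, self.task_id,
                           self.date_stamp, self.pointer)

    @classmethod
    def unpack(cls, raw: bytes) -> "AuditNode":
        """Rebuild a node from bytes produced by pack()."""
        return cls(*struct.unpack(NODE_FORMAT, raw))
```

Under these assumptions each ancestor dataset adds a fixed 16 bytes to the file, which is the kind of compactness the design aims for.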

The AUDIT record type keeps track of this audit trail with its TREE field, which contains a group of nodes connected in a tree structure. The site identifier code in a node is unique to the site where the df format dataset is generated. (These codes are discussed in Section A.1.2.) The task identifier is a site-dependent code indicating the task which generated the data. (These codes are explained in Section A.2.1.) The date stamp is the day number counted from 1 January 1900 (i.e., the number of days elapsed since 31 December 1899). The pointer points to the next node in the tree, with the current dataset's pointer being NULL. The tree is set up as a one-dimensional array of nodes. Each pointer contains a displacement from itself in the array: for example, a pointer to the next node would have a value of 1, a pointer to the third node down the line would have a value of 3, and so on.
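The date stamp and the displacement pointers can be sketched as below. This is a hedged illustration rather than the df implementation: the function names day_number and follow, the tuple layout, the site and task codes in the sample tree, and the choice of 0 to represent NULL are all assumptions made for the example.

```python
from datetime import date

EPOCH = date(1899, 12, 31)  # day 0, so that 1 January 1900 is day 1
NULL = 0                    # assumption: a NULL pointer is stored as 0


def day_number(d: date) -> int:
    """Days elapsed since 31 December 1899."""
    return (d - EPOCH).days


def follow(nodes, start):
    """Return the array indices visited from `start` until a NULL pointer.

    Each node is a (site_id, task_id, date_stamp, pointer) tuple, and the
    pointer is a displacement measured from the node's own position.
    """
    path = [start]
    index = start
    while nodes[index][3] != NULL:
        index += nodes[index][3]  # displacement is relative to this node
        path.append(index)
    return path


# Hypothetical tree: three ancestor nodes that all point at the last
# node, which stands for the current dataset and carries a NULL pointer.
tree = [
    (101, 5, day_number(date(1900, 1, 1)), 3),    # third node down the line
    (102, 9, day_number(date(1900, 12, 31)), 2),
    (101, 7, day_number(date(1901, 1, 1)), 1),    # the next node
    (103, 2, day_number(date(1901, 1, 2)), NULL),
]
```

Because every pointer is relative to its own node, a block of nodes can be copied to a different offset in a larger array without its internal links being rewritten, which may be why displacements were chosen over absolute indices.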

See Section 3.4 for examples.


Eric Nash 2003-09-25