

Numeric Codes

Metadata in the df format occupies comparatively little space and is largely independent of language because much of it is represented by integer codes. While the format standard defines no mechanism for translating between such codes and their meanings, how the translation is implemented can have a significant impact on performance.

At one extreme, users could look up the codes themselves in long lists; this, of course, would be inconvenient and error-prone. At the other extreme, code translations might be obtained from a network server, much as hostname/IP address equivalences are found using the Domain Name System.

This section describes a simple implementation of the lookup mechanism based on a set of lists maintained in text files. This mechanism provides translation for sites, tasks, quantities, units, packing codes, dimensional supplemental information (as specified in DESCSUP records), processing codes, and data source codes. Each code type has a corresponding file, and within that file each code value serves as the key to a plain-text explanation of the code.

All of these files should reside in a central location (directory). If the system allows, a global environment variable or logical name called DFCODES should be set to point to that directory. Software can then find the files through this globally defined symbolic name, which allows it to be ported between systems without change.
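A program might resolve a code-list file through DFCODES roughly as follows. This is a minimal sketch that assumes DFCODES is an ordinary environment variable whose value is the directory path; systems that use logical names instead would need their own lookup. The function name `codes_file` is hypothetical.

```python
import os
from pathlib import Path


def codes_file(filename: str, env_var: str = "DFCODES") -> Path:
    """Return the full path of a code-list file in the central directory.

    Assumes (hypothetically) that the environment variable named by env_var
    holds the directory path. Raises RuntimeError if it is not set.
    """
    try:
        directory = os.environ[env_var]
    except KeyError:
        raise RuntimeError(f"{env_var} is not set; cannot locate {filename}") from None
    return Path(directory) / filename
```

For example, `codes_file("SITEIDS.TXT")` yields the path of the site list inside whatever directory DFCODES names on the current machine.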

Two categories of codes exist: those defined centrally, which are the same for everybody, and those defined locally by each site (see Appendix A). For example, site identification codes must be defined centrally, so that everyone will know which site is which.

The centrally defined code lists are given standard names:

SITEIDS.TXT
contains the list of site ID codes. Each line consists of two text fields delimited by a colon: site-ID and full-site-name.
VARTYPES.TXT
contains the list of physical quantity types. Each line consists of four text fields delimited by colons: quantity-ID, four-letter-quantity-name, full-quantity-name, and preferred-units-abbreviation.
UNITS.TXT
contains the list of physical units. Each line consists of four text fields delimited by colons: units-code, full-units-name, units-abbreviation, and SI-units-definition.
HOWPACK.TXT
contains the list of packing code explanations. Each line consists of four text fields delimited by colons: packing-code, packing-description, packing-algorithm (with PAKVAL packing parameters indicated as $1, $2, etc.), and storage-spec. The storage-spec shows how the packing parameters are ordered in a PAKVAL record: n indicates the number of parameter groups, S a short integer, F a floating-point number, and I a long integer.
SUPCODES.TXT
contains the list of dimensional supplemental codes. Each line consists of three text fields: supplementary-code, code-description, and parameter-list. The parameter-list is, in turn, composed of subfields delimited by semicolons, each subfield describing a supplemental value found in DESCSUP.
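Most of the files listed above share one shape: a code in the first field followed by a fixed number of colon-delimited text fields. A generic loader for such files might look like the sketch below. The name `load_code_table` is hypothetical, and two behaviors are assumptions rather than part of the format description: whitespace around fields is stripped, and any extra colons are kept inside the last field.

```python
from typing import Dict, Iterable, List


def load_code_table(lines: Iterable[str], nfields: int) -> Dict[str, List[str]]:
    """Build a lookup table from colon-delimited code-list lines.

    The first field is the code, kept as text; the remaining fields are
    returned as a list. Blank lines are skipped. Each line is split at most
    nfields - 1 times, so extra colons stay in the last field (an assumption).
    """
    table: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(":", nfields - 1)
        if len(fields) != nfields:
            raise ValueError(f"line {lineno}: expected {nfields} fields, got {len(fields)}")
        code, *rest = (f.strip() for f in fields)
        table[code] = rest
    return table
```

SITEIDS.TXT would be loaded with `nfields=2` and VARTYPES.TXT or UNITS.TXT with `nfields=4`; an open file object can be passed directly as `lines`.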

One modifies these files at one's own risk, since any change puts the local copy out of step with the copies at all other sites.

The other, locally defined code lists are in the files:

Unnnnnnn.VAR
contains a list of locally defined quantity codes; the format of this file is the same as for VARTYPES.TXT.
Unnnnnnn.UNT
contains a list of locally defined units codes; the format of this file is the same as for UNITS.TXT.
Unnnnnnn.TSK
contains the task ID codes. Each line consists of two text fields delimited by a colon: task-code and task-description.
Unnnnnnn.PRC
contains the processing codes. Each line consists of three text fields separated by colons: processing-code, processing-title, and list-of-processing-variables. The last field is composed in turn of an arbitrary number of subfields delimited by semicolons, each subfield describing a single processing variable expected in a PROCVAL record associated with a given processing code.
Unnnnnnn.SRC
contains the data source codes. Each line consists of two text fields delimited by a colon: three-letter-source-abbreviation and full-text-name-of-source-institution.
where "nnnnnnn" is the local site ID written as seven hexadecimal digits.
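The processing-code file is the one local list whose last field carries its own substructure. One way to split a Unnnnnnn.PRC line into its code, title, and ordered list of processing-variable descriptions is sketched below; the names `ProcessingCode` and `parse_prc_line` are hypothetical, and stripping whitespace and dropping empty subfields (such as one left by a trailing semicolon) are assumptions.

```python
from typing import List, NamedTuple


class ProcessingCode(NamedTuple):
    code: str
    title: str
    variables: List[str]  # one description per PROCVAL value, in file order


def parse_prc_line(line: str) -> ProcessingCode:
    """Split one .PRC line into code, title and processing-variable list."""
    # Three colon-delimited fields; extra colons stay in the variable list.
    code, title, var_list = (f.strip() for f in line.rstrip("\r\n").split(":", 2))
    # The last field holds semicolon-delimited subfields, one per variable.
    variables = [v.strip() for v in var_list.split(";") if v.strip()]
    return ProcessingCode(code, title, variables)
```

A line with fewer than three fields makes the unpacking raise `ValueError`, which a caller can report as a malformed entry.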

Note that these file names consist of eight or fewer uppercase alphanumeric characters followed by a period and three more uppercase alphanumeric characters, making the file names portable to a very wide variety of computer systems.
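The naming rule above is easy to check mechanically, for instance before a site publishes a new code file. A minimal sketch using a regular expression; the function name `is_portable_name` is hypothetical.

```python
import re

# One to eight uppercase letters or digits, a period, then exactly three more.
_PORTABLE_NAME = re.compile(r"[A-Z0-9]{1,8}\.[A-Z0-9]{3}")


def is_portable_name(name: str) -> bool:
    """Return True if name follows the portable code-file naming rule."""
    return _PORTABLE_NAME.fullmatch(name) is not None
```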

A code translator program, then, would open one of these files, find the line with the code or text for which it is searching, and retrieve the text or code which corresponds to it. Since such translations need to be done only for human readability, calls to code translators will probably not be prevalent in production programs, and the inefficiency implied by this translation method is tolerable. One can improve performance, if desired, by hard-wiring some of the most commonly used codes at a site into look-up tables in the translator routines. If a code is not found in those tables, then the routine will go to the files.
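A translator that checks a small hard-wired table first and falls back to scanning the file, in either direction (code to text or text to code), might be sketched as follows. `CodeTranslator` and its parameters are hypothetical; `read_lines` stands in for reading the code-list file, and only the first two colon-delimited fields (the code and its primary text) are used. As the text suggests, the file is re-read on every miss rather than cached.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


class CodeTranslator:
    """Translate codes to text and back, trying a hard-wired table first."""

    def __init__(self, builtin: Dict[str, str], read_lines: Callable[[], Iterable[str]]):
        self.builtin = dict(builtin)  # a site's most commonly used codes
        self.read_lines = read_lines  # returns the lines of the code-list file

    def _scan_file(self) -> Iterator[Tuple[str, str]]:
        # Yield (code, text) pairs from the first two colon-delimited fields.
        for raw in self.read_lines():
            fields = raw.rstrip("\r\n").split(":")
            if len(fields) >= 2:
                yield fields[0].strip(), fields[1].strip()

    def to_text(self, code: str) -> Optional[str]:
        if code in self.builtin:
            return self.builtin[code]
        for c, text in self._scan_file():
            if c == code:
                return text
        return None  # code not defined anywhere

    def to_code(self, text: str) -> Optional[str]:
        for c, t in self.builtin.items():
            if t == text:
                return c
        for c, t in self._scan_file():
            if t == text:
                return c
        return None
```

In practice `read_lines` could be something like `lambda: path.read_text().splitlines()`, where `path` points at the relevant file in the DFCODES directory.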

Note that when a file (or, more likely, a set of files) is imported from another site, one would also import that site's Unnnnnnn.* files. Because the "nnnnnnn" part of their names differs, the local U files and the remote site's U files can co-exist without conflict. A df dataset reader can obtain the originating site's ID from the dataset and use it to find the local-code definition file it needs to translate codes into human-readable terms. For example, the task identifier file for Site 1 is named U0000001.TSK, and the processing code file for Site 13579BD is U13579BD.PRC.
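Building the name of a site's local-code file from its ID takes one formatting step. The sketch below assumes the reader already has the originating site ID as an integer (how it is extracted from the dataset is not covered here); `local_code_file` is a hypothetical name, and the range check simply reflects the seven hexadecimal digits available in the name.

```python
LOCAL_KINDS = {"VAR", "UNT", "TSK", "PRC", "SRC"}


def local_code_file(site_id: int, kind: str) -> str:
    """Return the name of a site's local code file, e.g. U0000001.TSK.

    kind is one of VAR, UNT, TSK, PRC or SRC. The site ID is written as
    seven zero-padded uppercase hexadecimal digits.
    """
    if kind not in LOCAL_KINDS:
        raise ValueError(f"unknown local code file type: {kind}")
    if not 0 <= site_id <= 0xFFFFFFF:
        raise ValueError("site ID does not fit in seven hexadecimal digits")
    return f"U{site_id:07X}.{kind}"
```

The two names given in the example above come out as `local_code_file(1, "TSK")` and `local_code_file(0x13579BD, "PRC")`.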


Eric Nash 2003-09-25