Next: Numeric Codes Up: Implementation Notes Previous: Implementation Notes   Contents


Standard library routines

Compared with certain other proposed standard formats, df has one conspicuous omission: it makes no attempt to specify a standard library of I/O subroutines.

The fact is that a tradeoff exists between the flexibility needed to represent one's data as one desires and the ability of a standard program to read such a dataset. Take, for example, the issue of bad-data flags. Using a single bad-data flag value to signify bad or missing data over an entire dataset is relatively simple to support in a dataset reader subprogram. Implementing a general reader able to cope with bad-data flag values that vary from region to region in the data (as specified by START and END indices) can be nightmarish. Let the reader be assured that such pathological cases are by no means rare.
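The difference between the two cases can be sketched in a few lines of Python. This is only an illustration, not part of the df specification: the `FlagRegion` record, the function names, the restriction to one-dimensional data, and the assumption that START and END are inclusive are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FlagRegion:
    """Hypothetical record: one bad-data flag value and the index range it covers."""
    start: int        # first index covered (assumed inclusive)
    end: int          # last index covered (assumed inclusive)
    bad_value: float  # value that marks bad or missing data in this range


def mask_single_flag(values: Sequence[float], bad_value: float) -> List[Optional[float]]:
    """Simple case: one flag value applies to the whole dataset."""
    return [None if v == bad_value else v for v in values]


def mask_regional_flags(values: Sequence[float],
                        regions: Sequence[FlagRegion],
                        first_index: int = 0) -> List[Optional[float]]:
    """General case: each region carries its own flag value.

    `first_index` is the index of the first element, since the data
    need not be numbered from 0 (an assumption of this sketch).
    """
    result: List[Optional[float]] = list(values)
    for region in regions:
        for idx in range(region.start, region.end + 1):
            pos = idx - first_index
            # Compare against the original value so that overlapping
            # regions do not interfere with one another.
            if 0 <= pos < len(values) and values[pos] == region.bad_value:
                result[pos] = None
    return result
```

With the values `[1.0, -999.0, 3.0, 1e35, -999.0]`, a flag of `-999.0` over indices 0 to 2, and a flag of `1e35` over indices 3 to 4, the final `-999.0` is legitimate data and must be kept. Even this sketch ignores multidimensional regions and packed data, which is where the real difficulty lies.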

The temptation to develop a format standard in parallel with its software library is strong but should be resisted. It is all too easy, when confronted with the challenge of writing subroutines to handle the wide variety a good format allows, to rein in and limit that variety to make the programming task easier. Thus, for example, data arrays might be required to be written in row-major order, or their indices required to start numbering from 1.
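To make concrete what such restrictions would remove, the following Python sketch computes the position of an array element in storage when neither the storage order nor the starting index is fixed in advance. The function name and its parameters are hypothetical and are not taken from the df specification; the sketch assumes dense, unpacked arrays.

```python
from typing import Sequence


def flat_offset(index: Sequence[int],
                shape: Sequence[int],
                lower_bounds: Sequence[int],
                order: str = "row") -> int:
    """Map a multidimensional index to a 0-based offset in storage.

    index        -- the element's index along each dimension
    shape        -- number of elements along each dimension
    lower_bounds -- index of the first element along each dimension
                    (0, 1, or anything else the dataset declares)
    order        -- "row" (last index varies fastest) or
                    "column" (first index varies fastest)
    """
    if order not in ("row", "column"):
        raise ValueError("order must be 'row' or 'column'")
    if not (len(index) == len(shape) == len(lower_bounds)):
        raise ValueError("index, shape and lower_bounds must have equal length")

    # Shift every index so that it counts from 0.
    relative = [i - lo for i, lo in zip(index, lower_bounds)]
    for r, n in zip(relative, shape):
        if not 0 <= r < n:
            raise IndexError("index out of range")

    dims = list(zip(relative, shape))
    if order == "column":
        # Column-major is row-major with the dimensions reversed.
        dims.reverse()

    offset = 0
    for r, n in dims:
        offset = offset * n + r
    return offset
```

A library that insisted on row-major order and indices starting from 1 could replace all of this with a single fixed formula, which is exactly the kind of simplification that tempts an implementer to narrow the format.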

In addition, the programming world is in a state of flux at present; object-oriented concepts are making inroads on traditional programming paradigms. Imposing a standard software library at this premature stage, allowing the format to be dictated by programming considerations, could end up being an example of imposed obsolescence.

The authors have decided, then, to concentrate on the format itself, making it as complete, flexible, and unambiguous as possible. The df format is exactly "A Standard Format for Programmers to grovel in bits" (in contradiction to a slogan proudly proclaimed by the creators of another standard format). As experience with df datasets accumulates, a software library can evolve to meet users' needs.

In designing this format, flexibility was chosen over convenient uniformity. The task of programming for the general case is thus made more difficult, and a general library will take considerable skill to produce. This virtually rules out the appearance, in the near future, of a single, standard, general set of subprograms able to read every df dataset.

This strategy has the disadvantage of cutting off many potential users who are unwilling or unable to delve into the bit-level format specification to implement their own I/O software. In the long term, though, it prevents the format from being hobbled.

A promising possibility for the short term nevertheless exists: how a df dataset may be read can be discerned mechanically from the dataset itself. Thus, it should be possible to write a program which, after scanning a sample dataset, constructs a subroutine to read any similarly structured dataset. Because complicated arrangements of bad-data flag values, data packing schemes, and data array index orderings tend to be uniform across groups of datasets, one custom-generated subroutine could be used to read any of a large group of datasets, and a subroutine-writing program may prove quite profitable.
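The idea of a subroutine-writing program can be sketched as follows. This Python example covers only the second half of the job: it takes a description of a dataset's layout and emits the source of a reader specialised to that layout. The scanning step that would derive the description from a sample df dataset is not shown, and the `layout` dictionary with its keys (`byte_order`, `count`, `type_code`, `bad_value`) is a hypothetical stand-in that does not reflect actual df header fields.

```python
import struct


def build_reader(layout):
    """Generate a reader for datasets matching `layout`.

    `layout` is a hypothetical summary of what a scan of a sample
    dataset found:
      byte_order -- struct byte-order character, e.g. "<" or ">"
      count      -- number of values in the data array
      type_code  -- struct type code of each value, e.g. "h" or "f"
      bad_value  -- the single bad-data flag value (an int or finite float)

    Returns (source, function): the generated source text and the
    compiled reader function.
    """
    fmt = layout["byte_order"] + str(layout["count"]) + layout["type_code"]
    size = struct.calcsize(fmt)
    lines = [
        "import struct",
        "",
        "def read_values(raw):",
        f"    values = struct.unpack({fmt!r}, raw[:{size}])",
        f"    return [None if v == {layout['bad_value']!r} else v for v in values]",
    ]
    source = "\n".join(lines) + "\n"

    # Compile the generated text so it can be used at once; it could
    # equally be written to a file and kept with the group of datasets.
    namespace = {}
    exec(source, namespace)
    return source, namespace["read_values"]
```

Calling `build_reader` once per group of similarly structured datasets yields a small, fixed reader with the layout decisions already baked in, so the generated subroutine itself contains no general-case logic.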

In the longer term, a general library of I/O subroutines for the df format is desirable and should be written. But the authors feel it is better to wait and let the best implementations rise to the top than to impose an arbitrary software package with misfeatures everyone will later come to regret.


Eric Nash 2003-09-25