
Reconciliation

We have seen that standard file formats have both advantages and disadvantages. By choosing format characteristics which maximize the advantages and minimize the disadvantages, one can arrive at a format which is usable in almost all circumstances.

The trouble is, though, that pursuing many of the advantages (such as self-documentation) conflicts directly with reducing the disadvantages (such as large file sizes). One must therefore choose some sort of tradeoff between conflicting goals.

It is tempting under these circumstances to choose a tradeoff somewhat arbitrarily and force it upon the users. Many users (those whose needs the format does not meet) will walk away. Worse, others will design a different format based on a different set of tradeoffs, and one ends up with a whole series of incompatible "standards." This is exactly the situation that scientists face today.

Several scientific communities have proposed their own standard formats [Ramirez, 1991], [Pullen, 1990], but for wider interaction and exchange of data between disciplines, a more general format is needed.

If a standard format is to be successful, it must be used on a large number and a wide variety of datasets. One should not restrict its use to a small set of toy-like data files viewed and manipulated on personal computers (PCs) only. Likewise, it would be foolish to design a format to work only on massive databases on large mainframe computers. Rather, we need a format which is usable across the board, since files from PCs will be submitted to supercomputers, and small pieces of supercomputer datasets will be split off and sent to individuals using PCs.

The standard format must therefore be flexible enough for users to make their own strategic choices between efficiency and portability, between self-documentation and file size, and between the ease of using standard I/O subroutines and the performance gained with custom software. This calls for a high degree of user-specifiability within a basic core format, or framework, specified by a central authority. It also means that a user must be able to include whatever data his or her application requires.
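
As a rough illustration of this idea, the following Python sketch pairs a small fixed core header with user-chosen options. It is not the format described in this document. The field names (format_version, encoding, shape, user), the JSON header line, and the functions write_dataset and read_dataset are all hypothetical, chosen only to show how a user might trade self-documentation and portability (text) against compactness (binary) while every file still shares the same core structure.

```python
import json
import struct

# Hypothetical core fields that every file must carry (an assumption for
# illustration, not taken from any real format specification).
CORE_KEYS = ("format_version", "encoding", "shape")


def write_dataset(values, encoding="text", user_fields=None):
    """Serialize a flat list of floats: one core header line, then the data.

    The user picks the encoding ("text" is portable and readable, "binary"
    is compact) and may attach arbitrary JSON-serializable fields.
    """
    if encoding not in ("text", "binary"):
        raise ValueError("encoding must be 'text' or 'binary'")
    header = {
        "format_version": 1,
        "encoding": encoding,
        "shape": [len(values)],
        "user": dict(user_fields or {}),  # free-form, user-defined content
    }
    header_line = json.dumps(header).encode("ascii") + b"\n"
    if encoding == "text":
        body = " ".join(repr(float(v)) for v in values).encode("ascii")
    else:
        # Little-endian 8-byte floats, fixed so the file is portable.
        body = struct.pack("<%dd" % len(values), *values)
    return header_line + body


def read_dataset(blob):
    """Read a file written by write_dataset, whichever encoding was chosen."""
    header_line, body = blob.split(b"\n", 1)
    header = json.loads(header_line.decode("ascii"))
    missing = [key for key in CORE_KEYS if key not in header]
    if missing:
        raise ValueError("missing core fields: %s" % ", ".join(missing))
    count = header["shape"][0]
    if header["encoding"] == "text":
        values = [float(token) for token in body.decode("ascii").split()]
    else:
        values = list(struct.unpack("<%dd" % count, body))
    return header, values
```

A reader written against the core header can open either variant without knowing in advance which tradeoff the writer chose, and it can ignore or pass along whatever appears under the user-defined entry.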

Again, a user must be able to say anything he or she needs to say, but all users should speak the same "language."


Eric Nash 2003-09-25