Standardization of Databases; Data Assessment;
and Uncertainty Statements

Peter L. Smith (Panel Chair)1, Linda Brown2,
John Rumble3, Hiroyuki Tawara4

1Harvard-Smithsonian CfA, Cambridge, USA
2Jet Propulsion Laboratory, Pasadena, USA
3National Institute of Standards and Technology, Gaithersburg, USA
4National Institute for Fusion Science, Nagoya, Japan

Reproduced with permission from Atomic and Molecular Data and Their Applications
edited by P.J. Mohr and W.L. Wiese
© 1998 American Institute of Physics, New York, Conference Proceeding #434

The usefulness of databases comprising atomic and molecular parameters depends upon their reliability, completeness, and ease of use. The classic database, a book with limited availability containing data critically evaluated by experts, is gradually becoming obsolete. Databases today can be (and are being) prepared, distributed, and utilized by anyone through the World Wide Web (WWW). Producers and users of databases can be expert or naive. Therefore, standards for databases are required to ensure quality.

Today, databases are, for historical reasons, often discipline-specific. With standardization of database content, format, and assessment, expert software will be able to merge small, separately created and maintained datasets into large "virtual" databases capable of serving disparate users who make multiparameter requests.
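
The merging step can be illustrated with a minimal sketch. It assumes that every dataset already uses one agreed record layout; the `Record` fields, the function names, and the keying by species and quantity are hypothetical choices made for this illustration, not part of any existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One datum in a hypothetical shared layout (all field names are assumptions)."""
    species: str        # label of the atom or molecule
    quantity: str       # name of the parameter, with units in the name
    value: float
    uncertainty: float  # one standard deviation, same units as value
    source: str         # reference for the datum


def merge_datasets(*datasets):
    """Combine separately maintained datasets into one 'virtual' database.

    Records are grouped by (species, quantity); nothing is discarded, so
    competing values from different sources stay visible to the user.
    """
    virtual = {}
    for dataset in datasets:
        for rec in dataset:
            virtual.setdefault((rec.species, rec.quantity), []).append(rec)
    return virtual


def query(virtual, species, quantities):
    """Answer a multiparameter request: several quantities for one species."""
    return {q: virtual.get((species, q), []) for q in quantities}
```

Because each record carries its own uncertainty and source, the merged database keeps that information for every value it returns. The merge is only this simple because the layout is shared; without a common format, each pair of datasets would need its own translation step.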

Without standards, users get flawed, incomplete, unreliable, and/or out-of-date data. There is no configuration control; data can change from day to day without notice. Data in different formats cannot be merged.
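
One simple form of configuration control is to publish a fingerprint with each release of a dataset, so that users can detect a silent change. The sketch below is an illustration only; it assumes the records are JSON-serializable dictionaries, and the function name `fingerprint` is hypothetical.

```python
import hashlib
import json


def fingerprint(records):
    """Return a SHA-256 hex digest of a list of JSON-serializable records.

    Keys are sorted so that the digest does not depend on the order in
    which fields were written; the order of the records themselves still
    matters and would need to be fixed by the database producer.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A user who stores the digest along with a downloaded copy can later recompute it and see whether the data have changed since retrieval.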

Standards - a broadly agreed-upon, public set of rules for data format, database content, data assessment and uncertainty statements, and documentation - ensure that database users get reliable information. With standards, producers can concentrate on content, not form.

Documentation is an important part of database standards. Documentation must include information for the most naive legitimate user about the sources (i.e., references), accuracies, and completeness of the data, as well as information about the evaluation process. Documentation should also explain how related databases differ.
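
A documentation requirement of this kind can be checked mechanically. The following minimal sketch assumes the documentation is stored as a dictionary; the four field names are hypothetical labels for the items listed above, not an established schema.

```python
# Hypothetical field names for the documentation items named in the text.
REQUIRED_FIELDS = ("sources", "accuracy", "completeness", "evaluation_process")


def missing_documentation(doc):
    """Return the required documentation fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not doc.get(field)]
```

A database whose documentation yields a non-empty list from `missing_documentation` would not yet meet the requirement; a check like this covers only the presence of each item, not its quality.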

Standards, however, cost money, and the effort required to produce quality databases is often not appreciated by scientific users, in part because the standards, the compilation and assessment procedures, and the data themselves are often not regarded as interesting or exciting science.

Assessment is a particularly time-consuming and thus expensive part of database production. It is not peer review, but an independent evaluation by experts who assess the methods used to generate the data and who also intercompare results obtained using different techniques. Standards are required to ensure that assessments of different datasets are consistent. Since standards need to be based on broad agreement within the community, the ICAMDATA conference series may be a good vehicle to advance the standardization process.
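
One common statistical aid to intercomparison, shown here only as an illustration and not as the panel's procedure, is the inverse-variance weighted mean together with a reduced chi-square. The sketch assumes independent measurements of the same quantity with Gaussian one-standard-deviation uncertainties; the function names are hypothetical.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard uncertainty."""
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def reduced_chi_square(values, sigmas):
    """Chi-square about the weighted mean per degree of freedom.

    Requires at least two measurements. A result much larger than 1
    suggests that the results disagree by more than their stated
    uncertainties allow.
    """
    mean, _ = weighted_mean(values, sigmas)
    chi2 = sum(((v - mean) / s) ** 2 for v, s in zip(values, sigmas))
    return chi2 / (len(values) - 1)
```

A large reduced chi-square does not say which technique is at fault; it only flags a set of results for the closer expert examination of methods described above.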
