Standardization of Databases, Data Assessment, and Uncertainty Statements
Peter L. Smith (Panel Chair)1, Linda Brown2, John Rumble3, Hiroyuki Tawara4
1Harvard-Smithsonian CfA, Cambridge, USA
2Jet Propulsion Laboratory, Pasadena, USA
3National Institute of Standards and Technology, Gaithersburg, USA
4National Institute for Fusion Science, Nagoya, Japan
Reproduced with permission from Atomic and Molecular Data and Their Applications, edited by P.J. Mohr and W.L. Wiese
© 1998 American Institute of Physics, New York, Conference Proceeding #434
The usefulness of databases of atomic and molecular parameters depends on their reliability, completeness, and ease of use. The classic database, a book of limited availability containing data critically evaluated by experts, is gradually becoming obsolete. Today, databases can be (and are being) prepared, distributed, and used by anyone through the World Wide Web (WWW), and both their producers and their users may be either expert or naive. Standards for databases are therefore required to ensure quality.
For historical reasons, databases today are often discipline-specific. With standardized database content, format, and assessment, expert software will be able to merge small, separately created and maintained datasets into large "virtual" databases capable of serving disparate users with multiparameter requests.
Without standards, users get flawed, incomplete, unreliable, or out-of-date data. There is no configuration control, so data can change from day to day without notice, and data in different formats cannot be merged.
Standards (a broadly agreed-upon, public set of rules for data format, database content, data assessment and uncertainty statements, and documentation) ensure that database users get reliable information. With standards, producers can concentrate on content rather than form.
Documentation is an important part of database standards. It must give even the most naive legitimate user information about the sources (i.e., references), accuracy, and completeness of the data, as well as about the evaluation process. Documentation should also explain how related databases differ.
Standards, however, cost money, and scientific users often do not appreciate the effort required to produce quality databases. In part, this is because work on standards, on compilation and assessment procedures, and on the data themselves is often not considered an interesting or exciting scientific endeavor.
Assessment is a particularly time-consuming, and thus expensive, part of database production. It is not peer review but an independent evaluation by experts who assess the methods used to generate the data and who intercompare results obtained with different techniques. Standards are required to ensure that assessments of different datasets are consistent. Because standards must rest on broad agreement within the community, the ICAMDATA conference series may be a good vehicle for advancing the standardization process.