Columbia International Affairs Online
CIAO DATE: 8/5/2007
Data DNA: The Next Generation of Statistical Metadata
March 2007
Abstract
Statistical metadata is commonly defined as data about data. It documents a statistical dataset’s background, purpose, content, collection, processing, and quality, along with related information that an analyst needs to find, understand, and manipulate the data. As such, the metadata for a statistical dataset increases the number and diversity of people who can successfully use a data source once it is released. The purpose of this paper is to discuss issues related to the development and use of statistical metadata and to describe resources for standardizing and automating it. While there are many types of metadata, this paper is concerned only with statistical metadata.
This paper describes the components of a complete statistical metadata system as well as the critical elements of basic information for such a system. It also reviews tools that are available now, or that could reasonably be developed, to create and structure metadata so that diverse users can better access and understand datasets. The field of data collection currently lacks incentives that would shift the creation of statistical metadata from a costly burden to a benefit; this paper addresses possible incentives and suggests ways to integrate metadata into existing and developing datasets. Finally, this paper describes the implications of these tools, and related cautions, for the National Infrastructure for Community Statistics (NICS).