Metadata in Medicine
George Kim, MD, FAAP
Fellow, Division of Health Sciences Informatics
Johns Hopkins University School of Medicine
What is Metadata?
Metadata is a description of an information-bearing object (such as a document, data set, database, image, or audio file). While this may be any type of description, the term metadata generally refers to structured text descriptions (in machine-understandable format) of attributes of resources or information-bearing objects (IBOs) for use and communication by computers across networks (such as the Internet). Metadata is used to classify, index, retrieve and process information about IBOs with the intent of streamlining communication and interoperability among federated organizations and systems by establishing common frameworks and languages.
Metadata Types
Metadata is generally grouped into three types:
- Descriptive metadata describes the content of a resource for identification, searching, and retrieval. Examples of this type are the bibliographic information and abstract for a journal article and the coded diagnoses for a patient contained in a medical record.
- Structural metadata describes the architecture and relationships of the different sections of a resource for the purposes of navigation. Examples of this type are the table of contents, page numbers, and index of a journal or the types of reports (laboratory, imaging, consultant) for a patient encounter contained in a medical record.
- Administrative metadata describes technical aspects of an information resource for processing and management. Examples of this type are the publishing information about a printing of an issue of a journal and the privacy, confidentiality and security rules associated with handling a medical record.
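The three types can be pictured as separate groups of fields attached to one record. The following is a minimal sketch in Python for a journal article; the class names, field names, and sample values are all hypothetical and chosen only to mirror the examples above, not taken from any standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveMetadata:
    """Describes content for identification, searching, and retrieval."""
    title: str
    authors: List[str]
    abstract: str = ""


@dataclass
class StructuralMetadata:
    """Describes how the resource is laid out, for navigation."""
    sections: List[str] = field(default_factory=list)
    first_page: int = 0
    last_page: int = 0


@dataclass
class AdministrativeMetadata:
    """Describes technical and management aspects of the resource."""
    publisher: str = ""
    file_format: str = ""
    access_rule: str = ""


@dataclass
class ArticleRecord:
    """One journal article with its three groups of metadata."""
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata


def page_count(record: ArticleRecord) -> int:
    """Number of pages, derived from structural metadata only."""
    return record.structural.last_page - record.structural.first_page + 1


# Hypothetical sample values, for illustration only.
record = ArticleRecord(
    descriptive=DescriptiveMetadata(
        title="Placeholder article title",
        authors=["Author A", "Author B"],
    ),
    structural=StructuralMetadata(
        sections=["Introduction", "Methods", "Results", "Discussion"],
        first_page=101,
        last_page=105,
    ),
    administrative=AdministrativeMetadata(
        publisher="Placeholder Press",
        file_format="PDF",
        access_rule="public",
    ),
)
```

Keeping the groups separate means that a search tool can read only the descriptive fields, while an archiving tool can read only the administrative ones.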
Metadata Structure
Metadata is processed by computers and transmitted electronically across networks such as the Internet. In order for metadata to be used and communicated consistently among different systems (interoperability), its structure must be standardized for different computers and networks. Standardization of metadata for an application or process involves specification of its structure and agreement on the specification by stakeholders at many levels of detail (with examples): character encoding (Unicode, ASCII, etc.), language (English, Japanese, etc.), controlled vocabulary (MeSH, SNOMED CT, etc.), message structure (DICOM, HL7, etc.) and formatting (ASN.1, XML, etc.).
Following is a partial example of a metadata record that describes a journal article within the MEDLINE/PubMed medical citation database in eXtensible Markup Language (XML). In this XML example, the IBO is a journal article. Each metadata attribute of the article, such as Article Title, Abstract Text, Publication Type, or PubMed Identifier (PMID), is surrounded by a set of bracketed tags that adhere to a defined and agreed-upon structure (the XML schema). Values for attributes can be free text, numerical, from a specified list, or from a controlled vocabulary (that is, a thesaurus, dictionary, or other organized list).
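As a rough illustration of how software reads such tagged attributes, the sketch below parses a small made-up record with Python's standard xml.etree.ElementTree module. The element names are modeled on PubMed-style citation XML and should be treated as assumptions; every value is a placeholder, not a real citation, and the function name extract_attributes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up record: element names are modeled on PubMed-style citation XML,
# but every value below is a placeholder, not a real citation.
SAMPLE_RECORD = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>00000000</PMID>
    <Article>
      <ArticleTitle>Placeholder article title</ArticleTitle>
      <Abstract>
        <AbstractText>Placeholder abstract text.</AbstractText>
      </Abstract>
      <PublicationTypeList>
        <PublicationType>Journal Article</PublicationType>
      </PublicationTypeList>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""


def extract_attributes(xml_text):
    """Collect a few metadata attributes from one citation record."""
    root = ET.fromstring(xml_text.strip())
    return {
        # findtext returns the text of the first matching element, or None.
        "pmid": root.findtext(".//PMID"),
        "title": root.findtext(".//ArticleTitle"),
        "abstract": root.findtext(".//AbstractText"),
        # A record may carry several publication types, so gather them all.
        "publication_types": [el.text for el in root.iter("PublicationType")],
    }


attributes = extract_attributes(SAMPLE_RECORD)
```

Because each attribute sits inside a named tag, the program can pick out the title or identifier without guessing where it appears in the text.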

Metadata Use in Medicine
Metadata is used to organize and process data and information for increased usability and interoperability among different organizations and their information systems.
In medical care, descriptive metadata (in the form of classifications and codes) define demographics, diagnoses and care of patients for the purposes of documentation, communication, transaction and monitoring. Such metadata is created and used within the medical record by care providers, administrators, insurers and regulatory agencies to communicate and document care and to keep track of transactions, payments and operations. Examples of clinical classifications and codes used include: the International Statistical Classification of Diseases and Related Health Problems (ICD-10), Current Procedural Terminology (CPT) and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT).
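To make the role of coded descriptive metadata concrete, the sketch below checks the diagnosis codes attached to a hypothetical patient encounter against a tiny subset of ICD-10 categories. The three-entry table, the function name split_known_codes, and the encounter data are assumptions for illustration; a real system would load the complete classification.

```python
# Tiny illustrative subset of ICD-10 categories; a real system would load
# the complete classification rather than hard-code three entries.
ICD10_SUBSET = {
    "E11": "Type 2 diabetes mellitus",
    "I10": "Essential (primary) hypertension",
    "J45": "Asthma",
}


def split_known_codes(codes, vocabulary):
    """Separate codes found in the vocabulary from codes that are not.

    Returns a (known, unknown) pair: known maps each recognized code to its
    label, unknown lists the original strings that could not be matched.
    """
    known = {}
    unknown = []
    for code in codes:
        normalized = code.strip().upper()
        if normalized in vocabulary:
            known[normalized] = vocabulary[normalized]
        else:
            unknown.append(code)
    return known, unknown


# Hypothetical encounter: two valid codes (one in lower case) and one typo.
encounter_codes = ["e11", "J45", "ZZZ"]
known, unknown = split_known_codes(encounter_codes, ICD10_SUBSET)
```

Restricting values to a shared code list is what lets a provider, an insurer, and a regulator interpret the same record in the same way.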
In health care transactions, structural and administrative metadata (in the form of messaging standards) define the formats in which coded data, both textual and non-textual (such as imaging data), is created, transmitted and stored for use and archiving. Such metadata specifies the ways that data and information are exchanged between electronic systems for consistent processing (interoperability). Examples are the Health Level Seven Reference Information Model (HL7 RIM) and Digital Imaging and Communications in Medicine (DICOM).
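The sketch below shows, in simplified form, why an agreed message structure matters. It assumes the older pipe-delimited HL7 version 2 encoding (segments separated by carriage returns, fields by "|", components by "^") rather than the XML messages derived from the RIM. The message content is invented, and the parser ignores escape sequences, repetitions, and other details that a real HL7 library handles.

```python
# Invented HL7 version 2 style message: one header segment (MSH) and one
# patient identification segment (PID). All values are placeholders.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|SENDER|HOSPITAL|RECEIVER|CLINIC|200801011200||ADT^A01|MSG0001|P|2.3",
    "PID|1||PATIENT123||DOE^JANE",
])


def parse_segments(message):
    """Group segments by their three-letter name.

    Each segment becomes a list of field strings; a name maps to a list of
    segments because some segment types may repeat within one message.
    """
    segments = {}
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


segments = parse_segments(SAMPLE_MESSAGE)

# For the PID segment, list index n lines up with field PID-n.
pid = segments["PID"][0]
patient_id = pid[3]                         # PID-3: patient identifier
family, given = pid[5].split("^")[:2]       # PID-5: patient name components
```

Because sender and receiver agree in advance on which position holds which attribute, the receiving system can extract the patient identifier and name without any free-text interpretation.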
In medical publishing and librarianship, descriptive, structural and administrative metadata are used extensively to archive and index publications (journal articles, books, electronic media, etc.) for identification and retrieval. Examples of descriptive metadata in this domain include the Medical Subject Headings (MeSH) and the National Library of Medicine (NLM) Classification System. Examples of structural metadata include eXtensible Markup Language (XML) for electronic business (ebXML). Examples of administrative metadata include descriptions of the different formats used for storage of multimedia.
How is Metadata Standardized?
Metadata standardization can be complex, involving a variety of organizations, including the following:
- Standards development organizations (SDOs) that specify details of metadata
- Domain (medical) and business organizations that use the metadata
- Regulatory organizations that enforce adherence to metadata standards
What Types of Medical Metadata Standards are Used in the United States and Who Creates Them?
Medical care and research
In medical care and research in the United States, the federal government has defined 24 domains in which descriptive, structural, and administrative metadata standards are being defined to streamline communications and promote interoperability among government, business, and health care providers. The Consolidated Health Informatics initiative (part of the eGovernment Initiative) defines vocabularies and messaging standards to be used by all parties involved in health care. Many of the metadata standards in the specification have already been in use for years. Adopted standards include the HL7 Reference Information Model, SNOMED CT, and LOINC.
The Institute of Medicine (IOM) is in the process of specifying metadata to describe contents of electronic health records. The specification process is being used to extend the use of the electronic health record into the domains of public health and research.
The Health Insurance Portability and Accountability Act (HIPAA) creates standards for privacy, confidentiality and security for protected health information that affect the implementation and use of metadata in health care.
Medical knowledge
Medical knowledge is produced as publications (books, journals, electronic resources) that are indexed and archived in databases such as MEDLINE, maintained by the National Library of Medicine (NLM), which specifies the publication metadata. Controlled vocabularies used for descriptive metadata include the Medical Subject Headings (MeSH) and the NLM Classification System.
The Unified Medical Language System (UMLS) is a concordance of over 100 medical vocabularies that is being used to provide automated support for indexing and metadata assignment to medical text documents.
Examples of Medical Metadata Use
Controlled vocabularies
- Medical Subject Headings (MeSH) was created and is maintained by the National Library of Medicine (NLM) for indexing and retrieval of citations from the MEDLINE citation database. For more information, refer to http://www.nlm.nih.gov/mesh/.
- The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) was created and is maintained by the College of American Pathologists (CAP). It is available (free to developers) through NLM’s Unified Medical Language System (UMLS) for general use in documentation and communication of clinical medicine. For more information refer to http://www.ihtsdo.org/snomed-ct/.
- The Logical Observation Identifiers Names and Codes (LOINC®) is a clinical terminology important for laboratory test orders and results. Produced by the Regenstrief Institute in Indianapolis, LOINC is a universal standard for identifying laboratory observations. For more information, refer to http://www.loinc.org/.
- The International Classification of Diseases (ICD-10) is used to code and classify mortality data from death certificates, and the International Classification of Diseases, Clinical Modification (ICD-9-CM/ICD-10) is used to code and classify morbidity data from inpatient and outpatient records, physician offices, and most National Center for Health Statistics (NCHS) surveys. For more information, refer to http://www.cdc.gov/nchs/icd9.htm.
Messaging Standards
- Health Level 7 (HL7) uses an object-oriented development methodology and a Reference Information Model (RIM) to create messages. For more information refer to http://www.hl7.org.
- Digital Imaging and Communications in Medicine (DICOM) facilitates interoperability of medical imaging equipment by specifying imaging device protocols, command syntax and semantics, media storage services, and the file formats and directory structure needed to access the images and related information stored on interchange media. For more information, refer to http://medical.nema.org/.
Metadata standards organizations
- Consolidated Health Informatics (CHI) establishes a portfolio of existing clinical vocabularies and messaging standards enabling federal agencies to build interoperable federal health data systems with private health care information networks. For more information, see http://www.hhs.gov/healthit/chi.html.
- Health Level 7 (HL7). In addition to being a messaging standard, HL7 is an ANSI-accredited SDO that operates in the healthcare arena to create flexible, cost-effective approaches, standards, guidelines, methodologies, and related services for interoperability between healthcare information systems. For more information, see http://www.hl7.org.
- The Dublin Core Metadata Initiative (DCMI) is an organization dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for describing resources that enable more intelligent information discovery systems. For more information, see http://dublincore.org.
- The World Wide Web Consortium (W3C) specifies metadata structures for use on the World Wide Web. Initiatives being explored or developed include the Semantic Web, the Resource Description Framework (RDF) and Web Services. For more information, see http://www.w3.org.
- The MedBiquitous Consortium develops information technology standards for medical education and training including metadata standards for medical learning objects. For more information, see http://www.medbiq.org.
For Further Reading
American Medical Association. CPT: Current Procedural Terminology.
Provides information on the medical codes used in documentation and billing.
URL: http://www.ama-assn.org/ama/pub/category/3113.html [Last accessed 14 October 2008].
American National Standards Institute. ANSI Standards Activities.
A private, non-profit organization that administers and coordinates the U.S. voluntary standardization and conformity assessment system.
URL: http://www.ansi.org/standards_activities/overview/overview.aspx?menuid=3 [Last accessed 14 October 2008].
Centers for Disease Control and Prevention. Public Health Data Standards Consortium.
A confederation to develop, promote, and implement data standards for population health practice and research.
URL: http://phdatastandards.info/ [Last accessed 14 October 2008].
Institute of Electrical and Electronics Engineers. IEEE Standards Association.
The leading developer of global industry standards in a broad range of industries, including biomedical and healthcare, information technology and information assurance.
URL: http://standards.ieee.org/ [Last accessed 14 October 2008].
Kupfer DJ, First MB, Regier DA. A Research Agenda for DSM-V. American Psychiatric Association, 2002.
Explores basic nomenclature issues, including the desirability of rating the quality and quantity of information available to support psychiatric diagnosis.
URL: http://www.appi.org/book.cfm?id=2292 [Free, last accessed 14 October 2008].
Library of Congress. Library of Congress Digital Repository Development - Core Metadata Elements. May 11, 2004.
Identification of a core set of metadata elements to be used in the development, testing, and implementation of multiple repositories.
URL: http://www.loc.gov/standards/metadata.html [Last accessed 14 October 2008].
National Library of Medicine. The Unified Medical Language System.
The purpose of the UMLS® is to facilitate the development of computer systems that behave as if they understand the meaning of the language of biomedicine and health.
URL: http://www.nlm.nih.gov/research/umls/ [Last accessed 14 October 2008].