Is Metadata Too Subjective?
Jim Martino, PhD
Digital Pedagogy Specialist
The Sheridan Libraries
Johns Hopkins University
One of the most important uses of text-based metadata is to enable discovery of resources. However, even the best metadata provided at the time of a resource's creation is a kind of assertion [5] whose expression depends on when, where, and by whom the metadata was created. Words used to describe topics, terms and even subject areas may shift over time, or may vary from place to place (or even from person to person). Some metadata elements, like author, creation date and title, seem reasonably immune to this kind of shifting. This is what makes searching by title or author reliable, even over time and across different locations. Because users searching for things like learning objects probably have a particular use or context in mind, they are likely to describe that context in a search query in order to discover appropriate resources. When the metadata tagger and the user describe the same context in different words, relevant resources may be missed in a search. A more basic worry is that metadata originating from different providers may vary in term usage, quality, and practice.
There has been considerable interesting research into methods of inferring the context of resources using information external to the resources themselves (see e.g. [4], [1]). For example, the creation of a hyperlink from one web page to another is an assertion of a connection between the pages. While the mere existence of a single link between two pages does not tell us much about the nature of this relationship, examining the link structures of a large collection of web pages may allow us to infer something more about the nature of the relationships among them. Internet search engines depend on this kind of analysis to help make assertions about the authority of web-based sources [3]. If resources are accessed through a web portal that can take into account information about individual readers (think Amazon.com), more detailed measures of the relevance of materials for a particular type of user or purpose may be possible.
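As an illustration of how link structure alone can yield a measure of authority, the following is a minimal sketch in the spirit of the hubs-and-authorities analysis described in [3]. It is a simplified version under stated assumptions, not a description of what any particular search engine does: the function name `hits`, the dictionary representation of links, and the fixed iteration count are all choices made for this example.

```python
import math


def hits(links, iterations=50):
    """Estimate hub and authority scores from a link graph.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical representation chosen for this sketch).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {p: v / norm for p, v in new_auths.items()}

        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hubs = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hubs[src] += auths[dst]
        norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {p: v / norm for p, v in new_hubs.items()}

    return hubs, auths


# Toy graph: pages "a" and "b" both link to "c"; "c" links to "d".
example_links = {"a": ["c"], "b": ["c"], "c": ["d"]}
hub_scores, authority_scores = hits(example_links)
top_authority = max(authority_scores, key=authority_scores.get)
```

In this toy graph the page that receives the most links from other pages ends up with the highest authority score. With only a handful of links the scores say very little, which is consistent with the point that such methods depend on a large collection of connections.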
These automated methods for inferring context from external information will be most successful if there is a high density of relationships or links, which occurs only after a large number of these connections have been generated [1]. For smaller or newer collections these methods are likely to be less reliable, and text-based metadata may have to be relied upon to a greater degree. In that setting, there have been efforts to improve the quality of service to users by tailoring or transforming metadata in ways that make its presentation more useful [2].
Some of these drawbacks of metadata should be less severe for medical education learning objects. For example, term shift can be minimized through controlled vocabularies. Shift over time may be less of a problem for medical education learning objects, as the validity (for accreditation purposes) of many of these objects may be of relatively short duration. However, as learning object collections grow in both size and complexity, we can expect that at least some of the above issues will have a greater impact on resource discovery, and that we will need to design appropriate automated tools to help address them.
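The role of a controlled vocabulary in reducing mismatches between tagger and searcher can be sketched as follows. This is only an illustration: the small table `PREFERRED_TERMS` and the functions `normalize_term` and `shares_concept` are hypothetical and stand in for a real controlled vocabulary and its lookup service.

```python
# Hypothetical mapping from variant terms to a single preferred term.
# A real controlled vocabulary would be far larger and externally maintained.
PREFERRED_TERMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}


def normalize_term(term):
    """Return the preferred term for a variant, or None if it is unknown."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    return PREFERRED_TERMS.get(key)


def shares_concept(query_terms, tagged_terms):
    """True if any query term and any tagged term map to the same preferred term."""
    query = {normalize_term(t) for t in query_terms} - {None}
    tagged = {normalize_term(t) for t in tagged_terms} - {None}
    return bool(query & tagged)


# The searcher and the tagger use different words for the same concept.
found = shares_concept(["Heart  Attack"], ["myocardial infarction"])
```

Here the query and the tag never share a literal word, yet they match once both are mapped to the preferred term. Terms outside the vocabulary are simply ignored in this sketch, which is one reason such mappings do not remove the need for other discovery tools as collections grow.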
References
- [1] Fisher, Michelle and Everson, Richard. When are links useful? Experiments in text classification. Advances in Information Retrieval. 25th European Conference on IR Research, ECIR 2003.
- [2] Hillmann, Diane, Dushay, Naomi and Phipps, Jon. Improving Metadata Quality: Augmentation and Recombination. Submitted to Dublin Core 2004. http://www.cs.cornell.edu/naomi/DC2004/MetadataAugmentation--DC2004.pdf [Last accessed 9 October 2008].
- [3] Kleinberg, Jon M. Authoritative Sources in a Hyperlinked Environment. Journal of the Association for Computing Machinery 46:5 (September 1999), pp. 604-632. http://www.cs.cornell.edu/home/kleinber/auth.pdf [Last accessed 9 October 2008].
- [4] Kurtz, Michael J. et al. The NASA Astrophysics Data System: Sociology, Bibliometrics, and Impact. Submitted to The Journal of the American Society for Information Science and Technology (2003). http://cfa-www.harvard.edu/~kurtz/jasist-submitted.pdf [Last accessed 9 October 2008].
- [5] Lynch, Clifford. When Documents Deceive: Trust and Provenance as New Factors for Retrieval in a Tangled Web. Journal of the American Society for Information Science 52:1 (January 2001), pp. 12-17. http://www.cs.ucsd.edu/~rik/others/lynch-trust-jasis00.pdf [Last accessed 9 October 2008].