Research Articles

Reconceiving metadata: language documentation through thick and thin

  • David Nathan
  • Peter K. Austin


Metadata can be described as ‘data about data’. As a result of recent activity and discussion on the documentation of endangered languages, through projects such as OLAC (Open Language Archives Community) (Simons and Bird 2003) and IMDI (ISLE Meta Data Initiative), metadata within language documentation is now coming to be understood as information that is attached to a file or document for cataloguing purposes (see Johnson, this volume). We call this focus on cataloguing ‘thin metadata’. It not only risks being a simplistic view of the role of metadata in language documentation but is also likely, in the longer term, to limit the accomplishments of the field. Thus, a richer, ‘thick metadata’ approach that operates at all levels of linguistic analysis should be central to our field. What has emerged is a ‘metadata gap’: on the one hand we find minimalist cataloguing schemas promoted for the endangered languages field, and on the other the rich descriptions that fieldworking linguists write as they create and analyse their data. What is needed to support language documentation is a metadata methodology that provides flexible, richly articulated knowledge representation schemas to encode linguists’ cascading layers of data and metadata.

Keywords: language documentation, endangered languages, metadata, cataloguing, linguistic analysis, methodology, flexibility, priorities, communities of interest, knowledge, resources, expression, accessibility

How to Cite:

Nathan, D. & Austin, P. (2014) “Reconceiving metadata: language documentation through thick and thin”, Language Documentation and Description 2, 179-188. doi:

Published on
31 Jul 2014
Peer Reviewed