Research Articles

Language documentation and archiving, or how to build a better corpus

  • Heidi Johnson


Archives have historically played a central role in the description of endangered languages. This is not surprising, since there is little sense in collecting data on languages that are disappearing if there is no plan for preserving that data. Archiving materials for the already nearly extinct languages of North America was an essential goal of the pioneers of Americanist linguistics: Franz Boas, Edward Sapir, and their intellectual descendants. They diligently deposited all their field notes (and later, audio recordings) in archives and museums such as the Smithsonian Institution. There are already several digital archives for endangered language materials ready to receive the documentation being produced today, and to digitize and archive legacy materials from previous decades. The Digital Endangered Languages and Musics Archive Network (DELAMAN) has been formed to co-ordinate efforts and thus improve service to the field. The workshop at which this paper was originally presented was one result of DELAMAN’s collaboration. I hope that this brief guide to corpus management will help ensure that these unprecedented quantities of materials documenting endangered languages are indeed accessible for speakers and researchers for generations to come.

Keywords: language documentation, archiving, endangered languages, digital archives, corpus management, guide, accessibility, Digital Endangered Languages and Musics Archive Network, DELAMAN

How to Cite:

Johnson, H. (2014) “Language documentation and archiving, or how to build a better corpus”, Language Documentation and Description 2, 140-153. doi:

Published on 31 Jul 2014
Peer Reviewed