Research Articles

Swadesh lists are not long enough: Drawing phonological generalizations from limited data

  • Rikker Dockum
  • Claire Bowern


This paper presents the results of experiments on the minimally sufficient wordlist size for drawing phonological generalizations about languages. Given a limited lexicon for an under-documented language, are conclusions that can be drawn from those data representative of the language as a whole? Linguistics necessarily involves generalizing from limited data, as documentation can never completely capture the full complexity of a linguistic system. We performed a series of sampling experiments on 36 Australian languages in the Chirila database (Bowern 2016) with lexicons ranging from 2,000 to 10,000 items. The purpose was to identify the smallest wordlist size to achieve: (1) full phonemic coverage for each language, and (2) accurate phonemic distribution compared to the full dataset. We hypothesize that when these two criteria are met they represent a minimally complete sample of a language for basic phonological typology. The results show coverage is consistently achieved at an average lexicon size of approximately 400 items, regardless of the original lexicon size sampled from. These results hold broad significance, given the predominance of word lists smaller than 400 items. For fieldwork, this study also provides a guideline for designing documentation tasks in the face of limited time and resources. These results ...
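The sampling procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy lexicon, and the use of total variation distance to compare phoneme distributions are all assumptions made for illustration, since the abstract does not specify the metric or code used. Each word is represented as a list of phoneme symbols.

```python
import random
from collections import Counter

# Hypothetical toy lexicon (invented forms, NOT data from the Chirila database).
TOY_LEXICON = [
    ["p", "a"], ["t", "a"], ["k", "a"],
    ["m", "i"], ["n", "u"], ["p", "i", "t", "a"],
]

def inventory(lexicon):
    """Set of all phoneme symbols attested in a lexicon."""
    return {p for word in lexicon for p in word}

def words_needed_for_coverage(lexicon, rng):
    """Draw words in random order and return how many are needed before
    every phoneme of the full lexicon has been seen at least once."""
    target = inventory(lexicon)
    order = lexicon[:]          # copy so the caller's list is not reordered
    rng.shuffle(order)
    seen = set()
    for n, word in enumerate(order, start=1):
        seen.update(word)
        if seen == target:
            return n
    return len(order)

def mean_coverage_size(lexicon, trials=200, seed=0):
    """Average coverage size over repeated random orderings."""
    rng = random.Random(seed)
    sizes = [words_needed_for_coverage(lexicon, rng) for _ in range(trials)]
    return sum(sizes) / len(sizes)

def phoneme_distribution(lexicon):
    """Relative token frequency of each phoneme in a lexicon."""
    counts = Counter(p for word in lexicon for p in word)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def total_variation(dist_a, dist_b):
    """Total variation distance between two distributions (0 = identical).
    Assumed here as one possible measure of distributional accuracy."""
    keys = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)
```

Under these assumptions, `mean_coverage_size(lexicon)` estimates criterion (1), and `total_variation(phoneme_distribution(sample), phoneme_distribution(lexicon))` gives one way to quantify criterion (2) for a sample of a given size.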

Keywords: Swadesh list, Phonological generalizations, Data, Experiments, Australian languages, Chirila database

How to Cite:

Dockum, R. & Bowern, C., (2019) “Swadesh lists are not long enough: Drawing phonological generalizations from limited data”, Language Documentation and Description 16, 35-54. doi:

Published on
31 Aug 2019
Peer Reviewed