‘Which Language Should I Document? Some concrete suggestions from diversity and endangerment’ by Harald Hammarström

By Francesca Brown | December 17, 2019

Harald Hammarström, December 1, 2019

Are you looking to describe an endangered un(der)documented language but are not sure which one to choose? Then this may be the document for you. Let us take as our point of departure the idea that we want to prioritize documenting (i) the most endangered language which is (ii) not already described and whose documentation would bring the largest expected contribution to our knowledge of linguistic diversity. For (i), we’ll use the Agglomerated Endangerment Status (AES), which is a combination of the ElCat, Ethnologue, and UNESCO databases (for details, see Hammarström et al. 2018). These data are far from flawless but hopefully serve the present purpose well enough. AES comes on a scale from Not Endangered to Extinct, which I have translated to AES Priority (AESP) as per the below. The more endangered, the higher the priority, with the lowest (zero) priority being given to non-endangered languages. We exclude the already extinct languages.

For (ii), on a global level, all we have access to are the crude description types in the bibliography of Glottolog (Hammarström et al. 2019). Suppose we use the following point system for different kinds of language descriptions:

We could then look at all the publications tied to a language and find its Most Extensive Description (MED) according to the above hierarchy. For example, if there is a grammar and a dictionary and several overviews available for a language, its MED will be the grammar. It does not make a difference if there is one, two, or one hundred grammar sketches: as long as there is no document of a higher-ranked category, the MED of the language will still be grammar sketch (for further details, see Hammarström et al. 2018). It is important to note that MED is primarily a measure of grammatical description as opposed to lexical, textual, audio or visual description/documentation. (But we will try to take documentation into account on an ad hoc basis below.)
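The MED rule amounts to taking a maximum over the description types attested for a language. The following is a minimal sketch of that idea, not the actual Glottolog procedure. The point values are placeholders: only wordlist = 1 and long grammar = 5 are implied by the worked examples in this text, and the intermediate values and the function name are assumptions made for illustration.

```python
# Hypothetical point values per description type. Only "wordlist" = 1 and
# "long grammar" = 5 are implied by the worked examples in the text; the
# intermediate values are placeholders chosen for illustration.
POINTS = {
    "long grammar": 5,
    "grammar": 4,
    "grammar sketch": 3,
    "dictionary": 2,
    "overview": 2,
    "wordlist": 1,
}


def most_extensive_description(doc_types):
    """Return (type, points) of the highest-ranked description, or (None, 0)."""
    best = max(doc_types, key=lambda t: POINTS[t], default=None)
    if best is None:
        return (None, 0)
    return (best, POINTS[best])


# A grammar outranks a dictionary and any number of overviews.
print(most_extensive_description(["overview", "grammar", "dictionary", "overview"]))
# The number of documents of one type is irrelevant; only the top type counts.
print(most_extensive_description(["grammar sketch"] * 100))
```

Because only the maximum matters, adding further documents of the same or a lower category never changes the result.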

Now, we will assess the expected gain in diversity knowledge (EDK) by looking at the genealogical position of a language. Since we are trying to assess languages which have not been described yet, this is essentially the only option. The principle implies that the documentation of an undescribed dialect of English has little expected diversity knowledge gain, while the opposite holds for the documentation of a language isolate or a language from a family where no language is described. More precisely, we first calculate the phylogenetic average description level, i.e., the average taken recursively over the branches of the tree (using the Glottolog (Hammarström et al. 2019) classification). Then we calculate the same thing but supposing that the language at hand is described by a long grammar. The difference between the two is the expected gain in diversity knowledge.
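The recursive average can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for the actual list: the tree is a hypothetical nested-list encoding, the average is assumed to be taken within the language's own family, and the values wordlist = 1 and long grammar = 5 are inferred from the Kayagar figures (1.0 and 3.0) given later in this text.

```python
def average_level(node, levels):
    """Recursive branch average of description levels.

    A leaf (a language name) contributes its own level; an internal node
    (a list of branches) contributes the plain mean of its branches.
    """
    if isinstance(node, str):
        return float(levels[node])
    return sum(average_level(child, levels) for child in node) / len(node)


def expected_gain(tree, levels, language, long_grammar=5):
    """EDK: average with `language` raised to a long grammar, minus the current average."""
    raised = dict(levels)
    raised[language] = max(levels[language], long_grammar)
    return average_level(tree, raised) - average_level(tree, levels)


# Kayagar family as described in the text: Atohwaim alone in one branch,
# Tamagario and Kayagar together in the other; all three at wordlist level (1).
kayagar = ["Atohwaim", ["Tamagario", "Kayagar"]]
levels = {"Atohwaim": 1, "Tamagario": 1, "Kayagar": 1}

print(expected_gain(kayagar, levels, "Atohwaim"))   # 2.0
print(expected_gain(kayagar, levels, "Tamagario"))  # 1.0
```

Averaging branch by branch, rather than over all languages at once, is what makes a language with many well-described relatives contribute little: Tamagario shares its branch with Kayagar, so describing it moves the family average only half as far as describing Atohwaim.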

There are many interesting ways to combine AESP and EDK, but the simplest is to rank languages according to their product:

Priority(L) = AESP(L) · EDK(L)

Priority(L) thus gives a value between 0 and 20 (4 · 5 = 20) for every language.

For example, Atohwaim [aqm] is a 6b (Threatened) language of Indonesian Papua. It belongs to the small Kayagar family with three members. In the classification, Atohwaim forms its own branch, and the other two, Tamagario [tcg] and Kayagar [kyt], form the second branch. All three languages have only a wordlist as their documentation. Obtaining a long grammar of Atohwaim would thus raise the average weighted description level from 1.0 to 3.0, yielding an EDK of 3.0 − 1.0 = 2.0. Its priority value is thus 1 · 2.0 = 2.0, earning it rank #165 on the full priority list.

As another example, Schiermonnikoog Frisian [-] is 8b (Nearly extinct) and the subject of a short grammar (Fokkema 1969). It belongs to the Frisian subfamily of Germanic, which is in turn a branch of Indo-European. Since we know a great amount about the languages of these (sub-)families, getting a fuller grammar of Schiermonnikoog Frisian would only give a tiny increase in diversity knowledge, EDK ≈ 0.000154. The priority score of this language thus comes to 4 · 0.000154 ≈ 0.000617, earning it rank #1813 on the full list.
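The two worked examples can be reproduced with a short sketch of the product formula. The AESP mapping here is partly an assumption: the text confirms 0 for non-endangered languages, 1 for 6b (Threatened) and 4 for 8b (Nearly extinct), while the values for 7 (Shifting) and 8a (Moribund) are interpolated. The function and variable names are illustrative only.

```python
# AESP points per AES level. Only 0 (not endangered), 1 (6b) and 4 (8b) are
# confirmed by the text; the values for 7 and 8a are interpolated assumptions.
AESP = {
    "not endangered": 0,
    "6b (Threatened)": 1,
    "7 (Shifting)": 2,       # assumed
    "8a (Moribund)": 3,      # assumed
    "8b (Nearly extinct)": 4,
}


def priority(aes_level, edk):
    """Priority(L) = AESP(L) * EDK(L)."""
    return AESP[aes_level] * edk


# (AES level, EDK) pairs taken from the two examples in the text.
languages = {
    "Schiermonnikoog Frisian": ("8b (Nearly extinct)", 0.000154),
    "Atohwaim": ("6b (Threatened)", 2.0),
}

# Rank from highest to lowest priority.
ranked = sorted(languages, key=lambda name: priority(*languages[name]), reverse=True)
for name in ranked:
    print(name, priority(*languages[name]))
```

With the rounded EDK of 0.000154 the product for Schiermonnikoog Frisian comes out at about 0.000616; the 0.000617 quoted in the text presumably reflects the unrounded EDK value. The ordering also shows how a less endangered language can far outrank a nearly extinct one when its genealogical position is poorly described.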

The calculation used simplifies a number of things, which, fortunately, have been somewhat discussed in the literature. Harmon and Loh (2010) propose a diversity index. Hammarström (2010) provides a priority list on similar principles as here but focussing on the case where there is no grammar for any language of a family. Hauk and Heaton (2018), too, look at documentation priority in a similar manner, but have a much poorer command of the data. Hammarström et al. (2018) describe a web tool for browsing endangerment and description data in various combined ways. Back issues of the journal Bulletin of the International Committee on Urgent Anthropological and Ethnological Research provide many ad hoc cases where language and/or cultural documentation is or was urgent. Seifart et al. (2018) report more qualitatively on the contributions in the last quarter century to our knowledge of the languages of the world. Sands (2017) is a good survey of Africa specifically.

The full priority list of the 6 820 living L1 languages of the world can be inspected here. Now let us look at the top-100 priority languages. As we shall see, it is not so simple to take this list at face value, mainly because of problems (perhaps outright errors) in the endangerment data. I have boldfaced the cases I think are actually well-prioritizable as projects, with appropriate considerations explained below. Fieldwork on Papuan languages can and must be done via Malay/Indonesian on the Indonesian side and Tok Pisin (or English in Western Province) on the PNG side.

Access to the entire list is available as a pdf here.

1 Taruma [tdm], as of fieldwork by Sérgio Meira in 2015, had only one remaining speaker (Mrs Suttie, then 64 years old) who cannot really be described as a fluent speaker. Although only wordlists have been published, there is also some manuscript data, including texts, collected almost a century ago by the late missionary Cuthbert Cary-Elwes (Colson 2011). Thanks to these materials and recent fieldwork, we may expect a grammar sketch to be obtainable already. The top priority rank betrays the considerable challenge in producing a long grammar from here.

2 Jalaa [cet] in Northeast Nigeria was discovered by Kleinewillinghöfer (2001) who, thanks to great efforts, rescued lexical data (along with some tentative grammatical data) from the memories of this ethnic group. According to Ulrich Kleinewillinghöfer (p.c. 2012) there are no fluent speakers remaining, only rememberers, which implies that it is no longer possible to produce a grammar sketch, let alone a long grammar, of this language.

3 Kembra [xkw] of the lowlands of Indonesian Papua is a mysterious entity. The language exists in various catalogues thanks to Doriot (1991), who met a transient speaker (at Malu, east of Kembra) and took down a short (35-item) wordlist. This wordlist, which remains unpublished, contains low-scoring comparisons with neighbouring languages. Reports of these comparisons have led to its positioning as an isolate or unclassified language in some listings, including my own. After having obtained a copy of the wordlist, however, I believe the matches with Lepki [lpe] and Murkim [rmh] further south (for example the numeral ‘two’) are significant enough to warrant inclusion of Kembra in the Lepki-Murkim family, and I will adjust forthcoming editions of Glottolog accordingly. Doriot’s (1991) notes remain the only information on the language, and it is from there that the figure of 30 Kembra inhabitants originates. This small number, in turn, has been interpreted as a sign of severe endangerment. But the area is poorly surveyed, and we do not actually know the boundaries of the Kembra language. Indications from fieldwork further south (Andersen 2007:17, 24) suggest a continuity of semi-permanent settlements with names in -ra, and an old colonial map (Hoogland 1940) has Kiambra and Mambra in the vicinity. It is quite possible that the Kembra language is spoken in further settlements, and in such a remote area, the chances of intergenerational transmission should be high.

4 Baiyamo [ppe], the language spoken at the village Papi in the Upper Sepik area of PNG, is a true case of a high priority small language whose documentation is feasible, but the endangerment status of the language, 8a (Moribund), seems exaggerated. Although the speaker number of 70 appears to be of the correct order of magnitude, as of 2009, the language was still being transmitted to children (p.c. Jack Kennedy 2009).

5 Massep [mvs], near the mouth of the Mamberamo in Indonesian Papua, is similarly a true case of a high priority small language whose documentation is feasible. But here too the endangerment status of the language, 8a (Moribund), seems exaggerated. Whenever surveyed, Massep has been a stably transmitted language with only 30-40 speakers (Clouse et al. 2002:4), and may be the smallest language on the planet with this property.

6 Mor (Bomberai Peninsula) [moq] (Harald Hammarström), 9 Asabano [seo] (Roger Lohmann), 42 Kibiri [prm] (Martin Steer), 49 Tuwari [tww] (Sylvain Loiseau), 94 Haruai [tmd] (Bernard Comrie), 56 Yele [yle] (Stephen C. Levinson) are all worthy Papuan languages (though far from equal in endangerment) but are already being documented, and it is not clear what additional projects may bring. The same may be said for the South American isolates 78 Cofán [con] (Rafael Fischer and Kees Hengeveld) and 79 Candoshi-Shapra [cbu] (Simon Overall).

A few other Papuan languages (7 Kehu [khh], 8 Turumsa [tqm], and 23 Pyu [pby]) should be very worthy projects, but the priority requires a closer look at the endangerment, and the field locations are on the more difficult end of the scale. Turumsa only has a handful of speakers and may already be extinct (Tupper 2007). There are good reasons to believe that Pyu is by now moribund (Hammarström 2010:191) and the location is relatively remote, around Biake 2 at the Upper Sepik bend, mostly on the PNG side. The Kehu were only recently brought into contact with modern society and fieldwork with the Kehu is likely to be challenging (Adipatah 2011; Kamholz 2012). It is difficult to assess the transmission of Kehu even among the 35-40 settled Kehu (Kamholz 2012:246-247).

Over a dozen Papuan languages appear to be perfect targets for documentation projects. They are documented only at wordlist level and are endangered, in the sense of not being transmitted to children, but still have enough speakers that documentation is very meaningful. On the Indonesian side, 19 Sumuri [tcm], 27 Mawes [mgk], 32 Dem [dem], 50 Mer [mnu], 54 Demta [dmy], 72 Koneraw [kdw] are in perfectly accessible locations, even if not always convenient. 16 Usku [ulf], 25 Molof [msl], and 26 Elseng [mrf] are in more remote locations, but should nevertheless be possible. On the PNG side, 10 Yerakai [yra], 37 Odiai [bhf], 38 Ambakich [aew], 89 Mwakai [mgt], 90 Pondi [lnm], 91 Mekmek [mvk], 95 Owiniga [owi] are all in the Sepik area and should be accessible, even if not convenient. Mwakai [mgt] and Pondi [lnm] actually have less expected diversity knowledge gain than the formula would suggest because of the recent extensive grammar of their close relative Ulwa [yla] (Barlow 2018). A non-negligible amount of data may exist among missionaries for languages like Odiai, Elseng and Owiniga, but if so, it is far from publication.

The urgency is somewhat lower with another seven undocumented languages of the remoter parts of the Jayapura hinterland in Indonesian Papua: 11 Yetfa [yet], 21 Kimki [sbt], 22 Sause [sao], 30 Kosadle [kiq], 46 Abinomn [bsa], 67 Lepki [lpe], 70 Kapori [khp]. 33 Dibiyaso [dby] in Western Province of PNG could be added to this list. Despite their assessment as 7 (Shifting) in AES, there are good reasons to believe all of them are being transmitted to children (Hammarström 2010).

With yet another five Papuan languages, 29 Kol (Papua New Guinea) [kol], 80 Burmeso [bzu], 81 Kaure-Narau [bpp], 82 Pele-Ata [ata], 83 Anem [anz], the endangerment priority is there but not extremely acute, and there exists sketch-level documentation. Nevertheless, projects aiming at fuller documentation would undoubtedly be very worthy and timely. Burmeso and Kaure in the interior of Indonesian Papua have more challenging field locations than the three in the Bismarck islands of PNG.

A dozen more Papuan languages are in a similar situation, with only slightly lower priority. 36 Bogaya [boq], 41 Damal [uhn], 52 Baibai [bbf], 60 Touo [tqu], 61 Kaki Ae [tbd], 64 Pawaia [pwa], 65 Yale [nce], 73 Purari [iar], 77 Guriaso [grx], 84 Amto [amt], 98 Siawi [mmp] are on the PNG side and 51 Momina [mmb], 86 Mpur [akc], 97 Momuna [mqf] on the Indonesian side. All have grammar sketches (in several cases unpublished missionary sketches) and could be considered endangered in a wider perspective, but not immediately so. All are accessible, and Mpur [akc] is especially convenient from Manokwari. Momuna and Momina, as well as Amto and Siawi, are pairs of sister languages.

12 Yámana [yag], 17 Taushiro [trr], 18 Tinigua [tit] are three South American isolates known to be down to the last speaker. 35 Ongota [bxe] of Ethiopia has a handful of speakers left, but is otherwise in a similar position. Fortunately, all four have been the subject of recent, relatively extensive work by different teams, and it is unclear what an additional project may bring.

The Australian languages 13 Wulna [wux], 14 Wurrugu [wur], 28 Laragia [lrg], 43 Margu [mhg], 39 Wanyi [wny], 47 Yangman [jng], 48 Gajirrabeng [gdh], 55 Mangerr [zme], 58 Kamu [xmu], 93 Tyaraity [woa] listed in AES as 8b (Nearly extinct) are effectively extinct. At most, rememberers know isolated words, but it is not possible to do any projects towards obtaining further grammatical data. 40 Wadjiginy [wdj] may still be alive, but since both the Pungupungu and Wadjiginy varieties have substantial competent sketches, it is unclear to me how valuable further work with the last speakers may be.

The enigmatic 15 Kujarge [vkj] of Chad would be highly interesting from a number of perspectives (Hammarström 2010:184). The area is exceedingly difficult to access, but there is good reason to believe Kujarge individuals still exist even after Janjaweed attacks (Blažek 2015:88).

Similarly, 20 Shom Peng [sii] of the Nicobars of India is difficult to access, especially for non-Indian passport holders. Once on Great Nicobar Island, it is possible to meet Shom Peng individuals in coastal villages (p.c. Simron Singh 2019) and the language appears not to be endangered in the interior.

Four South American isolates 24 Muniche [myr], 31 Itonama [ito], 34 Cayubaba [cyb], 76 Guató [gta] are practically extinct or nearly so and have sketch-level descriptions already. Projects to do further documentation with rememberers have limited added value to offer. The same, but in stronger form, holds for 88 Chaná of the small Charrúan family in Uruguay.

44 Huachipaeri [hug] in Peru has close relatives with substantial documentation. Huachipaeri is accessible, but I do not know the details of the potential added value of a documentation project. Similar things can be said for 57 Pumé [yae] in Venezuela, though Pumé has a much larger (ca 8 000) number of speakers.

The 8a (Moribund) Namla [naa] language is preferably studied together with its sister language 130 Tofanma [tlg]. Documenting the two should give excellent value (Hammarström 2010:190-191) and, though the location is relatively remote, recent studies (Mariati S. 2014) suggest documentation is feasible, albeit without giving the actual fieldwork location.

The Indonesian Papuan languages 45 Duriankere [dbn], 62 Suabo [szp], 66 Marori [mok], 99 Smärky Kanum [kxq] appear to be of priority. But Duriankere has long been thought to be extinct (de Vries 2002:88) and its relative Suabo [szp] is the subject of a competent modern sketch (de Vries 2004). Marori [mok] and Smärky Kanum [kxq] were the focus of an ELDP grant, MDP0336.

The only North American languages to make the list are 63 Patwin [pwi] and 87 Achumawi [acv] from northeast California. Patwin was said to have one remaining speaker in 2003 (Golla 2011:145) and Achumawi a handful of perhaps not fluent speakers (Nevin 1998:7, Golla 2011:98). It is not known to me if any are still alive and, if so, how meaningful a documentation project would be against the background of the rather extensive unpublished data already gathered on both languages.

Two famous language isolates, 71 Kusunda [kgg] of Nepal and 74 Hadza [hts] of Tanzania, appear on the priority list. Kusunda is down to its last few speakers and is the subject of a grammar sketch and considerable further on-going documentation. Hadza has, after a century of relative independence, started to become threatened. It has a long history of documentation but so far no extensive grammar has been published. Curiously, no student who has embarked on a PhD study of the language has come to a finish, as if a curse befalls those who try.

The 59 Berta [wti] language of the Sudan-Ethiopia borderland has substantial description but could benefit from more complete documentation, especially as it concerns its geographical variation. Such a project would be feasible but not necessarily easy.

Three African languages, 69 Lafofa [laf], 85 Amdang [amj], 100 Tese [keg], are threatened and prioritizable on account of their genealogical positions. Lafofa and Tese are in the Nuba mountains and should be accessible barring a resumption of civil war. It is likely that fieldwork can be done by means of English as well as Sudanese Arabic. Amdang is more difficult to reach but is not endangered (Wolf 2010). Fieldwork can be done in Chadian Arabic and perhaps also French.

The majority opinion is that the 75 Aka-Hruso [hru] language of Arunachal Pradesh in India is a Sino-Tibetan language. If so, this would earn it less diversity value, but it would still retain some, since its position within Sino-Tibetan remains to be better understood. In my opinion, a contact explanation for the Sino-Tibetan lexical material in Hruso is arguable, leaving the underlying Hruso an isolate. Either case adds some interest to more extensive analysis and documentation of Hruso. Scattered materials are already available and there is some on-going work (by Gregory D. S. Anderson among others). Fieldwork can be done in English, Hindi and Miji.

Three endangered Papuan languages 68 Lilau [lll], 92 Yawiyo [ybx], 96 Mekwei [msf] have close relatives of substantial or on-going description, and as such actually have less expected diversity gain than the formula (oblivious to family time-depth) would suggest.

Perhaps the most outstanding fact about the priority list calculated this way is the near total dominance of Papuan languages. This is not an error, not an exaggeration and not a joke — it is the actual outcome of the diversity gain and endangerment principles combined as above. Going further down the list would be more of the same: 156 Doso [dol], 188 Ari [aac], 194 Turaka [trh], 316 Banaro [byz], 306 Gorovu [grq], languages of the small families Tor, Kayagaric, Kolopom, Geelvink Bay, South Bird’s Head, Pauwasi and the Far Western Lakes Plain subfamily, and several Torricelli subfamilies deserve special mention.

But genealogical diversity is not everything. So let me also flag a number of other unknowns that are unattended and desperate for documentation, without pretense that this is an exhaustive list.

In the Americas, curiously, it is the languages of Mexico that have the lowest average documentation level, especially the many varieties of Otomanguean languages, which are also interesting for their tonal systems. The low average is dependent on language division here as well as elsewhere, but as far as I can tell, the proliferation of, e.g., Mixtec languages in ISO 639-3 is generally well-founded (Egland et al. 1983). Fieldwork in Mexico can be done in Spanish and is perhaps “easier” than in many Asian and African countries, but is not without danger. Two languages of Honduras, 136 Pech [pay] and 151 Tol [jic], land relatively high up on the priority list and could certainly benefit from fuller documentation.

In Africa, three small languages of the Eastern Jebel family in Blue Nile province of North Sudan are terminally endangered and documented only with wordlists. They are 190 Molo [zmo], 191 Kelo [xel] and 195 Aka [soh], and are located “right on the highway” according to the late Lionel M. Bender (p.c. 2009). It would be a tragedy if they were left to die.

In Western South Sudan, a range of languages are known linguistically and ethnographically primarily from the late missionary Stefano Santandrea. From a diversity perspective 166 Aja (Sudan) [aja] and 258 Gbaya (Sudan) [krs], 439 Birri [bvq] (if still alive), Ubangi languages such as Indri, Feroge and Mangaya and languages of the Sere-Bviri group are the most urgent, but it would be of high value to obtain modern documentation from this area more generally. The neglect of this area by modern linguists is no doubt due to the difficult access, which unfortunately continues to this day.

The unknowns continue into Northern Congo, which is a language-rich meeting point of Bantu languages and various subgroups of Ubangi and Central Sudanic. In particular, more comprehensive and modern documentation of the Mbaic languages, a diverse and geographically scattered subgroup, would be highly valuable, and not only for better understanding its contact effects. The mysterious and forgotten language of Mongoba and Kazibati is possibly extinct already. Perhaps surprisingly, a modern extensive grammar of a Pygmy language of Northeastern Congo or a Pygmoid language of central Congo is missing and would be highly desirable from an ethnographic perspective. (Smith (1938), a grammar of Efe written on the basis of a Bible translation provided by a second-language speaker, is valuable, but far from ideal from a documentation perspective.) As with Western South Sudan, the difficult access to the region is a major, but not insurmountable, obstacle.

In West Africa, most languages belong to the largest family on the planet, most often known as Niger-Congo. Special mention could be made of 727 Mbulungish [mbv] and 728 Baga Mboteni-Binari [bcg] in Guinea, and 931 Aproumu Aizi in Côte d’Ivoire, for their uncertain position in this family. Nigeria hosts an enormous number of languages with a disproportionately small number of linguists doing fieldwork. The northeastern quarter in particular hosts many endangered languages from dozens of subgroups of Niger-Congo and of West Chadic. A rapidly disappearing profession is blacksmithing, and some blacksmith ethnic groups are known to have significantly different languages, e.g., Kpeego in Burkina Faso (Zwernemann 1996) and Kawaway in Chad (Lionnet and Hoïnathy 2015), certainly worthy of documentation projects.

I also wish to mention three little known African hunting and gathering populations about whose languages not even a wordlist has surfaced. Needless to say, this makes everything about their linguistic status unknown. Gebeyehu (2013:5-6) reports a population of some 500 Tamma, distinct from Majang, in the remote jungles of Gureferda District of Bench-Maji zone in South West Ethiopia. As far as I know, this population is not mentioned elsewhere in the literature and data on their (original?) language would be highly desirable. Terashima (1980) is a competent ethnographic account of a hunting and gathering people called Bambote west of the middle of Lake Tanganyika below the Lukuga river, in Congo. (This population is not to be confused with the Bambuti, quite a bit further north.) It is not clear whether the Bambote speak a very distinct language of their own or a variety of the local Taabwa or both (Terashima 1980:229-230). Fieldwork can be done in Kingwana Swahili. Doma or Dema [dmx] is a small population of hunter-gatherers near where the Zambezi river crosses the border from Zimbabwe to Mozambique (dos Santos Júnior 1944; Hachipola 1998; Hasler 1996; Tamayi 1959; Nicolle 1959). They may speak the neighbouring Bantu languages exclusively or also have a distinct variety of their own.

In South and Central Asia, there are a number of tribal peoples whose languages have attracted little attention from linguists despite interest from the ethnographic side. There are a number of peripatetic peoples of Iran and Afghanistan who are known to speak or have spoken distinct varieties (Rao 1995; Windfuhr 2002) yet have nowhere near the documentation of their European (Romani) counterparts. (Also at least two little studied Sindhi varieties, Luwati of Oman and Kholosi of Iran, are spoken in enclaves away from their origin.) Hunting and gathering peoples where the linguistic study lags far behind the ethnographic study include Birhor [biy] (Roy 1925; Osada 1993; Sinha 1972), Hill Korwa [kfp] (Deogaonkar 1986; Rizvi 1977), Van Vagri (who possibly speak a variety of Dhundhari [dhd]) (Misra 1990), Hill Kharia [ksy] (Roy and Roy 1937; Das 1931; Das Gupta 1959), Warre Koya (Dubey 1970), Pardhi [pcl] (Misra and Nagar 1993), Kanjari [kft] (Misra and Nagar 1990) (Munda or Indo-Aryan varieties) and a large number of Dravidian speaking tribes (Luiz 1962; Menon 1996, 1997). The area in South Asia where the language inventory is the least reliable is the “Naga” area of northeast India.

In Southeast Asia, South China holds a large number of languages from different families and subgroups, which are rapidly being described by Chinese linguists. Caijia (Bó 2004) is of special interest for its position in the Sino-Tibetan language family and Sanqiao (Yu 2017) for its mixed Dong-Miao character, but it is unknown to me whether substantial efforts are already in place to document them. Laos is the country in Southeast Asia with the lowest average documentation level and may now be more open to outside fieldwork than before. A very endangered high-contact language is Yilan Creole Japanese spoken in Taiwan (Sanada and Chien 2012; Chien and Sanada 2010).

In the Pacific, apart from the plethora of Papuan languages prioritized on account of diversity, many of the remaining Austronesian languages are also endangered and undocumented. In particular, many coastal Austronesian languages of the island of New Guinea, as well as Negrito languages of the Philippines and Punan/Penan languages of Borneo, are urgent on account of their additional ethnographic value.

Beyond the spoken L1 norm, there is an enormous need for the documentation of (village) sign languages. Sign languages have a lower average documentation level than even the least documented region of spoken languages. For a concrete list of cases, see Glottolog (Hammarström et al. 2019); it would be easier to mention the few which are already subject to some description than to list all that are in need of it and endangered. A case of extraordinary interest is the tactile modality of Bay Islands Sign Language (Ali et al. 2017).

Special registers exist in many places of the world (Sun 1999; Tramutoli 2012; McGregor 1989; Grimes and Maryott 1994; Hoenigman 2015; Noss 1977; Moñino 1991; Brindle et al. 2015; Borges 2016; Akinlabi and Ndimele 2012; Urua 2008; Harrisson 1965) and a fair amount has been written about them, but we still lack extensive descriptions. The same can be said for whistled languages (Meyer 2015). It would be very valuable to obtain large collections of annotated data and examples before these practices go out of use.

Lastly, and on the same track, a lot has been written on pidgins, but we still lack even one “full” grammar of a pidgin language. For the most complete list of pidgins, see Bakker and Parkvall (2010); Parkvall and Bakker (2013a,b). It is my impression that of the pidgins still in use, the “easiest” to document would be a migrant worker Arabic pidgin of the Gulf states of the Middle East.

References
Adipatah, Joesoef. 2011. Menelusuri suku terasing di Papua. Article in Darma Sadtri Sunday 9 October 2011.

Akinlabi, Akinbiyi & Ozo-Mekuri Ndimele. 2012. Agbirigba: The birth of an Igboid lect. The Nigerian Linguists Festschrift Series 9. 699–710.

Ali, Kristian, Ben Braithwaite, Ian Dhanoolal & Kimone Elvin. 2017. Documenting language in visual and tactile modalities. Paper presented at the 5th International Conference on Language Documentation and Conservation (ICLDC).

Andersen, Øystein Lund. 2007. The Lepki People of Sogber [sic!] River, New Guinea. Unpublished.

Bakker, Peter & Mikael Parkvall. 2010. Catalogue of Pidgin languages. Paper presented at the second APiCS conference 11-14 Nov, 2010.

Barlow, Russell. 2018. A grammar of Ulwa. University of Hawai’i at Mānoa. Doctoral dissertation.

Blažek, Václav. 2015. On the position of Kujarke within Chadic. Folia Orientalia 52. 75–99.

Bó, Wénzé. 2004. Càijiāhuà Gàikuàng. Minzu Yuwen 2004(2). 68–81.

Borges, Robert. 2016. Kumanti: Ritual language formation and African retentions in Suriname. OSO: Tijdschrift voor Surinamistiek en het Caraïbisch gebied 35(2). 225–245.

Brindle, Jonathan, Mary Esther Kropp Dakubu & Ọbádélé Kambon. 2015. Kiliji, an unrecorded spiritual language of Eastern Ghana. Journal of West African languages 42(1). 65–88.

Chien, Yuehchen & Shinji Sanada. 2010. Yilan Creole in Taiwan. Journal of Pidgin and Creole Languages 25(2). 350–357.

Clouse, Duane, Mark Donohue & Felix Ma. 2002. Survey report of the north coast of Irian Jaya. SIL International, Dallas. SIL Electronic Survey Reports 2002-078 http://www.sil.org/silesr/abstract.asp?ref=2002-078.

Colson, Audrey Butt. 2011. Fr Cuthbert Cary-Elwes and his Linguistic Legacy. Letters and Notices (British Province of the Society of Jesus) 100(439). 199–203.

Das Gupta, Biman Kumar. 1959. The Pahiras of Khokro, Manbhum District. Bulletin of the Department of Anthropology VIII(2). 85–90.

Das, Tarakchandra. 1931. The Wild Kharias of Dhalbhum (Anthropological Papers, University of Calcutta, New Series 3). Calcutta: Univ. Calcutta.

Deogaonkar, Shashishekhar Gopal. 1986. The Hill-Korwa. New Delhi: Concept.

Doriot, Roger E. 1991. 6-2-3-4 Trek, April-May, 1991. Ms.

Dubey, K. C. 1970. The Dorlas of Bhopalpatnam, Bastar District. Bulletin of the Tribal Research and Training Institute, Bhopal VIII(2). 1–12.

Egland, Steven, Doris Bartholomew & Saúl Cruz Ramos. 1983. La inteligibilidad interdialectal en México: resultados de algunos sondeos. México: ILV.

Fokkema, Dirk. 1969. Beknopte spraakkunst van het Schiermonnikoogs (Fryske Akademy 172). Leeuwarden: Fryske Akademy.

Gebeyehu, Dessalegn. 2013. On the Verge of Dying: Languages in Ethiopia. Ogmios 52. 3–6.

Golla, Victor. 2011. California Indian languages. Berkeley: University of California Press.

Grimes, Charles E. & Kenneth R. Maryott. 1994. Named speech registers in Austronesian languages. In Tom Dutton & Darrell T. Tryon (eds.), Culture change, language change: Case studies from Melanesia (Trends in linguistics: Studies and monographs 77), 275–319. Berlin: Mouton de Gruyter.

Hachipola, Simooya Jerome. 1998. A Survey of the Minority Languages of Zimbabwe. Harare: University of Zimbabwe Publications.

Hammarström, Harald, Robert Forkel & Martin Haspelmath. 2019. Glottolog 4.1. Jena: Max Planck Institute for the Science of Human History. Available at http://glottolog.org. Accessed on 2019-12-01.

Hammarström, Harald, Thom Castermans, Robert Forkel, Kevin Verbeek, Michel A. Westenberg & Bettina Speckmann. 2018. Simultaneous Visualization of Language Endangerment and Language Description. Language Documentation & Conservation 12. 359–392.

Hammarström, Harald. 2010. The Status of the Least Documented Language Families in the World. Language Documentation & Conservation 4. 177–212.

Harmon, David & Jonathan Loh. 2010. The Index of Linguistic Diversity: A New Quantitative Measure of Trends in the Status of the World’s Languages. Language Documentation & Conservation 4. 97–151.

Harrisson, Tom. 1965. Three “Secret” Communication Systems among Borneo Nomads (and their Dogs). Journal of the Malaysian Branch of the Royal Asiatic Society 38(2). 67–86.

Hasler, Richard. 1996. Agriculture, foraging and wildlife resource use in Africa: cultural and political dynamics in the Zambezi Valley. London: Kegan Paul International.

Hauk, Bryn & Raina Heaton. 2018. Triage: Setting Priorities for Endangered Language Research. In Lyle Campbell & Anna Belew (eds.), Cataloguing the World’s Endangered Languages, 259–304. London: Routledge.

Hoenigman, Darja. 2015. ‘The talk goes many ways’: registers of language and modes of performance in Kanjimei, East Sepik Province, Papua New Guinea. Australian National University doctoral dissertation.

Hoogland, J. 1940. Memorie van Overgave van de Onderafdeling Hollandia. Nationaal Archief, Den Haag, Ministerie van Koloniën: Kantoor Bevolkingszaken Nieuw-Guinea te Hollandia: Rapportenarchief, 1950–1962, nummer toegang 2.10.25, inventarisnummer 24.

Kamholz, David. 2012. The Keuw isolate: Preliminary materials and classification. In Harald Hammarström & Wilco van den Heuvel (eds.), History, contact and classification of Papuan languages (LLM Special Issue 2012), 243–268. Port Moresby: Linguistic Society of Papua New Guinea.

Kleinewillinghöfer, Ulrich. 2001. Jalaa – An Almost Forgotten Language of Northeastern Nigeria: A Language Isolate. In Derek Nurse (ed.), Historical Language Contact in Africa (Sprache und Geschichte in Afrika 16/17), 239–271. Cologne: Rüdiger Köppe.

Lionnet, Florian & Remadji Hoïnathy. 2015. Kawa̰wa̰y: an endangered blacksmith language of southern Chad. Paper presented at Fieldwork Forum, UC Berkeley, 9 April 2015.

Luiz, A. A. D. 1962. Tribes of Kerala. New Delhi: Bharatiya Adimjati Sevak Sangh.

Mariati S., Sitti. 2014. Adjektiva bahasa Namla. Kibas Cenderawasih 11(1). 57–68.

McGregor, William. 1989. Gooniyandi Mother-in-Law “Language”: Dialect, Register, and/or Code? In Ulrich Ammon (ed.), Status and function of languages and language varieties (Foundations of Communication), 630–656. Berlin: Mouton de Gruyter.

Menon, T. Madhava. 1996. The encyclopaedia of Dravidian tribes Vol. II. Thiruvananthapuram, Kerala: International School of Dravidian Linguistics.

Menon, T. Madhava. 1997. The encyclopaedia of Dravidian tribes Vol. III. Thiruvananthapuram, Kerala: International School of Dravidian Linguistics.

Meyer, Julien. 2015. Whistled Languages: A Worldwide Inquiry on Human Whistled Speech. Berlin: Springer.

Misra, V.N. 1990. The Van Vagris – “Lost” Hunters of the Thar Desert, Rajasthan. Man and Environment XV(2). 89–108.

Misra, V.N. & Malti Nagar. 1990. The Kanjars – A Hunting-Gathering Community of the Ganga Valley, Uttar Pradesh. Man and Environment XV(2). 71–88.

Misra, V.N. & Malti Nagar. 1993. The Pardhis: A Hunting-Gathering Community of Central and Western India. Man and Environment XVIII(1). 115–144.

Moñino, Yves. 1991. Les langues spéciales sont-elles des langues? La notion de pseudo-langue à travers l’exemple d’une ‘langue d’initiation’ d’Afrique centrale. Langage et société 56. 5–20.

Nevin, Bruce Edwin. 1998. Aspects of Pit River phonology. Philadelphia: University of Pennsylvania doctoral dissertation.

Nicolle, W. H. H. 1959. Tribes of the Zambesi valley. NADA: Southern Rhodesia Native Affairs Dept. Annual 36. 11–15.

Noss, Philip A. 1977. Compounding in To: the dynamics of a closed pidgin. In Martin Mould & Thomas Joseph Hinnebusch (eds.), Proceedings of the 8th conference on African linguistics (Studies in African linguistics: Supplement 7), 185–197. Los Angeles: African Studies Center & Dept. of Linguistics, University of California at Los Angeles (UCLA).

Osada, Toshiki. 1993. Field notes on Birhor. In Tsuyoshi Nara (ed.), A computer-assisted study of South-Asian languages, 30–41. Tokyo: ILCAA.

Parkvall, Mikael & Peter Bakker. 2013a. Pidgins. In Mark Aronoff (ed.), Oxford Bibliographies in Linguistics, 1–42. New York: Oxford University Press.

Parkvall, Mikael & Peter Bakker. 2013b. Pidgins. In Yaron Matras & Peter Bakker (eds.), Contact languages: A Comprehensive Guide, 15–64. Berlin: Mouton de Gruyter.

Rao, Aparna. 1995. Marginality and Language Use: The Example of Peripatetics in Afghanistan. Journal of the Gypsy Lore Society: 5th Series 5. 69–95.

Rizvi, Baqar Raza. 1977. Economic organization of the Hill Korwa of Chattisgarh (M.P.). University of Nagpur doctoral dissertation.

Roy, Sarat Chandra & Ramesa Chandra Roy. 1937. The Khāṛiās. Ranchi: “Man in India”. 2 vols.

Roy, Sarat Chandra. 1925. The Birhors: a little-known jungle tribe of Chota Nagpur. Ranchi: K.E.M. Mission Press.

Sanada, Shinji & Yuehchen Chien. 2012. Japanese-lexicon Creole in Taiwan. NINJAL project review 3(1). 38–48.

Sands, Bonny. 2017. The Challenge of Documenting Africa’s Least Known Languages. In Jason Kandybowicz & Harold Torrence (eds.), Africa’s Endangered Languages: Documentary and Theoretical Approaches, 11–38. Oxford: Oxford University Press.

dos Santos Júnior, Joaquim Norberto. 1944. Algumas Tribos do Distrito de Tete. Porto: Republica Portuguesa.

Seifart, Frank, Nicholas Evans, Harald Hammarström & Stephen C. Levinson. 2018. Language documentation twenty-five years on. Language 94(4e). 324–345.

Sinha, N. P. 1972. The Birhors. In M. G. Bicchieri (ed.), Hunters and Gatherers Today, 373–403. New York: Holt, Rinehart & Winston.

Smith, Edwin W. 1938. A tentative grammar of the Efe or Mbuti language: The reputed language of the Pygmies of the Ituri Forest, Belgian Congo. London: The Bible House.

Sun, Hongkai. 1999. On the Himalayan languages of the eastern Himalayan area in China. Linguistics of the Tibeto-Burman Area 22(2). 61–72.

Tamayi. 1959. A visit to the Vadoma massif. NADA: Southern Rhodesia Native Affairs Dept. Annual 36. 52–57.

Terashima, H. 1980. Hunting Life of the Bambote: An anthropological study of Hunter-Gatherers in a Wooded Savanna. Senri Ethnological Studies 6. 223–268.

Tramutoli, Rosanna. 2012. Giing’aweakshooda: A register of respect among Barabaig speakers of Tanzania. Rijksuniversiteit te Leiden MA thesis.

Tupper, Ian. 2007. Endangered Languages Listing: TURUMSA [tqm]. Document posted at http://www.pnglanguages.org/pacific/png/show_lang_entry.asp?id=tqm accessed 1 May 2007.

Urua, Eno-Abasi. 2008. Medefidrin, the ‘Spirit’ language of the 1920s in Ibibio land. In Ozo-Mekuri Ndimele, Imeld I. I. Udoh & Ogbonna Anyanwu (eds.), Critical Issues in the Study of Linguistics, Languages and Literatures in Nigeria: Festschrift for Conrad Max Benedict Brann (The Nigerian Linguists Festschrift Series 7), 493–506. Port Harcourt: M&J Grand Orbit Communication and Emhai Press.

de Vries, Lourens J. 2004. A short grammar of Inanwatan: an endangered language of the Bird’s head of Papua, Indonesia (Pacific Linguistics 560). Canberra: Research School of Pacific and Asian Studies, Australian National University.

de Vries, Lourens. 2002. An Introduction to the Inanwatan language of Irian Jaya. In Alexander K. Adelaar & Robert Blust (eds.), Between Worlds: Linguistic Papers in memory of David John Prentice (Pacific Linguistics 529), 77–94. Canberra: Research School of Pacific and Asian Studies, Australian National University.

Windfuhr, Gernot L. 2002. Gypsy Dialects. In Encyclopædia Iranica volume XI:4, 415–421. Costa Mesa, California.

Wolf, Katharina. 2010. Une enquête sociolinguistique parmi les Amdang (Mimi) du Tchad: Rapport Technique. SIL Electronic Survey Reports 2010-028. 1–96.

Yu, Dazhong. 2017. Jìndài xiāng qián guì biānqū de zúqún hùdòng hé “sānqiāo rén” de xíngchéng. Journal of Guizhou Education University 33(1). 2–9.

Zwernemann, Jürgen. 1996. Documents kpḛḛgo. Cahiers voltaïques / Gur papers 1. 147–164.


1 Comment

  1. Another dataset that could be of interest in this context is the contents of OLAC archives. We have built a system to compile the OLAC data into scores per language, not dissimilar from the ones listed above, and displayed in a visualiser here: https://language-archives.services/olacvis/#/
