Reasons you should use video in language documentation

At CoLang this year I was invited to come and talk with the group in the Recording and using video in language documentation class. I shared some of my favourite reasons why I always try to use video in a language documentation project, which gave me a chance to mention some of my favourite research on gesture, and talk to people about their experiences with filming. I thought I’d write up four of my favourite reasons for filming video in this post. If you’re thinking of doing a language documentation project I’ve also written a paragraph at the end of this that you can use in the first draft of a grant application.

Gesture is an important part of communication

Gesture and speech work together. It’s often much easier to understand the size or shape of an object if someone is gesturing while talking about it. You also don’t want to spend hours listening to people saying ‘when you weave this bit goes around that bit and then these are connected’. You know that those gestures are illustrating the point being made, but without seeing them you’re losing all the important information.

Gesture is an important part of cognition

Psycholinguists will tell you that gesture and speech are deeply integrated in your brain. We know this because sometimes the speech and the gesture refer to the same thing, or reflect different perspectives on the one topic. Other times, gestures will give us an insight into someone’s thoughts even though there is no linguistic evidence for what is happening. Next time you watch an English speaker talking about things coming up in the next few days, look at what they are doing with their hands. If they’re gesturing, it’s likely they are ordering those events with the soonest on their left and the later events on their right. That’s because English speakers tend to order events from left to right, which is a reflection of our writing system. Even though there’s no spoken evidence for this cognitive habit, there is gestural evidence. Other languages may have other metaphors for how they order time or events, which might influence the gestures that they use. Aymara (South America) speakers, for example, gesture with the future behind them.

Gesture is an important part of culture

All humans gesture, but different cultures gesture differently. I’ve written about the nose-tap gesture, which is common to the UK, Italy and France. Similarly, recognising the ‘up yours’ gesture as offensive depends on whether you’re from the USA or the UK. It’s not just these symbolic gestures that are culturally acquired. The shape of your hand when you point at things varies across cultures. Some cultures don’t point with the left hand, and others don’t even point with the hand at all; Nick Enfield showed for Lao that pointing with the lips is a common strategy.


It’s not that we can’t point with our lips – maybe you do when your hands are full – but it’s not common.

People like to look at things

As a selfish reason to collect video, it makes transcription much easier, because you have additional visual cues, and all that additional content (see point one). Video also contains a lot of incidental information about how people dress, and what their daily environment is like. It also means when it comes time to share materials with participants and community organisations, you can share videos, which are far more interesting than just audio files. I had always thought this was good, but I got confirmation on my most recent visit to Nepal. On the day we were recording with Norpu, the village Shaman, he told us he was so pleased we were recording people and making a visual record. He regrets that he does not have a single photograph of his mother, who died 20 years ago.

Let me preempt some problems with video

All of this presumes that you’re working in a community where people are ok with digital representations of their images and voices. It also presumes that you’re working in genres that are appropriate to film, and have met basic IRB/ethics requirements. I also presume you’ve discussed sharing and permissions with the community, and the individuals you are recording with. This may restrict some of the genres or topics that can be recorded with video, or different videos may have different ‘access permissions’ (e.g. some videos may be open to any audience, while only the community members and researchers may be able to access others). I know some people who say that if you’re not given the right to film video then a project is not worth the time. I don’t entirely agree with that, but it will be a diminished set of outputs with only audio.

Some people don’t like to work with video because it takes more effort to set up than just an audio mic. That’s true – but an audio mic takes more effort to set up than just sitting at home, and when you’ve already driven through 8 hours of desert, or flown to another country, it’s not *that* much more effort. Other people find video too obtrusive. My feeling is that setting up any recording situation is obtrusive (provided it meets ethical requirements and you’ve discussed it with participants). I find that being comfortable with your equipment and making people feel comfortable with your presence mitigates many of those problems. Practice setting up as many times as you can before you begin the project. Record your friends and family. I now know my gear well enough to continue chatting throughout the setup. I’ve also had a lot of luck training a younger member of the Syuba community to help me with these sessions, which puts people at ease (particularly me).

Some people will worry that video takes up too much storage space. Make sure you test how much space that video takes up, and budget for a situation where you record even more than you expect, as people can get enthusiastic once you’re on a roll. Talking to archives early in the project planning to establish what they can take will also help you avoid problems down the line.
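To make that budgeting concrete, here is a rough sketch of the arithmetic. The bitrates in the example are illustrative assumptions only, not figures from any particular camera or recorder; a test recording on your own gear will give you the real numbers to plug in. The function name and its parameters are hypothetical.

```python
def storage_gb(hours, video_mbps, audio_kbps=0.0, copies=1, margin=0.0):
    """Rough storage estimate in gigabytes (1 GB = 10**9 bytes).

    hours      -- total recording time you expect
    video_mbps -- video bitrate in megabits per second (check your camera)
    audio_kbps -- separate audio bitrate in kilobits per second, if any
    copies     -- number of copies you keep (working copy plus backups)
    margin     -- extra fraction to allow for recording more than planned
    """
    seconds = hours * 3600
    bits = seconds * (video_mbps * 1_000_000 + audio_kbps * 1_000)
    gigabytes = bits / 8 / 1e9
    return gigabytes * copies * (1 + margin)


# Example with assumed figures: 10 hours of video at 8 Mbit/s,
# two copies, and 50% headroom for enthusiastic participants.
print(round(storage_gb(10, 8, copies=2, margin=0.5), 1), "GB")
```

The margin and the number of copies matter as much as the bitrate: a second copy doubles the total before you have added any headroom.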

Here’s a project paragraph for you

This project uses both video and audio recording. This is to ensure that the data is the most useful it can be in the long term for both linguistic analysis and community sharing. Having video as well as audio makes transcription easier, and ensures that the elements of discourse that are not in the spoken channel are still collected. Both the audio and video equipment record in high-quality lossless formats suitable for archiving. I have budgeted for archiving as quoted by <insert archive name> and ensured that I have sufficient local storage for adequate backup.

By Lauren Gawne

This content originally appeared on Superlinguo at

ELDP Project Highlight: Documentation of Northern Alta, a Philippine Negrito Language

This week on the ELAR blog, Alexandro Garcia-Laguia shares a look into his ELDP project. Alexandro is researching Northern Alta, an endangered language spoken along the rivers of Aurora province in the Philippines.

Reconstructing an old Alta song:

The speakers of Alta have reported that their parents did not teach them any songs in Alta (n_alta054.42). However, one day, at a gathering with six women in Barangay Dianed, the ladies recalled fragments of an Alta song. They decided to sit down and collaborate to write and complete the lyrics. We recorded them singing the song twice, and the recordings of the song, the transcription and other relevant files have been uploaded to the Endangered Languages Archive as session 45:

Subsequently Joaquin Ramón, a composer from Spain, created a piano backing track for the song, so the Alta can sing the song whenever they want and teach it to the children. Karaoke is appreciated in the communities and in the Philippines in general, and is often used as a way of having fun on weekends, so we expect the recording to be used in the future.

The recording of this backing track is included in the session 45 file (nalta45_piano) and has also been uploaded to the cloud and:


Writing the lyrics of the Alta song (Dianed, January 2014)


Recording the song

The non-Alta speakers of Alta

Given the small number of speakers of Alta – estimates range from 200 to 300 people – those who are not Alta but speak the language are rare, but they do exist. The corpus includes a number of recordings of four different speakers of the language who are not ethnically Alta. Some of them have a surprising command of the language. This is the case of Inelda Andon, who states “I am not an Alta, he is the Alta here, but I learned the language when I was a kid. When I was four we started living with the Alta, thus, even if we do not have curly hair, even if we are not Alta, we can speak the Alta language” (session 60).

During a series of transcription sessions, native speaker Violeta Fernandez, who was slowly repeating the recordings we had made of the language, would confidently point out Tagalog borrowings and substitute the native Alta word. Surprisingly, whenever she could not remember the Alta word, she would ask Inelda, who was in the garden but could hear what we were transcribing. Several times, Inelda Andon provided the corresponding Alta word.


Inelda and her husband Antonio Andon at Diteki (February 2015)

Other non-native speakers have learned the language, either because they grew up with Alta neighbors, or because they are married to an Alta. In recording session 40 (How I learned Alta) Rogelio Ganarrial, who is the second husband of the barangay chieftain and native Alta speaker Erlinda Ganarrial, describes his experiences with the language. In two other recordings (41 and 42), Mila Lasam explains how she learned the language and what her daily life is like in the coastal barangay Dianed. Finally, Conchita Genes, originally from Dibut (an isolated coastal area where Umiray Dumaget Agta, another Negrito language, is spoken), says she left her village when she was a child and does not remember anything of it. Conchita grew up in Diteki with the Alta and is now married to Renato Genes, a native Alta with whom she speaks the language on a daily basis. She has participated actively in the project (see recordings 81, 88, 90 and 93).

Given the circumstances in which the Alta are sometimes mocked because of their curly hair or the way their language sounds, the non-Alta speakers of the language are an example of tolerance for the community.

Thank you, Alexandro!

See Alexandro’s deposit here:

Open Access and Open Data in Language Research and Documentation: Opportunities and Challenges

SOAS World Languages Institute collaborated with University and City Library of Cologne and the Cologne Center for e-Humanities to hold the international workshop: Open Access and Open Data in Language Research and Documentation: Opportunities and Challenges in Cologne, Germany from October 10-12, 2016.

Open Access (unrestricted online access to publicly funded peer-reviewed publications) has become a major movement in the academic world in the past decade. While Open Access to publications is generally supported, the call for access to the primary materials – the data on which the publications are based – is contested and sometimes hotly debated, especially in the documentation of endangered languages. In the endangered language context, issues such as privacy and ownership are of great concern. At the same time, best practice in scientific conduct and sharing of data are basic academic principles. Open Access promotes a higher caliber of data quality, by allowing data to be verified. Additionally, it helps ensure that funders are not repeatedly funding the same projects.

Thirty-five professionals from different geographical areas attended this 2.5-day conference, including lawyers, researchers, archivists, junior and senior fieldworkers, community members, and representatives from WIPO in Geneva, UNESCO, and the Cologne Centre for eHumanities. The participants met to discuss the key issues in evaluating challenges and opportunities, and to provide practical solutions for Open Access/Open Data of primary documentation materials, particularly the question of how to protect moral rights while supporting good academic research.

This conference was funded by the Volkswagen Stiftung. UNESCO’s Universal Access and Preservation Section, Information Society Division, Communication and Information Sector partnered to support policy development in the sector and to provide practical teaching materials.

The participants are now working together to publish the results from the workshop.


Helpful Tips for New ELAN Users

This week on the ELAR blog, Sarah Dopierala (MA Language Documentation and Description, SOAS) gives linguists who are new to ELAN five quick tips for using the software.

It is a fact well-observed that some of us are more tech savvy and some of us – not so much. For a documentary linguist trying to make the most of software like ELAN, ignorance perhaps isn’t bliss. For those of you Not-So-Much-ers out there, here are 5 tips for using ELAN from a fellow Computationally-Impaired Linguist.


When you save something in ELAN, you’ve perhaps noticed that there are actually two files being created: an .eaf file and a .pfsx file.



In order to open a project in ELAN correctly, it is important that these two files (.eaf and .pfsx) are saved in the same folder (along with the original recording). If you move the .eaf, .pfsx files and recordings into separate folders, ELAN won’t be able to find them.
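If you like to double-check this before opening a project, a small script can do it for you. This is only a sketch under a couple of assumptions: that the .pfsx file has the same base name as the .eaf file, and that your recordings are .wav or .mp4 files (change the list to match your own media). The function name is hypothetical and has nothing to do with ELAN itself.

```python
from pathlib import Path


def missing_companions(eaf_path, media_suffixes=(".wav", ".mp4")):
    """Return a list of things missing from the folder that holds an .eaf file.

    Assumes the .pfsx file shares the .eaf file's base name, and that any
    file with one of `media_suffixes` counts as the original recording.
    """
    eaf = Path(eaf_path)
    missing = []

    # The preferences file is expected right next to the .eaf file.
    pfsx = eaf.with_suffix(".pfsx")
    if not pfsx.exists():
        missing.append(pfsx.name)

    # Look for at least one recording in the same folder.
    has_media = any(
        p.is_file() and p.suffix.lower() in media_suffixes
        for p in eaf.parent.iterdir()
    )
    if not has_media:
        missing.append("media file")

    return missing
```

Calling `missing_companions("my_session/my_session.eaf")` on a hypothetical session folder returns an empty list when everything is in place, and otherwise names what could not be found.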


ELAN is a powerful piece of software – it can do many things and there are many different options to click on. There are two modes that are particularly useful for me when I am transcribing an audio recording:

Annotation Mode and Segmentation Mode.


These can be found under ‘options’.

In Segmentation mode, you can isolate instances of speech in your sound file. For me, this means capturing the speech of my consultant and not my own speech. You can see in the picture below that there are sections of the recording that are bracketed off by black lines. Segmentation mode is the mode where you create those black lines.


Annotation mode is the mode where you can transcribe the segments created in Segmentation mode. In Annotation Mode, each bracket of speech has a number (1, 2, 3…) which appears as its own line with the begin/end time and the duration. If you want to transcribe your segment, click on the space between the number and the begin/end information in the section titled: Annotation. You can see this next to number 1, below:


I find that it is best to first segment all the speech you want in Segmentation mode, and then add a transcription in Annotation mode.


Just in general, but especially if you intend on exporting your ELAN file into FLEx, I’ve found that a way to avoid problems between the two programs is to avoid using punctuation in transcriptions. If you insert a comma, for example, the utterance with the comma will be split up in FLEx (that is, it will appear in two separate lines), which makes it hard to translate the entire utterance (since the pieces are separated). This seems to be the case with other things like periods, question marks, etc.
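If punctuation has already crept into your transcriptions, you could clean the text before it goes anywhere near FLEx. The sketch below works on plain strings only; it does not read or write ELAN files, and the function name is hypothetical. It keeps apostrophes and hyphens by default, on the assumption that your orthography may need them, so adjust `keep` to suit your language.

```python
import unicodedata


def strip_punctuation(text, keep="'-"):
    """Remove punctuation from a transcription line and tidy the spacing.

    Any character whose Unicode category starts with "P" (punctuation)
    is dropped unless it appears in `keep`.
    """
    cleaned = "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("P")
    )
    # Collapse any runs of whitespace left behind by removed characters.
    return " ".join(cleaned.split())


print(strip_punctuation("yes, I went. did you?"))
```

Check the result against your orthography before relying on it: some writing systems use characters that Unicode classes as punctuation to mark real sounds, and those belong in `keep`.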


Once you actually have segments in your audio file, you may want to play and listen to them more than once to check the accuracy of your transcription. There are several ways to do this. The way I prefer is to highlight the specific segment I want by clicking on the black bracket lines and then clicking the grey arrow with an ‘S’ in the middle of the ELAN screen.


This way, only the segment as it is defined by the boundaries is played back.


Perhaps the most important thing to remember – if you haven’t already found out the hard way – is that ELAN (unlike FLEx) does NOT save changes/files automatically. Transcription is a long and laborious process. Make sure to save your work!

Have these tips been helpful for you? Do you have some more tips that ELAN users could benefit from? Please help a linguist out in the comments section below.

By Sarah Dopierala