ELAN/Praat Machine Segmenting

Today on the ELAR blog, Eri Kashima shares her tried and true segmenting shortcuts for depositors working with ELAN and Praat. Read on for a step-by-step walkthrough of her process: 

My number one hated stage in transcription work is segmenting. I would sit there fuming while manually segmenting the recordings I made before I could even start transcribing. It was frustrating because it seemed like something that a machine could do a relatively good approximation of, instead of me sitting there for hours doing it for each file!

Luckily, it turns out that between Praat and ELAN, you can very easily have a decent approximation of segmentation done for you.  Not perfect, but it saves HEAPS of time. If you have a ton of recordings to segment into units before you need to transcribe, this is the process for you!

Thank you to T. Mark Ellison for helping out with this heaps.

Praat stage

First, load the sound file that you want to segment into Praat (Open > Read from File). Then create a Praat TextGrid file based on silences:


This next part is our best setting after a few trials:


The resulting text grid should look something like this:


The *** is where Praat has segmented for sound. It’s not perfect, but it gives a pretty good shot at things, and you can adjust the boundaries manually in ELAN. Save this text grid now.
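If you are curious what "segmenting on silences" amounts to, here is a rough sketch in Python. It is not Praat's algorithm and the numbers are not the settings from this post: the function names, the -25 dB threshold, and the 0.1 second minimum durations are all illustrative assumptions. The idea is simply to measure how loud each short frame is, label frames as silent or sounding, and then swallow any stretch that is too short to count.

```python
import math


def frame_levels_db(samples, rate, frame_s=0.01):
    """Return (levels, step): RMS of each frame in dB relative to the
    loudest frame, and the frame length in seconds."""
    size = max(1, round(rate * frame_s))
    rms = []
    for start in range(0, len(samples), size):
        frame = samples[start:start + size]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    peak = max(rms, default=0.0)
    levels = []
    for value in rms:
        if value > 0 and peak > 0:
            levels.append(20 * math.log10(value / peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels, size / rate


def _merge(runs):
    """Join neighbouring runs that carry the same label."""
    merged = []
    for start, end, label in runs:
        if merged and merged[-1][2] == label:
            merged[-1][1] = end
        else:
            merged.append([start, end, label])
    return merged


def _absorb(runs, label, other, min_frames):
    """Relabel runs of `label` shorter than `min_frames` as `other`."""
    flipped = [[s, e, other if (lab == label and e - s < min_frames) else lab]
               for s, e, lab in runs]
    return _merge(flipped)


def segment_by_silence(samples, rate, threshold_db=-25.0,
                       min_silent_s=0.1, min_sounding_s=0.1):
    """Return a list of (start_s, end_s, label) intervals.
    All default values are hypothetical, chosen only for illustration."""
    levels, step = frame_levels_db(samples, rate)
    runs = _merge([[i, i + 1, "sounding" if lv > threshold_db else "silent"]
                   for i, lv in enumerate(levels)])
    runs = _absorb(runs, "silent", "sounding", round(min_silent_s / step))
    runs = _absorb(runs, "sounding", "silent", round(min_sounding_s / step))
    total = len(samples) / rate
    return [(s * step, min(e * step, total), lab) for s, e, lab in runs]
```

The samples here are plain numbers (for example, values read from a mono WAV file). Praat works from a proper intensity contour, so expect its boundaries to differ from what this toy version gives you.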

ELAN stage

Import your Praat text grid:


Cheers Hedvig for letting me know that if you tick the “exclude silences” box, you can have ELAN automatically remove any empty segments from the Praat text file:

And you will have your segmented Praat text grid as a layer in ELAN, looking something like this!


The longest file we tried it on was a 1-hour recording of Samoan (cheers Hedvig Skirgård from Humans Who Read Grammars for providing the file!). It took about 8 minutes for Praat to segment. A 10-minute recording is done in no time.

Now be on your merry way, setting up your tiers and transcribing to your heart’s content 🙂

Thank you, Eri! This content originally appeared on Eri’s linguistics blog, Yammering On.

FLEx Tips for New Language Documenters

FieldWorks Language Explorer (FLEx) is a program that has built upon previous software designed for documentary linguists (perhaps some of you remember ToolBox?). As a result, FLEx is a very useful and powerful lexicon-building tool. For those of you who have found our beloved FLEx but need a few hints to get you started, hopefully these tips will be helpful!


If your texts are being imported from other software (such as ELAN), a new text will automatically be generated in the Texts and Words section of FLEx.

You cannot create a new text and then try to import an outside file into that newly created text. Instead, you’ll end up with two newly created lists, one with text from the imported file, and one empty.


If you’ve imported or created texts, but then are frustratingly prevented from annotating them, it may be because you haven’t changed the language of the baseline text. If the language of the baseline text is the same as the language of annotation, you won’t be able to analyze the text. (This will happen if you are annotating in English and your baseline language is also English, for example.) To change this, go to the baseline and highlight the whole text. Then, in the upper middle area of the tool bar, choose the new baseline language from the dropdown menu.


This one I found out the hard way. Yes, there are significant differences between a new definition and a new sense of a word – at least in terms of how FLEx interprets them in the future. A new ‘sense’ of a word will be displayed in the same lexical entry while a new ‘definition’ will appear as its own entry in the lexicon. This is useful for distinguishing between polysemy and homophony, for instance. It seems trivial but it’s important to get it right the first time as it is difficult and time-consuming to alter later.


Something I also didn’t realize at first is that you can add notes to your sentences in the Analyze tab. Do this either by pressing CONTROL + N or by right-clicking the Insert Note button at the top right of the tool bar. Notes can be useful for remembering important things. If you have certain ways you want to tag the sentences, you can insert the tag in the note tier and later search only the note tier with the search engine in Concordance.


There are several ways to search for things in FLEx. The one I find most useful is searching in Concordance, which you can access in Texts & Words. This is helpful if you need to find multiple instances of the same word or morpheme, for instance.

As you can see from the picture above, you can search different tiers with several methods (we will ignore ‘use regular expressions’ for now). For example, you can search via ‘whole word’, ‘at end’/‘at start’, and ‘anywhere’. Searching for the ‘whole word’ means that there is a space before and after the string of characters in the text. Searching ‘at end’/‘at start’ searches for characters either preceded or followed by other characters (which can be useful when searching for prefixes or suffixes). Finally, ‘anywhere’ means that the search engine will search for the string of characters anywhere. Be careful, though: a particular string might not always manifest as one word or phrase. The engine will not be searching for one ‘word’; it will indiscriminately search for that string of characters across boundaries.
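To make the difference between these search methods concrete, here is a small Python sketch that imitates them with regular expressions. This is only my approximation of the behaviour described above, not how FLEx actually implements Concordance. The function names are made up, and I am assuming that ‘at start’ and ‘at end’ also match a word that consists of nothing but the search string.

```python
import re


def concordance_pattern(query, mode):
    """Build a regular expression imitating one of the search methods.
    The mode names mirror the options described in the post; the
    patterns themselves are illustrative assumptions."""
    q = re.escape(query)  # treat the query as literal characters
    patterns = {
        "whole word": rf"(?<!\w){q}(?!\w)",  # nothing word-like on either side
        "at start": rf"(?<!\w){q}",          # string begins a word
        "at end": rf"{q}(?!\w)",             # string ends a word
        "anywhere": q,                       # no boundary checks at all
    }
    return re.compile(patterns[mode])


def search(lines, query, mode):
    """Return the lines that contain a match for `query` under `mode`."""
    pattern = concordance_pattern(query, mode)
    return [line for line in lines if pattern.search(line)]


lines = ["the cat sat", "concatenate", "cats", "tomcat"]
print(search(lines, "cat", "whole word"))  # ['the cat sat']
print(search(lines, "cat", "at start"))    # ['the cat sat', 'cats']
print(search(lines, "cat", "at end"))      # ['the cat sat', 'tomcat']
print(search(lines, "at sa", "anywhere"))  # ['the cat sat']
```

The last line shows the ‘anywhere’ caveat: the string "at sa" straddles two words, and the search happily matches it anyway.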


Finally, just in case you’ve been desperately searching for a way to save your work in FLEx, fret not! Perhaps one of the most convenient things about FLEx is that your work is automatically saved. It is always a good idea, however, to have your work saved in multiple places. If the only place your FLEx database is recorded is your own computer, consider backing it up to an external hard drive, or using a website such as LanguageDepot to save and share your corpus with colleagues – just to make sure your data are saved in multiple places.

What other tips do you have for our FLEx users? Please share them in the comments section below and help spread the knowledge!

By Sarah Dopierala

Reasons you should use video in language documentation

At CoLang this year I was invited to come and talk with the group in the Recording and using video in language documentation class. I shared some of my favourite reasons why I always try to use video in a language documentation project, which gave me a chance to mention some of my favourite research on gesture, and talk to people about their experiences with filming. I thought I’d write up four of my favourite reasons for filming video in this post. If you’re thinking of doing a language documentation project I’ve also written a paragraph at the end of this that you can use in the first draft of a grant application.

Gesture is an important part of communication

Gesture and speech work together. It’s often much easier to understand the size or shape of an object if someone is gesturing while talking about it. You also don’t want to spend hours listening to people saying ‘when you weave this bit goes around that bit and then these are connected’. You know that those gestures are illustrating the point being made, but without seeing them you’re losing all the important information.

Gesture is an important part of cognition

Psycholinguists will tell you that gesture and speech are deeply integrated in your brain. We know this because sometimes the speech and the gesture refer to the same thing, or reflect different perspectives on the one topic. Other times, gestures will give us an insight into someone’s thoughts even though there is no linguistic evidence for what is happening. Next time you watch an English speaker talking about things coming up in the next few days, look at what they are doing with their hands. If they’re gesturing, it’s likely they are ordering those events with the soonest on their left and the later events on their right. That’s because English speakers tend to order events from left to right, which is a reflection of our writing system. Even though there’s no spoken evidence for this cognitive habit, there is gestural evidence. Other languages may have other metaphors for how they order time or events, which might influence the gestures that they use. Aymara (South America) speakers, for example, gesture with the future behind them.

Gesture is an important part of culture

All humans gesture, but different cultures gesture differently. I’ve written about the nose-tap gesture, which is common to the UK, Italy and France. Similarly, recognising the ‘up yours’ gesture as offensive depends on whether you’re from the USA or the UK. It’s not just these symbolic gestures that are culturally acquired. The shape of your hand when you point at things varies across cultures. Some cultures don’t point with the left hand, and others don’t even point with the hand at all; Nick Enfield showed for Lao that pointing with the lips is a common strategy.


It’s not that we can’t point with our lips – maybe you do when your hands are full – but it’s not common.

People like to look at things

As a selfish reason to collect video, it makes transcription much easier, because you have additional visual cues, and all that additional content (see point one). Video also contains a lot of incidental information about how people dress, and what their daily environment is like. It also means when it comes time to share materials with participants and community organisations, you can share videos, which are far more interesting than just audio files. I had always thought this was good, but I got confirmation on my most recent visit to Nepal. On the day we were recording with Norpu, the village Shaman, he told us he was so pleased we were recording people and making a visual record. He regrets that he does not have a single photograph of his mother, who died 20 years ago.

Let me preempt some problems with video

All of this presumes that you’re working in a community where people are ok with digital representations of their images and voices. It also presumes that you’re working in genres that are appropriate to film, and have met basic IRB/ethics requirements. I also presume you’ve discussed sharing and permissions with the community, and the individuals you are recording with. This may restrict some of the genres or topics that can be recorded with video, or different videos may have different ‘access permissions’ (e.g. some videos may be open to any audience, while only the community members and researchers may be able to access others). I know some people who say that if you’re not given the right to film video then a project is not worth the time. I don’t entirely agree with that, but it will be a diminished set of outputs with only audio.

Some people don’t like to work with video because it takes more effort to set up than just an audio mic. That’s true – but an audio mic takes more effort to set up than just sitting at home, and when you’ve already driven through 8 hours of desert, or flown to another country, it’s not *that* much more effort. Other people find video too obtrusive. My feeling is that setting up any recording situation is obtrusive (provided it meets ethical requirements and you’ve discussed it with participants). I find that being comfortable with your equipment and making people feel comfortable with your presence mitigates many of those problems. Practice setting up as many times as you can before you begin the project. Record your friends and family. I now know my gear well enough to continue chatting throughout the setup. I’ve also had a lot of luck training a younger member of the Syuba community to help me with these sessions, which puts people at ease (particularly me).

Some people will worry that video takes up too much storage space. Make sure you test how much space that video takes up, and budget for a situation where you record even more than you expect, as people can get enthusiastic once you’re on a roll. Talking to archives early in the project planning to establish what they can take will also help you avoid problems down the line.
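The arithmetic for that budget is simple enough to sketch in a few lines of Python. Everything numeric below is a made-up example rather than a recommendation: the clip size, the number of planned hours, the 1.5 overrun factor and the two copies are placeholders you would replace with figures from your own test recording and backup plan.

```python
def gb_per_hour(clip_bytes, clip_seconds):
    """Storage rate in GB (10**9 bytes) per hour, measured from a test clip."""
    return clip_bytes / clip_seconds * 3600 / 1e9


def storage_budget_gb(planned_hours, rate_gb_per_hour, overrun=1.5, copies=2):
    """Total GB to budget for: planned hours, padded by an overrun factor
    (people get enthusiastic), multiplied by the number of copies kept."""
    return planned_hours * rate_gb_per_hour * overrun * copies


# Hypothetical test: a 5-minute clip from your camera came out at 900 MB.
rate = gb_per_hour(clip_bytes=900_000_000, clip_seconds=300)
print(round(rate, 1))                         # 10.8 GB per hour
print(round(storage_budget_gb(20, rate), 1))  # 648.0 GB for 20 planned hours
```

Running the numbers like this before you leave also gives you something specific to take to the archive when you ask what they can accept.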

Here’s a project paragraph for you

This project uses both video and audio recording. This is to ensure that the data is the most useful it can be in the long term for both linguistic analysis and community sharing. Having video as well as audio makes transcription easier, and ensures that the elements of discourse that are not in the spoken channel are still collected. Both the audio and video equipment record in high-quality lossless formats suitable for archiving. I have budgeted for archiving as quoted by <insert archive name> and ensured that I have sufficient local storage for adequate backup.

By Lauren Gawne

This content originally appeared on Superlinguo at http://www.superlinguo.com/post/148949834781/reasons-you-should-use-video-in-language

Helpful Tips for New ELAN Users

This week on the ELAR blog, Sarah Dopierala (MA Language Documentation and Description, SOAS) gives linguists who are new to ELAN five quick tips for using the software.

It is a fact well-observed that some of us are more tech savvy and some of us – not so much. For a documentary linguist trying to make the most of software like ELAN, ignorance perhaps isn’t bliss. For those of you Not-So-Much-ers out there, here are 5 tips for using ELAN from a fellow Computationally-Impaired Linguist.


When you save something in ELAN, you’ve perhaps noticed that there are actually two files being created:



In order to open a project in ELAN correctly, it is important that these two files (.eaf and .pfsx) are saved in the same folder (along with the original recording). If you move the .eaf and .pfsx files and the recordings into separate folders, ELAN won’t be able to find them.
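If you have a lot of projects, a short Python script can check a folder for you. Treat this as a sketch under assumptions: the function name is invented, and it relies on my understanding that an .eaf file is XML whose HEADER contains MEDIA_DESCRIPTOR elements with MEDIA_URL and RELATIVE_MEDIA_URL attributes. It also follows the advice above and expects the recording to sit in the same folder as the .eaf file.

```python
from pathlib import Path
from urllib.parse import unquote
import xml.etree.ElementTree as ET


def check_elan_folder(folder):
    """Return a list of problems found for the .eaf files in `folder`."""
    folder = Path(folder)
    problems = []
    for eaf in sorted(folder.glob("*.eaf")):
        # The .pfsx file is expected to share the .eaf file's name.
        if not eaf.with_suffix(".pfsx").exists():
            problems.append(f"{eaf.name}: no matching .pfsx file")

        # Assumed EAF layout: HEADER > MEDIA_DESCRIPTOR elements.
        header = ET.parse(eaf).getroot().find("HEADER")
        descriptors = header.findall("MEDIA_DESCRIPTOR") if header is not None else []
        for descriptor in descriptors:
            url = descriptor.get("RELATIVE_MEDIA_URL") or descriptor.get("MEDIA_URL") or ""
            # Keep only the file name, whatever path style was stored.
            media_name = unquote(url).replace("\\", "/").rsplit("/", 1)[-1]
            if media_name and not (folder / media_name).exists():
                problems.append(f"{eaf.name}: recording {media_name} is not in this folder")
    return problems
```

Calling check_elan_folder on your project folder and printing the result gives you a to-do list of files to move back together; an empty list means nothing obvious is missing.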


ELAN is a powerful piece of software – it can do many things and there are many different options to click on. There are two modes that are particularly useful for me when I am transcribing an audio recording:

Annotation Mode and Segmentation Mode.


These can be found under ‘options’.

In Segmentation mode, you can isolate instances of speech in your sound file. For me, this means capturing the speech of my consultant and not my own speech. You can see in the picture below that there are sections of the recording that are bracketed off by black lines. Segmentation mode is the mode where you create those black lines.


Annotation mode is the mode where you can transcribe the segments created in Segmentation mode. In Annotation Mode, each bracket of speech has a number (1, 2, 3…) which appears as its own line with the begin/end time and the duration. If you want to transcribe your segment, click on the space between the number and the begin/end information in the section titled: Annotation. You can see this next to number 1, below:


I find that it is best to first segment all the speech you want in Segmentation mode, and then add a transcription in Annotation mode.


Just in general, but especially if you intend to export your ELAN file into FLEx, I’ve found that a way to avoid problems between the two programs is to avoid using punctuation in transcriptions. If you insert a comma, for example, the utterance with the comma will be split up in FLEx (that is, it will appear in two separate lines), which makes it hard to translate the entire utterance (since the pieces are separated). This seems to be the case with other things like periods, question marks, etc.
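If you already have transcriptions with punctuation in them, a few lines of Python can flag or clean them before export. This is a sketch with invented function names. I don't know exactly which characters FLEx splits on, so it simply treats every character that Unicode classes as punctuation as suspect, and lets you keep the ones your orthography needs (an apostrophe or hyphen, say).

```python
import unicodedata


def find_punctuation(text):
    """Return the punctuation characters in `text`, in order of appearance."""
    return [ch for ch in text if unicodedata.category(ch).startswith("P")]


def strip_punctuation(text, keep=""):
    """Remove punctuation except characters listed in `keep`,
    then tidy up any doubled spaces left behind."""
    cleaned = "".join(
        ch for ch in text
        if ch in keep or not unicodedata.category(ch).startswith("P")
    )
    return " ".join(cleaned.split())


print(find_punctuation("Where are you going, my friend?"))   # [',', '?']
print(strip_punctuation("Where are you going, my friend?"))  # Where are you going my friend
print(strip_punctuation("don't stop.", keep="'"))            # don't stop
```

Flagging first and stripping second is the safer order, since some marks may be doing real work in your transcription conventions.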


Once you actually have segments in your audio file, you may want to play and listen to them more than once to check the accuracy of your transcription. There are several ways to do this. The way I prefer is to highlight the specific segment I want by clicking on the black bracket lines and then clicking the grey arrow with an ‘S’ in the middle of the ELAN screen.


This way, only the segment as it is defined by the boundaries is played back.


Perhaps the most important thing to remember – if you haven’t already found out the hard way – is that ELAN (unlike FLEx) does NOT save changes/files automatically. Transcription is a long and laborious process. Make sure to save your work!

Have these tips been helpful for you? Do you have some more tips that ELAN users could benefit from? Please help a linguist out in the comments section below.

By Sarah Dopierala

New Catalogue

ELAR is delighted to announce the launch of the new ELAR catalogue. On the 28th of September 2016, the new catalogue replaced the old version and can now be accessed via the URL: http://elar.soas.ac.uk/

New features include archive-wide statistics which display not only the number of downloaded files according to their access categories, but also the number of uploaded files within a select time range. These statistics can be seen in the bottom right corner of the display (see the image below, enlarged at: new_catalogue).

new catalogue public statistics

In addition to a deposit page that summarises the main characteristics of each collection and illustrates the deposit with photos, podcasts and show reels, each collection has its own public statistics. These public statistics, among other things, give information on the number and type of downloaded files. Each collection also contains private statistics for depositor use (see the example deposit page below at: deposit_example).


Navigating through the collections, ELAR users will have the opportunity to listen to voices from all over the world, to view photos and watch videos that show how endangered languages are being used in their communities, and to download transcriptions, lexical databases and several other files collected during documentation projects.

To find their deposit page, depositors should use the search bar to search for their name, their deposit key, or their project’s language. This will pull up the deposit bundles and deposit page link.

Depositors will still need to use LAMUS to self-upload their project’s data.

In the spring of 2014, the ELAR and the ELDP teams began the migration of the data onto the new system with the support of the SOAS library and information services. The London team set up the new system together with the Language Archive of the Max Planck Institute for Psycholinguistics in Nijmegen. The new system features self-upload facilities, which enable depositors to load and manage their collection themselves. The VuFind discovery layer maintains the familiar interface and faceted browsing.

Thanks to the many people involved in the migration – most notably the digital archivists of ELAR, Sophie Salffner and Vera Ferreira – the whole content of ELAR is now saved in a more structured and standardised way.

For more information on how to use the ELAR catalogue and how to register your own account, consult the advice for new users on the ELAR website.

By Jonas Lau