ELAN: making tier(s) out of search results
Here is another guide on how to do something practical in ELAN. Previously, we relayed Eri Kashima’s guide for sensible auto-segmentation with PRAAT and ELAN (a time saver!). (For all posts about fieldwork on this blog, see this tag.)
This time: how to take your search results and turn the matching annotations into new separate tier(s). This is useful if, for example, you want to cycle through only the annotations that match a certain search query in transcription mode. This post has a longer guide, and a short guide at the end.
You can also use this guide if you want to compare several different transcriptions with each other, for example older and newer versions, or transcriptions made by different collaborators. In that case, start from step (4).
For those who don’t do a lot of transcription: ELAN (EUDICO Linguistic Annotator) is a program from TLA at MPI-Nijmegen. It lets us easily annotate audio and/or video files with lots of relevant data. We can use ELAN to count things, but we can also export to CSV files for later analysis (in Excel, R, LibreOffice, etc.). ELAN is free and great. If you ever need to do transcription, do it in ELAN. Do not create long text documents with no link to the audio; it is just ridiculous. Download ELAN here.
Version of ELAN: 4.8.1 (though, to my knowledge, this should work the same in other versions)
We’re going to:
- search in a clever way
- export those results
- import them as new tier(s) into the .eaf-file you’re working on
- thus creating a tier with a defined subset of other existing tiers, making work speedier on targeted parts of your corpus
Example case
I’ve got a transcribed file where I’ve noticed some different pronunciation of a certain word. I’d like to pick out only the annotations containing that word, make a new tier with only them, and write down some clever things about this word in that tier. I don’t want to have to scroll through all annotations to get to only these.
I work on Samoan, and the word I’m looking at means “to tell/explain”: fa’amatala. “Fa’amatala” is the dictionary entry for this word, but its pronunciation varies in actual speech. I’ve asked my transcription assistant to mark vowel length and the presence or absence of glottal stops (as opposed to a more orthographic transcription). She has done this pretty consistently (as far as I can tell; it’s hard to hear glottal stops sometimes), and since I know what kind of variation to expect, I can easily find the instances of this word. Due to t-style and k-style (Samoan lects) and speed of speech, these are the variations we can expect:
- fa’amatala
- fa:matala
- famatala
- fa’amakala
- fa:makala
- famakala
Besides the obvious difference in pronunciation, I’ve noticed something unusual going on in the realisation of t/k, sort of like an affricate. So, I’d like to listen to all instances of this word in all these spellings and make notes on that.
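The “clever searching” step boils down to one query that catches all six spellings at once. As a minimal sketch, assuming your ELAN version’s search offers a regular-expression option, a pattern like the one below would do it. Here it is tried out in Python’s `re` module; the name `VARIANT_PATTERN` and the sample annotation are made up for illustration, and the pattern assumes the glottal stop is typed as ' or ’ and vowel length as ':'. It only uses basic syntax (a non-capturing group and a character class) that most regex engines share.

```python
import re

# Hypothetical pattern for the six expected spellings of fa'amatala:
# "fa", then optionally a glottal stop + "a" (typed ' or ’) or a length mark ":",
# then "ma", then t or k (t-style vs k-style), then "ala".
VARIANT_PATTERN = re.compile(r"fa(?:['’]a|:)?ma[tk]ala")

expected = ["fa’amatala", "fa:matala", "famatala",
            "fa’amakala", "fa:makala", "famakala"]

for form in expected:
    print(form, bool(VARIANT_PATTERN.fullmatch(form)))  # each prints True

# Annotations usually hold whole utterances, so search inside them.
# (Made-up annotation value.)
print(VARIANT_PATTERN.search("ia, e fa:makala mai") is not None)  # True
```

Note that a spelling without any length or glottal-stop mark, such as “faamatala”, is deliberately not matched.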
Here are the steps. At the end is a short guide for when you’ve started to get the hang of this but need basic guidance.
[Image: Search query results]
That looks good! Not all the variations we thought might exist occurred (we didn’t get “famatala”), but that’s normal. (In fact, not getting that particular form is expected: vowel shortening and the t-lect should not co-occur often, if we believe what Mayer, Ochs and others have said about Samoan variation.)
If you want to edit your search query, you don’t need to start all over. Just click in the search window again, right there above your results, and it’ll become editable again. (This took me a while to realize.)
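Checking which of the expected spellings actually turned up can also be done outside ELAN once you have the annotation values (for instance from an exported results file). A small sketch, assuming the values are available as a list of strings; `tally_variants` is a hypothetical helper and the example hits are made up, not the real search results from this file.

```python
import re
from collections import Counter

VARIANT_PATTERN = re.compile(r"fa(?:['’]a|:)?ma[tk]ala")
EXPECTED = ["fa’amatala", "fa:matala", "famatala",
            "fa’amakala", "fa:makala", "famakala"]

def tally_variants(annotation_values):
    """Count each spelling of the word found across annotation values."""
    counts = Counter()
    for value in annotation_values:
        # Normalise straight apostrophes so ' and ’ count as the same spelling.
        counts.update(VARIANT_PATTERN.findall(value.replace("'", "’")))
    return counts

hits = ["ia, e fa:makala mai", "fa'amatala atu", "famakala"]  # made-up values
counts = tally_variants(hits)
missing = [form for form in EXPECTED if counts[form] == 0]
print(missing)  # ['fa:matala', 'famatala', 'fa’amakala']
```

Listing the missing forms makes it easy to spot gaps like the absent “famatala” discussed above.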
[Image: Export search query results]
[Image: Exporting search results dialogue window]
[Image: Importing CSV/Tab-delimited Text file]
[Image: Import CSV/Tab-delimited Text file dialogue window]
[Image: An actual ghost]
[Image: A lonely ghost tier in an otherwise empty .eaf-file]
[Image: Chris O’Dowd as Roy Trenneman in The IT Crowd]
- a) your original .eaf-file with audio and lotsa tiers
- b) your .eaf-file with only the search results-tier and no audio etc (ghost-tier)
- c) a new merged file consisting of the two listed above
[Image: Select Merge transcriptions]
[Image: Specifying what should be merged and how]
[Image: Final merged file in annotation mode, with the search results tier renamed and copied]
[Image: Final merged file in transcription mode, showing only the search results tiers]
- You might want to rename file (c) and delete files (a) and (b), if only for your own sanity when managing the files later
- Don’t know how to get to transcription mode? Go to “Options>Transcription Mode”.
- Your tiers aren’t showing up properly in transcription mode? Check that the “linguistic types” of the tiers are what you think they are, and that they match what you’ve configured transcription mode to show. Transcription mode can only show you tiers of one linguistic type at a time (unless you use columns, but that’s more complex). I don’t really get why either, but then again I barely get “linguistic types” at all
- Transcription mode getting clogged up with lots of irrelevant tiers? Go to “Configure…” on the left in the transcription mode window, select the right linguistic type and click “Select tiers…” in the bottom left. Tick only the tiers you want to see at that moment
- You can import several tiers at once with this method, so you don’t have to merge one search result at a time; see below
- You might want to do something complicated related to speakers; see below
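If you want to double-check linguistic types outside ELAN, note that an .eaf-file is XML in which each `TIER` element carries `TIER_ID` and `LINGUISTIC_TYPE_REF` attributes. The sketch below groups tiers by linguistic type with Python’s standard `xml.etree.ElementTree`. `SAMPLE_EAF` is a stripped-down, made-up document (a real .eaf-file also has a header, time slots, annotations and linguistic type definitions), and `tiers_by_linguistic_type` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Stripped-down, made-up .eaf content; real files also contain a HEADER,
# TIME_ORDER, annotations and LINGUISTIC_TYPE definitions.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="transcription" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="search-results" LINGUISTIC_TYPE_REF="default-lt"/>
  <TIER TIER_ID="speaker" LINGUISTIC_TYPE_REF="speaker-lt"/>
</ANNOTATION_DOCUMENT>"""

def tiers_by_linguistic_type(root):
    """Map each LINGUISTIC_TYPE_REF to the IDs of the tiers that use it."""
    groups = defaultdict(list)
    for tier in root.iter("TIER"):
        groups[tier.get("LINGUISTIC_TYPE_REF")].append(tier.get("TIER_ID"))
    return dict(groups)

# For a real file: root = ET.parse("your-file.eaf").getroot()
root = ET.fromstring(SAMPLE_EAF)
print(tiers_by_linguistic_type(root))
# {'default-lt': ['transcription', 'search-results'], 'speaker-lt': ['speaker']}
```

If the search-results tier ends up under a different linguistic type than the one transcription mode is configured to show, that would explain it not appearing there.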
On a related note, if someone ever were to ask me to put different speakers in separate tiers, I could use the above process to pick out only the annotations with a certain value in the speaker tier and then import them back as one tier per speaker. I’d rather not; I like it this way. But I like making sure that the way I set things up can be reconfigured to please others as well. Flexibility is good: don’t lock yourself into a set-up so narrow that you can’t change it without losing data.
That said, given this model I need to do some fiddly manual work for overlapping speech. That’s inconvenient, but I’m OK with it.
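The speaker-splitting idea, and why overlapping speech needs manual attention, can be sketched as interval matching: each annotation goes to the speaker whose speaker-tier annotation overlaps it, and anything overlapping no speaker or several speakers is set aside to check by hand. This is not how ELAN works internally; the tuple format (start and end in milliseconds, value) and the helper `split_by_speaker` are assumptions for illustration.

```python
from collections import defaultdict

def split_by_speaker(annotations, speaker_segments):
    """Sort (start_ms, end_ms, value) annotations into per-speaker tiers.

    speaker_segments holds (start_ms, end_ms, speaker) tuples from a speaker tier.
    Returns (tiers, needs_review): tiers maps speaker -> list of annotations,
    needs_review lists annotations overlapping no speaker or several speakers.
    """
    tiers = defaultdict(list)
    needs_review = []
    for start, end, value in annotations:
        # Two intervals overlap if each one starts before the other ends.
        speakers = {spk for s_start, s_end, spk in speaker_segments
                    if s_start < end and start < s_end}
        if len(speakers) == 1:
            tiers[speakers.pop()].append((start, end, value))
        else:
            needs_review.append((start, end, value))  # e.g. overlapping speech
    return dict(tiers), needs_review

# Made-up example: the third annotation overlaps speakers A and B.
annotations = [(0, 1000, "a"), (1000, 2000, "b"), (2000, 2500, "c")]
speaker_segments = [(0, 1000, "A"), (1000, 2000, "B"),
                    (2000, 2500, "A"), (2200, 2600, "B")]
tiers, needs_review = split_by_speaker(annotations, speaker_segments)
print(tiers)         # {'A': [(0, 1000, 'a')], 'B': [(1000, 2000, 'b')]}
print(needs_review)  # [(2000, 2500, 'c')]
```

Setting the ambiguous cases aside, rather than guessing a speaker, mirrors the manual handling of overlapping speech described above.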
Short guide
Step 1) Clever searching
- Query>Export (Save as tab-delimited text file)
- File>Import> CSV/Tab-delimited Text file
- Specify columns (col 1: ignore, col 2: Tier, col 3: Begin time, col 4: ignore, col 5: End time, col 6: ignore, col 7: Duration, col 8: ignore, col 9: Annotation)
- Save new .eaf-file.
- Quit and restart ELAN
- Open original file with audio and other tiers
- File>Merge transcriptions…
- Select .eaf-file with search results as second source (do not append)
- Save new merged file
- Delete superfluous files
- Rename and copy tiers if necessary
I’m sure there are other ways of doing this, but this is what has worked well for me. I’d like this to be easier in ELAN, but in the meantime this works, so I’m gonna do it like this.
I find, in general, that I learn more about ELAN and similar tools by just trying lots of different things and probing the system. Sure, there are manuals, but they often envisage a different usage from what I’m after. For example, I’m not clear on what I actually gain from “linguistic types” for what I want to do. Never mind: probing, searching and sharing seem to be the best way to find tailored functions. Usually, whatever useful thing you can conceptually imagine exists somewhere (it’s like rule 34 but for software). I didn’t know how this worked until I thought to myself: “there must be a way of importing search results”. And lo and behold, there is. Now here’s something I’ve learned that you can do too! Good luck!
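The column mapping in the import step can be illustrated by reading the exported tab-delimited file yourself, which is handy for sanity-checking an export before importing it. The sketch below assumes the nine-column layout listed in the short guide (as seen with ELAN 4.8.1; other versions may differ) and no header row. `read_search_export`, `KEEP` and the sample row are hypothetical, with "x" standing in for the columns to ignore.

```python
import csv
import io

# Assumed 0-based positions, following the column mapping in the short guide
# (col 2 = Tier, col 3 = Begin time, col 5 = End time, col 7 = Duration,
# col 9 = Annotation); all other columns are ignored.
KEEP = {"tier": 1, "begin": 2, "end": 4, "duration": 6, "annotation": 8}

def read_search_export(text):
    """Return one dict per row with only the columns used in the import step."""
    rows = []
    # QUOTE_NONE: treat quote characters in annotations as ordinary text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 9:
            continue  # skip blank or short lines
        rows.append({name: fields[i] for name, i in KEEP.items()})
    return rows

# Made-up sample: "x" marks the columns the guide says to ignore.
sample = "\t".join(["x", "transcription", "00:00:01.200", "x",
                    "00:00:02.500", "x", "00:00:01.300", "x",
                    "ia, e fa:makala mai"]) + "\n\n"
for row in read_search_export(sample):
    print(row["tier"], row["begin"], row["end"], row["annotation"])
```

For a real export, read the file with `open(path, encoding="utf-8").read()` and pass the text in; check the first few rows against what ELAN shows in the search window.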
[Image: Richard Ayoade as Maurice Moss in The IT Crowd]
[Image: Ulrike Mosel and Hedvig Skirgård (yours truly) in Canberra]
[Image: Samoan water, Neiafu-Tai village]
References
- Sloetjes, H., & Wittenburg, P. (2008). Annotation by category – ELAN and ISO DCR. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
- Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a Professional Framework for Multimodality Research. In: Proceedings of LREC 2006, Fifth International Conference on Language Resources and Evaluation.
- Brugman, H., & Russel, A. (2004). Annotating Multimedia/Multi-modal resources with ELAN. In: Proceedings of LREC 2004, Fourth International Conference on Language Resources and Evaluation.
- Crasborn, O., & Sloetjes, H. (2008). Enhanced ELAN functionality for sign language corpora. In: Proceedings of LREC 2008, Sixth International Conference on Language Resources and Evaluation.
- Lausberg, H., & Sloetjes, H. (2009). Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, Instruments, & Computers, 41(3), 841-849. doi:10.3758/BRM.41.3.591.