Time: Wednesday, 17 December 2014, 13:00-15:00
Place: C389, Södra huset, Frescati

A post-seminar follows directly after the mock defence, in the department's kitchenette.

The mock opponent is Professor Joakim Nivre, Department of Linguistics and Philology, Uppsala University.


In this thesis I explore Bayesian models for word alignment, how they can be improved through joint annotation transfer, and how they can be extended to parallel texts in more than two languages. In addition to these general methodological developments, I apply the algorithms to problems from sign language research and linguistic typology.

In the first part of the thesis, I show that Bayesian alignment models estimated with Gibbs sampling are more accurate than previous methods for a range of different languages, particularly for the vast majority of languages with few digital resources available. Furthermore, I explore how different variations of the models and learning algorithms affect alignment accuracy. Then, I show how part-of-speech annotation transfer can be performed jointly with word alignment to improve the accuracy of both the word alignments and the transferred annotation. I apply this model to help annotate the Swedish Sign Language Corpus (SSLC) with part-of-speech tags, and to investigate patterns of polysemy across the languages of the world.

Finally, I present a model for multilingual word alignment that learns an intermediate representation of the text. This model is then used with a massively parallel corpus containing translations of the New Testament to explore word order features in 1001 languages.


Mats Wirén & Robert Östling