Half-time seminar: Amanda Kann, PhD student at the Department of Linguistics, Stockholm University
Title: Extracting Word Order Typology from Text: Issues and Uses.

Abstract
Multilingual natural language processing (NLP) relies on linguistic typology, in the sense that it often makes use of structural similarity and variation across languages. Despite this, attempts to explicitly include typological knowledge in multilingual NLP model training have yielded limited results.
One possible explanation for this is that the format of the provided typological information is too coarse-grained to be useful for the models. Most commonly, binary features from typological databases are used, which condense structural variation into broad categorical distinctions. Gradient features extracted directly from language data could be a better fit for this purpose, since they capture variation in greater detail.
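To make the contrast between binary and gradient features concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method used in the studies: the three-sentence CoNLL-U sample is hand-written, the names `head_initial_share` and `binarize` are hypothetical, and the 0.5 threshold is an arbitrary choice.

```python
from collections import Counter

# Hand-written CoNLL-U sample (illustrative data, not taken from any treebank).
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = """\
1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_
2\treads\tread\tVERB\t_\t_\t0\troot\t_\t_
3\tbooks\tbook\tNOUN\t_\t_\t2\tobj\t_\t_

1\tBooks\tbook\tNOUN\t_\t_\t3\tobj\t_\t_
2\tshe\tshe\tPRON\t_\t_\t3\tnsubj\t_\t_
3\treads\tread\tVERB\t_\t_\t0\troot\t_\t_

1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_
2\teats\teat\tVERB\t_\t_\t0\troot\t_\t_
3\tapples\tapple\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def head_initial_share(conllu: str, relation: str) -> float:
    """Gradient feature: share of `relation` dependents that follow their head."""
    counts = Counter()
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip blank lines, comments, and multiword-token or empty-node rows.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[7] == relation:
            head_first = int(cols[6]) < int(cols[0])
            counts["head_first" if head_first else "dep_first"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no tokens with relation {relation!r}")
    return counts["head_first"] / total


def binarize(share: float, threshold: float = 0.5) -> str:
    """Binary feature: collapse the gradient value into a VO/OV label."""
    return "VO" if share > threshold else "OV"


vo_share = head_initial_share(SAMPLE, "obj")
print(f"gradient: {vo_share:.2f}, binary: {binarize(vo_share)}")
```

In this toy sample two of three objects follow their verb, so the gradient value keeps the information that one clause is object-initial, while the binary label discards it.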
In this talk, I will present results from the first two studies in my compilation thesis. Both focus on extracting gradient word order features from text, primarily dealing with methodological aspects that prior work has suggested may be obstacles to reliability and representativeness: variation across different genres of text, and whether translations can be used in place of original language data.
- Study 1 examines word order features extracted from the Parallel Bible Corpus for consistency across same-language translations and compares them to reference features from Universal Dependencies treebanks.
- Study 2 addresses the influence of translation effects by comparing original texts and translations across the 21 languages in the EuroParl corpus of European Parliament proceedings.
Finally, I will discuss future directions for my project, including alternative methods for typological feature extraction and potential applications for multilingual NLP.
The seminar is held in English.
Last updated: 2026-01-20
Source: The Department of Linguistics