Parallel texts and multilingual language processing

The central vehicle for our research in this area is parallel texts, that is, translation equivalents of the same material in different languages. An important application of parallel texts is the transfer of linguistic annotation and computational models between languages. This allows annotation developed for one language, which is typically highly labour-intensive to produce, to be applied to many other languages lacking such resources. Another application is fast, large-scale investigation in linguistic typology, carried out in cooperation with the Section for General Linguistics. Neural machine translation and language modelling, which have emerged in the last few years following advances in machine learning, are yet another area in which we are active.

First- and second-language acquisition

A general goal of this research is to understand the nature of the information conveyed between parents and infants, and its implications for first-language acquisition. To this end, we have developed multimodal annotation for corpora of parent–child interaction, and have studied phenomena such as repetitiousness, disfluencies, speech rate, and the synchrony between social cues (mainly eye gaze and hands) and spoken utterances. In second-language acquisition, we have investigated specific techniques for aiding learners, such as distinguishing spelling errors potentially related to pronunciation from other errors. We also participate in SweLL, a project to develop a general infrastructure for learner corpora and tools for Swedish.

Linguistic resources

The section provides key resources for linguistic research as well as for practical natural-language-processing systems. Building on data sets collected by the Section for Phonetics and the Sign Language Section, we have developed rich linguistic annotations for parent–child interaction and for dialogues in Swedish Sign Language. We have also produced annotated text corpora of written Swedish, including a collection of Strindberg's works and extended annotations for the Stockholm–Umeå Corpus (SUC). The natural-language-processing tools that we provide include systems for part-of-speech tagging, word alignment and machine translation. Much of our activity under this heading is carried out through the Swe-Clarin project.