Time: Thursday 27 September, 13:00-15:00
Venue: C307, Södra huset, Frescati

The seminar is organised in cooperation with Forskarskolan i språkvetenskap (the Graduate School in Linguistics).


Massively parallel texts are parallel texts in which the same text is available in very many languages (100+). Although few such texts exist, those that are available offer an incredibly rich resource for language comparison. Their most important innovation is that they allow a completely new way of dealing with the perennial problem of comparing 'like with like'. As it turns out, this method appears to have direct methodological parallels to vector-space modelling as it is currently widely used in the automatic analysis of monolingual corpora.

The traditional approach to comparing like with like in language comparison is to use a functional/semantic definition of a tertium comparationis. Just as monolingual vector-space modelling approaches semantics through the distribution of forms in corpora, I will replace the functional/semantic definition with the distribution of forms in a parallel text: two forms from different languages have a similar meaning when they show a similar distribution in the parallel text. Generalising this to a massively parallel text immediately yields typological language comparison.
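To make the idea of "similar distribution" concrete, here is a minimal sketch, not the speaker's actual method. It assumes that each word form is represented by the set of parallel units (e.g. verses) in which it occurs, and it uses cosine similarity as the distribution measure; both the toy data and the choice of cosine are illustrative assumptions.

```python
import math

# Hypothetical toy data (assumption): for each (language, form) pair, the set
# of parallel-unit IDs in which the form occurs. Real data would come from a
# word-tokenised massively parallel text.
occurrences = {
    ("eng", "who"): {1, 4, 7, 9},
    ("deu", "wer"): {1, 4, 7},
    ("deu", "Haus"): {2, 3, 8},
}


def cosine(a: set[int], b: set[int]) -> float:
    """Cosine similarity of two binary occurrence vectors given as sets of unit IDs."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the size of the intersection,
    # and each vector's length is the square root of its set size.
    return len(a & b) / math.sqrt(len(a) * len(b))


sim_wer = cosine(occurrences[("eng", "who")], occurrences[("deu", "wer")])
sim_haus = cosine(occurrences[("eng", "who")], occurrences[("deu", "Haus")])
print(f"who~wer: {sim_wer:.3f}, who~Haus: {sim_haus:.3f}")  # who~wer: 0.866, who~Haus: 0.000
```

Forms that keep turning up in the same units (here "who" and "wer") score high, while forms with disjoint distributions score zero, without any prior semantic definition of what the forms mean.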

In this talk I will present a few recent case studies using massively parallel texts. The first investigates the automatic induction of semantic roles from a parallel text; the second looks at the coding of 'who'-like interrogatives. Using these examples, I will discuss some of the practical issues of using matrix algebra for language comparison.
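As a rough illustration of the matrix-algebra view, the sketch below stacks forms from several languages into one form-by-unit matrix, so that a single matrix product yields the similarity of every pair of forms. The toy matrix, the form labels, the cosine normalisation and the helper `best_match` are hypothetical assumptions for illustration, not the procedure used in the case studies.

```python
import numpy as np

# Hypothetical toy form-by-unit matrix (assumption): one row per word form
# from any language, one column per parallel unit; 1 means the form occurs there.
forms = ["eng:who", "eng:what", "deu:wer", "deu:was", "fra:qui"]
A = np.array([
    [1, 0, 1, 0, 1, 0],  # eng:who
    [0, 1, 0, 1, 0, 0],  # eng:what
    [1, 0, 1, 0, 0, 0],  # deu:wer
    [0, 1, 0, 1, 0, 1],  # deu:was
    [1, 0, 1, 0, 1, 0],  # fra:qui
], dtype=float)

# Scale each row to unit length (guarding against empty rows); N @ N.T then
# holds the cosine similarity of every pair of forms in one operation.
norms = np.linalg.norm(A, axis=1, keepdims=True)
N = A / np.where(norms == 0, 1, norms)
S = N @ N.T


def best_match(query: str, lang: str) -> str:
    """Hypothetical helper: the form of `lang` most similar to `query`."""
    i = forms.index(query)
    candidates = [j for j, f in enumerate(forms) if f.startswith(lang + ":")]
    return forms[max(candidates, key=lambda j: S[i, j])]


print(best_match("eng:who", "deu"))  # deu:wer
print(best_match("eng:who", "fra"))  # fra:qui
```

Reading off the best match per language for a 'who'-like form gives, in miniature, a cross-linguistic comparison. With 100+ languages the same product simply becomes a larger matrix, which is where practical issues such as size and sparsity come in.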

More information about Michel Cysouw


Ljuba Veselinova, Maria Koptjevskaja Tamm & Tomas Riad