Time: 20 October, 15:00-17:00
Venue: C389, building C, floor 3, Stockholms universitet.

The post-seminar gathering takes place directly after the seminar in the department's kitchenette. During the post-seminar, a Swiss film will also be shown in C307 (for those who have the time and inclination).

Abstract

A traditional approach to comparing languages is to look at the similarity in sound structure of words with similar meanings. Thus, Swedish känna and German kennen are more similar than Swedish känna and Finnish tuntea. This similarity concerns the exterior form of words. Here, however, the inner form of words is considered; that is, not whether words have similar sound structure, but to what extent they are used in a similar way in discourse. In this respect, Swedish känna and Finnish tuntea are quite close, because they share the meanings ‘know a person’ and ‘feel’, which are expressed by different verbs in German. Language use is understood here as the distribution of words across texts. Parallel texts are translations of the same text into different languages and are therefore handy tools for comparing the use of words across languages. There are two kinds of words: lexemes and wordforms. The former abstract away from formal grammatical differences, while in the latter grammatical and lexical information is intertwined. To keep things simple, only wordforms will be addressed, which allows us to consider the inner form of languages in grammar and lexicon in combination. A method is presented by which the similarity of languages in their inner form can be quantified automatically from parallel texts without any manual analysis of the text material. This is done on the basis of a strongly biased sample, where comparing the expected bias with the obtained result is one of several evaluation techniques. While this approach, discussed in the first part of the talk, is useful for obtaining a global measure of language similarity, it is not useful for determining structural similarities in a particular functional domain.
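To make the idea of "distribution of words across texts" concrete, here is a minimal sketch in Python. It is not the method presented in the talk: the choice of Jaccard overlap, the averaging step, the function names, and the three-segment toy corpus are all assumptions made purely for illustration. Each wordform is represented by the set of aligned segments in which it occurs, and two languages count as similar to the extent that the wordforms of one find a wordform with a matching distribution in the other.

```python
from collections import defaultdict


def distributions(segments):
    """Map each wordform to the set of segment indices in which it occurs."""
    dist = defaultdict(set)
    for index, segment in enumerate(segments):
        for wordform in segment.lower().split():
            dist[wordform].add(index)
    return dist


def jaccard(a, b):
    """Overlap of two sets of segment indices (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def language_similarity(segments_a, segments_b):
    """Average, over the wordforms of A, of the best-matching wordform in B.

    Assumed measure for illustration only; note that it is asymmetric.
    """
    dist_a, dist_b = distributions(segments_a), distributions(segments_b)
    best = [max(jaccard(sa, sb) for sb in dist_b.values()) for sa in dist_a.values()]
    return sum(best) / len(best)


# Invented toy corpus: one wordform per aligned segment.
# Segment 0 = 'knows (a person)', segment 1 = 'feels', segment 2 = 'sees'.
swedish = ["känner", "känner", "ser"]
finnish = ["tuntee", "tuntee", "näkee"]
german = ["kennt", "fühlt", "sieht"]

print(language_similarity(swedish, finnish))  # 1.0
print(language_similarity(swedish, german))   # 0.75
```

In this toy setting Swedish and Finnish come out as identical in inner form, while German scores lower because it splits the two meanings between two verbs. No sound similarity and no manual annotation enter the calculation.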

The second part of the talk will therefore concentrate on one particular domain which is highly congruent in use cross-linguistically: negation. With the help of this example, a crucial design feature of language is discussed in which natural languages differ from artificial languages such as the language of formal semantics, viz. polymorphy. While polysemy means that a form has several meanings, polymorphy means that one semantic category is expressed by several different formal exponents. Most languages have more than one form to express negation; negation thus tends to be polymorphous in natural languages. However, languages differ considerably in how exactly the different forms of negation are distributed in usage. In order to measure this difference, negation markers are first extracted automatically from parallel texts. The algorithm used has some crucial shortcomings, which are built in deliberately because they are at the same time its strengths; these will be discussed in some detail. In the next step, the similarity of negation in different languages is measured by comparing the similarity in use of the polymorphous negation markers across languages. The algorithmic approach has the advantage that very little data reduction is required, which entails that in the result each language ends up as a type of its own, although some of these types are more similar to each other than others.
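As an illustration of what comparing the use of polymorphous markers might involve, the following Python sketch treats each language as a partition of the same aligned negated passages according to which marker it uses. The Rand index stands in here as an assumed similarity measure; it is not claimed to be the measure used in the talk, and the marker labels and data are hypothetical placeholders.

```python
from itertools import combinations


def rand_index(markers_a, markers_b):
    """Share of passage pairs on which two languages agree about
    'same marker' versus 'different marker'."""
    if len(markers_a) != len(markers_b):
        raise ValueError("both languages must cover the same aligned passages")
    pairs = list(combinations(range(len(markers_a)), 2))
    if not pairs:
        raise ValueError("at least two passages are needed")
    agreements = sum(
        (markers_a[i] == markers_a[j]) == (markers_b[i] == markers_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical marker labels for four aligned negated passages.
lang_a = ["A1", "A1", "A2", "A2"]
lang_b = ["B1", "B1", "B2", "B2"]  # splits the passages exactly as lang_a does
lang_c = ["C1", "C2", "C1", "C2"]  # splits the same passages differently

print(rand_index(lang_a, lang_b))  # 1.0
print(rand_index(lang_a, lang_c))
```

Only the way each language groups the passages matters, not the shape of the markers, so no cross-linguistic feature inventory has to be set up beforehand.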

Finally, the two case studies are placed in the more general context of the question of to what extent typology can be done completely automatically with texts (and without reference grammars and without typologists). It will be discussed to what extent the kind of typology that can be done in this way differs from traditional typology, and whether traditional typology can learn anything from fully algorithmic approaches.

 
References

Wälchli, B. (2011). Quantifying Inner Form. A Study in Morphosemantics. Online Publication. Arbeitspapiere. Bern: Institut für Sprachwissenschaft.
http://www.isw.unibe.ch/unibe/philhist/isw/content/e4229/e4355/e6592/e6593/Arbeitspapier-46_ger.pdf

Wälchli, B. (forthc.). Algorithmic typology, aggregating without features and going from known to similar unknown categories within and across languages. In Szmrecsanyi, B. & Wälchli, B. (eds.), Aggregating dialectology and typology. To be published in Walter de Gruyter’s Linguae et Litterae series.

 
A warm welcome to all!