Time: Thursday 12 May 2016, 15:00–17:00
Place: C307, Södra huset, Frescati

A post-seminar gathering follows directly after the seminar in the department's kitchenette.

In this interdisciplinary investigation we show that computational linguistic approaches are beneficial for descriptive linguistics and psycholinguistics. We discuss the similarities and differences between language processing by humans and by machines.

After advocating the use of models for descriptive linguistics (Evert 2006, Gries 2012), we use surprisal (Levy and Jaeger 2007), a part-of-speech tagger (Schmid 1994), measures of readability (e.g. Senter and Smith 1967) and a syntactic parser (Schneider 2008) as models of a native speaker. When constructing sentences, native speakers employ argument structure, fixedness and collocations (Evert 2009), alternations, choice of synonyms and register as subtle operations (Pawley and Syder 1983). Sentences are rendered the way they are due to many complex, interacting factors. Even subtle failures, such as those exhibited by learner language, increase processing load for both the human and the automatic parser, and potentially lead to increased ambiguity by adding further noise in the sense of Shannon's noisy channel model.

We first use surprisal as a model of fixedness. Fixedness is a means to measure the influence of the idiom principle (Sinclair 1991), and fixedness and entrenchment are closely related (Bybee 2007, Blumenthal-Dramé 2012). We measure correlations between fixedness and language learner level. For first language acquisition it has been shown that lexically specific, idiom-based language use precedes creativity (Tomasello 2000); for second language acquisition the situation is less clear (Ellis 2012).
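To make the notion of surprisal concrete, the following is a minimal sketch (not the model used in the talk): word-by-word surprisal from a bigram model with Laplace smoothing over a toy corpus. Lower surprisal for a word in context corresponds to a more expected, more fixed sequence.

```python
import math
from collections import Counter

def bigram_surprisal(corpus_sentences, sentence):
    """Return (word, surprisal) pairs for `sentence`, where surprisal is
    -log2 P(word | previous word), estimated from bigram counts in
    `corpus_sentences` with Laplace (add-one) smoothing. Toy illustration."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus_sentences:
        tokens = ["<s>"] + sent
        vocab.update(tokens)
        unigrams.update(tokens[:-1])          # contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    v = len(vocab)
    result, prev = [], "<s>"
    for w in sentence:
        p = (bigrams[(prev, w)] + 1) / (unigrams[prev] + v)
        result.append((w, -math.log2(p)))
        prev = w
    return result
```

For example, a word that always follows its context in the corpus receives lower surprisal than a word that does so only sometimes, which is the intuition behind using surprisal as a fixedness measure.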

We then use a part-of-speech tagger as a model of surface ambiguity. Entrenched structures receive higher tagger and parser scores because they are expected. Third, we apply various well-known measures of readability (e.g. Senter and Smith 1967, Covington and McFall 2010, Vajjala and Meurers 2012) to learner data.
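As an example of such a readability measure, the Automated Readability Index (Senter and Smith 1967) can be sketched in a few lines. The tokenization here (whitespace words, sentence splits on terminal punctuation) is a simplifying assumption; real readability tools use more careful segmentation.

```python
import re

def automated_readability_index(text):
    """ARI (Senter & Smith 1967):
    4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    Characters are counted as alphanumeric characters only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    chars = sum(1 for c in text if c.isalnum())
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```

Texts with longer words and longer sentences score higher (i.e. are predicted to be harder to read), which is what makes such surface measures usable as a rough proxy for processing load on learner data.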

Finally, we use a parser as a model of structural ambiguity. The parser (Schneider 2008) makes an explicit distinction between a manually written competence grammar and probabilistic performance disambiguation, which allows us to explore the dichotomy between Sinclair's (1991) open choice principle and idiom principle, and the role of ambiguity. We also discuss how well-established principles such as end-weight can be explained in terms of ambiguity avoidance, and raise the question of whether derived concepts such as minimal dependency length are a language universal.
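The dependency-length idea can be illustrated with a small sketch (a simplification, not the parser's own scoring): given a dependency analysis where each token points to its head, total dependency length is the sum of the linear distances between tokens and their heads. Orderings that shift heavy constituents to the end, as the end-weight principle predicts, tend to reduce this sum.

```python
def total_dependency_length(heads):
    """Sum of linear distances between each token and its head.
    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which contributes no distance."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)
```

For instance, for a hypothetical five-token sentence with head vector [2, 0, 4, 2, 2] (token 2 is the root), the total length is 1 + 1 + 2 + 3 = 7; reorderings of the same analysis can be compared by this measure.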

More information about the speaker: Gerold Schneider

A warm welcome to all!
Gintaré Grigonyté, Mats Wirén & Ljuba Veslinova