The course introduces some concepts and methods from Computational Linguistics that are useful in Corpus Linguistics, including quantitative properties of language, n-grams, regular expressions and finite-state automata, tokenisation, part-of-speech tagging, and syntactic analysis. The bulk of the course then describes the collection, representation and annotation of data in different modalities (text, speech, signing), types of corpora (such as sample corpora, monitor corpora and web corpora), and corpus analytics based on occurrence and co-occurrence frequencies. In addition, the suitability of different corpus materials for different research questions is discussed.

Syllabus, schedule and literature list

Syllabus | Schedule Autumn 2017 | Literature list Autumn 2017 (88 kB)

Lecturer

Mats Wirén, mats.wiren@ling.su.se

Teaching

The teaching consists of lectures, laboratory exercises and seminars.

Language of instruction

English

Prerequisites and special admission requirements

Admission to the Master's Programme in Language Sciences at the Faculty of Humanities, or completion of one of the following courses: Linguistics - Bachelor's course, 30 ECTS credits; Phonetics - Bachelor's course, 30 ECTS credits; or Computational Linguistics - Bachelor's course, 30 ECTS credits. The Swedish upper secondary school course English B/English 6, or equivalent, is also required.