Computational Linguistics

Corpora, resources and tools
 

Welcome to our new site containing a collection of corpora, resources and tools from the Section for Computational Linguistics.

The Section for Computational Linguistics makes parts of its Natural Language Processing software, resources and corpora available to the public.

The links to the left give you more information about our various corpora, including Swedish Blog Sentences (2.7 billion tokens), the Stockholm Umeå Corpus (SUC, 1 million words), SUC-CORE (a 20,000-word subset of SUC with noun-phrase coreference annotation), and the Stockholm University Strindberg Corpus (400,000 tokens).

The tools we distribute include the Stockholm Language Model with Entropy (SLME); Swedish Python Routines (SPyRo), which includes compound analysis for Swedish; and the Stockholm Tagger (Stagger), a part-of-speech tagger and named-entity recognizer for Swedish.

Read more about our research and some of the projects we are currently working on.


CONTACT

Section head: Mats Wirén
Email: mats.wiren@ling.su.se

Website URL: www.ling.su.se/nlp

Section for Computational Linguistics:
www.ling.su.se/compling
www.ling.su.se/DaLi

Stockholm University Research Database