Stockholm Internet Corpus (SIC)
The SIC project aims to create a freely available, manually annotated corpus of Swedish Internet texts. So far, a small corpus (8174 tokens) of blog texts has been created. The tagset and data format are adapted from the Stockholm-Umeå Corpus (SUC), on which the corpus is modelled.
SIC is primarily intended for researchers developing and testing Natural Language Processing (NLP) tools working with Internet texts. Linguists and general users interested in searching texts from Swedish blogs would probably find the Korp concordancer at Språkbanken to be more useful.

The corpus is released under a Creative Commons license (the Creative Commons Attribution-ShareAlike 3.0 Unported), allowing researchers to modify and redistribute the corpus.
If you are the author of a Swedish blog, you can help us expand the corpus by licensing your blog under the same Creative Commons license (just put a note about it on your blog) and telling us about it!
Downloads:
Current version (zip archive) (108 KB)
Contact: Robert Östling
Last updated: October 29, 2012
Source: Department of Linguistics
