SIC is primarily intended for researchers developing and testing Natural Language Processing (NLP) tools that work with Internet texts. Linguists and general users interested in searching texts from Swedish blogs will probably find the Korp concordancer at Språkbanken more useful.

 
 

The tagset and data format are adapted from the Stockholm–Umeå Corpus (SUC), on which the corpus is modelled. One important difference is that SIC uses a more permissive license (Creative Commons Attribution-ShareAlike 3.0 Unported), which allows researchers to modify and redistribute the corpus. The annotation was done by Robert Östling, Johan Sjons and Johannes Bjerva, who manually corrected the output of Stagger.

If you are the author of a Swedish blog, you can help us expand the corpus by licensing your blog under the same Creative Commons license (just put a note about it on your blog) and telling us about it!

Downloads: Download SIC (zip) (173 Kb)

Contact: Robert Östling