SUC-CORE: SUC 2.0 Annotated with NP Coreference
SUC-CORE is a 20 000 word subset of the Stockholm-Umeå Corpus (SUC 2.0) annotated with coreference relations between noun phrases. The corpus covers a wide range of genres and domains, and is freely available for research.

The annotation was done manually by two annotators using BRAT, a web-based tool for collaborative annotation. The annotation task was restricted to three types of referring expressions:
- Name mentions (NAM): proper names and other named entities.
- Nominal mentions (NOM): NPs headed by a lexical noun, a nominalized adjective, or a participle.
- Pronominal mentions (PRO): personal pronouns, demonstrative pronouns, and reflexive pronouns. We also include possessives and genitives in this category.
SUC-CORE consists of the same documents as the evaluation set of the Swedish Treebank. Thus, the coreference annotation of SUC-CORE can be combined with the part-of-speech tagging, morphosyntactic analysis and named entity annotation of SUC 2.0, and the syntactic analysis of the Swedish Treebank.
Both informative and imaginative texts of different genres and domains are included. The informative prose category consists of six files with foreign and domestic news texts and editorials from national and regional morning dailies, magazine articles on interior design, a textbook excerpt on biology, and an academic essay. The imaginative prose section includes excerpts from four novels of different genres. Thus, SUC-CORE can be used to explore coreference in different types of text.
SUC-CORE is distributed in a stand-off format similar to the BioNLP Shared Task standoff format. SUC-CORE is freely available for research, but every user must sign a license for SUC with the Department of Linguistics at Stockholm University. Please contact Kristina Nilsson Björkenstam for more information on how to access SUC-CORE.
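As an illustration of how a stand-off file of this kind might be read, here is a minimal Python sketch. It assumes the brat standoff conventions (tab-separated text-bound annotations with IDs beginning with T, and binary relations with IDs beginning with R), which may differ in detail from the actual SUC-CORE files. The sample sentence, the character offsets, and the relation label Coref are invented for illustration and are not taken from the corpus; only the mention types NAM, NOM and PRO come from the description above.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    ident: str  # annotation ID, e.g. "T1"
    kind: str   # mention type: "NAM", "NOM" or "PRO"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str   # the annotated string


def parse_standoff(lines):
    """Split brat-style lines into mentions and (id, id) coreference links."""
    mentions, links = {}, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if fields[0].startswith("T"):
            # Text-bound annotation: ID <TAB> TYPE START END <TAB> TEXT
            kind, start, end = fields[1].split(" ")
            mentions[fields[0]] = Mention(
                fields[0], kind, int(start), int(end), fields[2]
            )
        elif fields[0].startswith("R"):
            # Binary relation: ID <TAB> TYPE Arg1:Tx Arg2:Ty
            _rel_type, arg1, arg2 = fields[1].split(" ")
            links.append((arg1.split(":")[1], arg2.split(":")[1]))
    return mentions, links


def chains(mentions, links):
    """Group linked mentions into coreference chains with a small union-find."""
    parent = {ident: ident for ident in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for ident in mentions:
        groups[find(ident)].append(ident)
    # Keep only chains with at least two mentions, ordered by text position.
    return sorted(
        sorted(group, key=lambda i: mentions[i].start)
        for group in groups.values()
        if len(group) > 1
    )


# Hypothetical annotation of the invented text "Anna sover. Hon drömmer."
SAMPLE = [
    "T1\tNAM 0 4\tAnna",
    "T2\tPRO 12 15\tHon",
    "R1\tCoref Arg1:T2 Arg2:T1",  # "Coref" is an assumed relation label
]

mentions, links = parse_standoff(SAMPLE)
print(chains(mentions, links))
```

The sketch handles only contiguous spans and pairwise relations. Discontinuous spans or equivalence-set lines, which brat also supports, would need additional cases.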
Publications
Kristina Nilsson Björkenstam & Emil Byström. SUC-CORE: SUC 2.0 Annotated with NP Coreference. In: Proceedings of the Fourth Swedish Language Technology Conference (SLTC). October 24-26, 2012, Lund.
Contact
Kristina Nilsson Björkenstam

Last updated:
October 25, 2012
Source: Department of Linguistics
