Brat screenshot

The annotation was done manually by two annotators using BRAT, a web-based tool for collaborative annotation. The annotation task was restricted to three types of referring expressions:

  • Name mentions (NAM): proper names and other named entities.
  • Nominal mentions (NOM): noun phrases (NPs) headed by a lexical noun, a nominalized adjective, or a participle.
  • Pronominal mentions (PRO): personal, demonstrative, and reflexive pronouns. Possessives and genitives are also included in this category.

SUC-CORE consists of the same documents as the evaluation set of the Swedish Treebank. Thus, the coreference annotation of SUC-CORE can be combined with the part-of-speech tagging, morphosyntactic analysis, and named entity annotation of SUC 2.0, and with the syntactic analysis of the Swedish Treebank.
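Because the coreference annotation is stand-off, combining it with the token-level layers essentially means matching character spans to tokens. The sketch below is a minimal illustration under stated assumptions: it assumes each SUC 2.0 token is available with character offsets into the same document text, and `Token` and `tokens_in_span` are hypothetical names, not part of any SUC or Swedish Treebank distribution. The example tags (PM, VB, SN, PN, NN, MAD) follow the SUC tagset.

```python
from typing import NamedTuple


class Token(NamedTuple):
    # Hypothetical token record; assumes character offsets are available
    # for SUC 2.0 tokens in the same text the mention offsets refer to.
    start: int
    end: int   # exclusive end offset
    form: str
    pos: str   # part-of-speech tag, e.g. from SUC 2.0


def tokens_in_span(tokens, start, end):
    """Return the tokens lying entirely inside the half-open span [start, end)."""
    return [t for t in tokens if t.start >= start and t.end <= end]


# "Kalle sa att han köpte bilen."
tokens = [
    Token(0, 5, "Kalle", "PM"),
    Token(6, 8, "sa", "VB"),
    Token(9, 12, "att", "SN"),
    Token(13, 16, "han", "PN"),
    Token(17, 22, "köpte", "VB"),
    Token(23, 28, "bilen", "NN"),
    Token(28, 29, ".", "MAD"),
]
# POS tags of the tokens inside a NOM mention spanning "bilen"
print([t.pos for t in tokens_in_span(tokens, 23, 28)])
```

Requiring full containment means a mention boundary that falls inside a token (e.g. a tokenization mismatch) silently drops that token; a real alignment would likely want to report such cases instead.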

Both informative and imaginative texts of different genres and domains are included. The informative prose category consists of six files: foreign and domestic news texts and editorials from national and regional morning dailies, magazine articles on interior design, a textbook excerpt on biology, and an academic essay. The imaginative prose section includes excerpts from four novels of different genres. Thus, SUC-CORE can be used to explore coreference in different types of text.

SUC-CORE is distributed in a stand-off format similar to that of the BioNLP Shared Task. SUC-CORE is freely available for research, but every user must sign a license for SUC with the Department of Linguistics at Stockholm University. Please contact Kristina Nilsson Björkenstam for more information on how to access SUC-CORE.
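To give a rough idea of how a BioNLP/brat-style stand-off file can be turned into coreference chains, here is a minimal sketch. It assumes, without confirmation from the SUC-CORE documentation, that mentions are text-bound `T` lines labelled NAM, NOM or PRO, and that coreference is encoded either as binary `R` relations or as `*` equivalence lines with a relation name `Coref` (a hypothetical label). Chains are built with union-find, so mentions that are never linked come out as singletons.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    label: str  # mention type, e.g. NAM, NOM or PRO
    start: int  # start offset of the first fragment
    end: int    # exclusive end offset of the last fragment
    text: str


def parse_standoff(ann_text, coref_label="Coref"):
    """Parse brat-style stand-off lines into mentions and coreference links.

    Assumed line shapes (hypothetical for SUC-CORE):
      T1<TAB>NAM 0 5<TAB>Kalle        text-bound mention
      R1<TAB>Coref Arg1:T2 Arg2:T1    binary relation
      *<TAB>Coref T1 T2 T4            equivalence set
    Other annotation kinds (attributes, notes, ...) are ignored.
    """
    mentions, links = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T"):
            label, spans = body.split(" ", 1)
            # Discontinuous spans are separated by ';', e.g. "0 5;9 14"
            fragments = [tuple(map(int, frag.split())) for frag in spans.split(";")]
            text = fields[2] if len(fields) > 2 else ""
            mentions[ann_id] = Mention(label, fragments[0][0], fragments[-1][1], text)
        elif ann_id.startswith("R"):
            label, *args = body.split()
            if label == coref_label:
                # Arguments look like "Arg1:T2"; keep only the annotation id
                links.append([arg.split(":", 1)[1] for arg in args])
        elif ann_id == "*":
            label, *ids = body.split()
            if label == coref_label:
                links.append(ids)
    return mentions, links


def coreference_chains(mentions, links):
    """Group linked mentions into chains with union-find."""
    parent = {m: m for m in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for ids in links:
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    groups = defaultdict(list)
    for m in mentions:
        groups[find(m)].append(m)
    # Order mentions within each chain, and the chains themselves, by position
    chains = [sorted(g, key=lambda m: mentions[m].start) for g in groups.values()]
    return sorted(chains, key=lambda c: mentions[c[0]].start)
```

For the sentence "Kalle sa att han köpte bilen.", an `.ann` text with mentions `T1` (NAM, "Kalle"), `T2` (PRO, "han") and `T3` (NOM, "bilen") plus the line `R1\tCoref Arg1:T2 Arg2:T1` would yield the chains `[['T1', 'T2'], ['T3']]`. Union-find keeps the grouping transitive, so two binary links T1–T2 and T2–T4 end up in one chain, which matches how coreference chains are usually interpreted.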

Nilsson Björkenstam, K. (2013). SUC-CORE: A Balanced Corpus Annotated with Noun Phrase Coreference. Northern European Journal of Language Technology (NEJLT), 3, 19-39.

Kristina Nilsson Björkenstam
