Stagger is a Swedish part-of-speech tagger based on the averaged perceptron of Collins (2002). Per-token accuracy is about 96.6 percent (10-fold cross-validation on SUC 3.0), which compares favorably to other published results for Swedish.
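To illustrate what an averaged perceptron tagger does, here is a minimal Java sketch of a structured perceptron with first-order Viterbi decoding and weight averaging, in the spirit of Collins (2002). It is not Stagger's implementation. The class name `AvgPerceptronSketch`, its two toy feature templates (word/tag and previous-tag/tag) and the training loop are hypothetical simplifications, and Stagger's real feature set is much richer.

```java
import java.util.*;

/**
 * Minimal structured averaged perceptron tagger in the spirit of Collins (2002).
 * Hypothetical sketch, not Stagger's code: it uses only two toy feature templates.
 */
class AvgPerceptronSketch {
    private final String[] tags;
    private final Map<String, Double> w = new HashMap<>(); // current weights
    private final Map<String, Double> u = new HashMap<>(); // sum of (step * delta), used for averaging
    private int c = 1;                                      // step counter: examples seen + 1

    AvgPerceptronSketch(String[] tags) { this.tags = tags; }

    // Hypothetical feature templates: word+tag and previous-tag+tag.
    private static List<String> features(String[] words, int i, String prev, String tag) {
        return List.of("w=" + words[i] + "|t=" + tag, "p=" + prev + "|t=" + tag);
    }

    private static double score(Map<String, Double> weights, List<String> feats) {
        double s = 0.0;
        for (String f : feats) s += weights.getOrDefault(f, 0.0);
        return s;
    }

    /** Viterbi search for the highest-scoring tag sequence (assumes a non-empty sentence). */
    String[] decode(Map<String, Double> weights, String[] words) {
        int n = words.length, k = tags.length;
        double[][] best = new double[n][k];
        int[][] back = new int[n][k];
        for (int t = 0; t < k; t++) {
            best[0][t] = score(weights, features(words, 0, "<s>", tags[t]));
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = best[i - 1][p] + score(weights, features(words, i, tags[p], tags[t]));
                    if (s > best[i][t]) { best[i][t] = s; back[i][t] = p; }
                }
            }
        }
        int arg = 0;
        for (int t = 1; t < k; t++) if (best[n - 1][t] > best[n - 1][arg]) arg = t;
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) { out[i] = tags[arg]; arg = back[i][arg]; }
        return out;
    }

    // Apply delta to the current weights and record step * delta for averaging.
    private void update(List<String> feats, double delta) {
        for (String f : feats) {
            w.merge(f, delta, Double::sum);
            u.merge(f, c * delta, Double::sum);
        }
    }

    /** On a wrong prediction, reward the gold features and penalize the predicted ones. */
    void train(List<String[]> sentences, List<String[]> goldTags, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int s = 0; s < sentences.size(); s++) {
                String[] words = sentences.get(s), gold = goldTags.get(s);
                String[] pred = decode(w, words);
                if (!Arrays.equals(pred, gold)) {
                    for (int i = 0; i < words.length; i++) {
                        update(features(words, i, i == 0 ? "<s>" : gold[i - 1], gold[i]), +1.0);
                        update(features(words, i, i == 0 ? "<s>" : pred[i - 1], pred[i]), -1.0);
                    }
                }
                c++;
            }
        }
    }

    /** Average of the weight vectors w_0 (all zeros) through w_T, computed as w - u / c. */
    Map<String, Double> averaged() {
        Map<String, Double> avg = new HashMap<>();
        for (Map.Entry<String, Double> e : w.entrySet()) {
            avg.put(e.getKey(), e.getValue() - u.getOrDefault(e.getKey(), 0.0) / c);
        }
        return avg;
    }
}
```

Averaging the weight vectors over every training step, rather than keeping only the final one, makes the perceptron less sensitive to its last few updates. The `u` map lets that average be computed once at the end instead of after every example.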
Apart from using a feature-rich model, Stagger improves on earlier systems by using the SALDO lexicon of Swedish morphology and semi-supervised learning, e.g. through Brown clusters or Collobert & Weston embeddings.
Stagger also contains a named entity recognizer, whose labelled F-score is 70.7% in cross-validation on SUC using the labels of Salomonsson, Marinov and Nugues (2012); their state-of-the-art system reaches 74.0% under the same conditions.
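As a reminder of what a labelled F-score measures, here is a small Java sketch of the common exact-match convention: a predicted entity counts as correct only if its boundaries and its label both match a gold entity, and F is the harmonic mean of precision and recall. The `Span` record and `fScore` method are hypothetical, and the evaluation behind the figures above may differ in details.

```java
import java.util.*;

/** Hypothetical sketch of span-level, exact-match labelled F-score for NER. */
class LabelledFScore {
    // Token offsets of an entity plus its label; end-exclusive offsets are an assumption.
    record Span(int start, int end, String label) {}

    static double fScore(Set<Span> gold, Set<Span> predicted) {
        Set<Span> correct = new HashSet<>(predicted);
        correct.retainAll(gold);                          // same start, end and label
        if (correct.isEmpty()) return 0.0;                // also avoids division by zero
        double precision = (double) correct.size() / predicted.size();
        double recall = (double) correct.size() / gold.size();
        return 2 * precision * recall / (precision + recall);  // harmonic mean
    }
}
```

For example, with two gold entities and three predicted ones of which exactly one matches in both span and label, precision is 1/3, recall is 1/2 and F is 0.4. A predicted entity with the right span but the wrong label earns no credit under this convention.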
The tagger is implemented in Java, and licensed under the GNU General Public License (GPL) version 3, protecting among other things the freedom to use, modify and redistribute the software.
IceStagger
Loftsson and Östling (2013) used a modified version of Stagger, IceStagger, integrated with the morphological analyzer IceMorphy and the Icelandic Frequency Dictionary lexicon to obtain state-of-the-art accuracy for Icelandic part-of-speech tagging.
IceStagger has since been integrated into the IceNLP toolkit, which is the recommended option for users interested in high-accuracy analysis of Icelandic.
Robert Östling (2013) Stagger: an Open-Source Part of Speech Tagger for Swedish. Northern European Journal of Language Technology, 2013, Vol. 3, pp. 1–18.
DOI 10.3384/nejlt.2000-1533.1331
Hrafn Loftsson and Robert Östling (2013) Tagging a Morphologically Complex Language Using an Averaged Perceptron Tagger: The Case of Icelandic. NoDaLiDa 2013, Oslo, Norway.
Contact: Robert Östling