Stagger – The Stockholm Tagger
Stagger is a Swedish part-of-speech tagger based on the averaged perceptron of Collins (2002). Per-token accuracy is about 96.6 percent (10-fold cross-validation on SUC 3.0), which compares favorably to other published results for Swedish.
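As a rough illustration of how averaged perceptron training works, here is a minimal Java sketch. It is not Stagger's code: it tags each token independently with a greedy multiclass perceptron instead of the structured, sequence-level model of Collins (2002). The class name, the feature templates and the toy sentences are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a greedy per-token averaged perceptron, not Stagger's implementation.
class AveragedPerceptronSketch {
    private final String[] tags;
    private final Map<String, double[]> weights = new HashMap<>(); // current weights per feature
    private final Map<String, double[]> lagged = new HashMap<>();  // sum of (step * update), for averaging
    private int seen = 0;                                          // tokens processed so far

    AveragedPerceptronSketch(String... tags) {
        this.tags = tags.clone();
    }

    // Hypothetical feature templates: bias, word form, two-character suffix, previous word.
    static List<String> features(String[] words, int i) {
        String word = words[i].toLowerCase();
        List<String> feats = new ArrayList<>();
        feats.add("bias");
        feats.add("word=" + word);
        feats.add("suffix=" + word.substring(Math.max(0, word.length() - 2)));
        feats.add("prev=" + (i > 0 ? words[i - 1].toLowerCase() : "<s>"));
        return feats;
    }

    private int indexOf(String tag) {
        for (int t = 0; t < tags.length; t++) {
            if (tags[t].equals(tag)) return t;
        }
        throw new IllegalArgumentException("Unknown tag: " + tag);
    }

    // Index of the highest-scoring tag under the given weights; ties go to the lowest index.
    private int predict(List<String> feats, Map<String, double[]> model) {
        double[] scores = new double[tags.length];
        for (String f : feats) {
            double[] row = model.get(f);
            if (row == null) continue;
            for (int t = 0; t < tags.length; t++) scores[t] += row[t];
        }
        int best = 0;
        for (int t = 1; t < tags.length; t++) {
            if (scores[t] > scores[best]) best = t;
        }
        return best;
    }

    private void update(String feature, int tag, double delta) {
        weights.computeIfAbsent(feature, k -> new double[tags.length])[tag] += delta;
        lagged.computeIfAbsent(feature, k -> new double[tags.length])[tag] += seen * delta;
    }

    void train(List<String[]> sentences, List<String[]> gold, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int s = 0; s < sentences.size(); s++) {
                String[] words = sentences.get(s);
                for (int i = 0; i < words.length; i++) {
                    List<String> feats = features(words, i);
                    int guess = predict(feats, weights);
                    int truth = indexOf(gold.get(s)[i]);
                    if (guess != truth) {
                        // Perceptron update: reward the gold tag, penalize the wrong guess.
                        for (String f : feats) {
                            update(f, truth, +1.0);
                            update(f, guess, -1.0);
                        }
                    }
                    seen++;
                }
            }
        }
    }

    // Average of the weight vector over all training steps: w - lagged / seen.
    Map<String, double[]> averagedWeights() {
        Map<String, double[]> avg = new HashMap<>();
        for (Map.Entry<String, double[]> entry : weights.entrySet()) {
            double[] row = entry.getValue().clone();
            double[] lag = lagged.get(entry.getKey());
            for (int t = 0; t < row.length; t++) row[t] -= lag[t] / Math.max(1, seen);
            avg.put(entry.getKey(), row);
        }
        return avg;
    }

    String[] tag(String[] words) {
        Map<String, double[]> avg = averagedWeights();
        String[] out = new String[words.length];
        for (int i = 0; i < words.length; i++) out[i] = tags[predict(features(words, i), avg)];
        return out;
    }

    public static void main(String[] args) {
        AveragedPerceptronSketch model = new AveragedPerceptronSketch("DT", "NN", "VB");
        List<String[]> sentences = new ArrayList<>();
        List<String[]> gold = new ArrayList<>();
        sentences.add(new String[] {"en", "hund", "springer"}); // toy data, not from SUC
        gold.add(new String[] {"DT", "NN", "VB"});
        sentences.add(new String[] {"en", "katt", "sover"});
        gold.add(new String[] {"DT", "NN", "VB"});
        model.train(sentences, gold, 5);
        System.out.println(String.join(" ", model.tag(new String[] {"en", "katt", "springer"})));
    }
}
```

Averaging the weights over all training steps, instead of keeping only the final weights, is what distinguishes the averaged perceptron from the plain one. The sketch keeps a second accumulator so that the average can be computed once at the end instead of summing the full weight vector after every token.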
Apart from using a feature-rich model, improvements over earlier systems include the use of the SALDO lexicon of Swedish morphology and semi-supervised learning through, for example, Brown clusters or Collobert & Weston embeddings.
Stagger also contains a Named Entity Recognizer. Its labelled F-score is 70.7% using cross-validation on SUC with the labels of Salomonsson, Marinov and Nugues (2012); their state-of-the-art system reaches 74.0% under the same conditions.
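For readers unfamiliar with the metric, a labelled F-score is the harmonic mean of precision and recall over predicted entities. The Java sketch below assumes that an entity counts as correct only when both its span and its label match the gold annotation exactly; the matching criterion actually used in the evaluation above is not stated here. The class name, the "start-end:LABEL" encoding and the example labels are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: labelled F-score under an assumed exact-match criterion.
class LabelledFScoreSketch {
    // Each entity is encoded as "start-end:LABEL" (a hypothetical encoding for this sketch).
    static double fScore(Set<String> gold, Set<String> predicted) {
        Set<String> correct = new HashSet<>(predicted);
        correct.retainAll(gold); // entities whose span and label both match
        if (correct.isEmpty()) return 0.0;
        double precision = (double) correct.size() / predicted.size();
        double recall = (double) correct.size() / gold.size();
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        Set<String> gold = new HashSet<>();
        gold.add("0-1:PERSON");
        gold.add("3-4:PLACE");
        gold.add("6-6:ORG");
        Set<String> predicted = new HashSet<>();
        predicted.add("0-1:PERSON"); // correct span and label
        predicted.add("3-4:ORG");    // correct span, wrong label: counts as an error
        System.out.println(fScore(gold, predicted));
    }
}
```

In the toy example one of two predictions is correct (precision 1/2) and one of three gold entities is found (recall 1/3).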
The tagger is implemented in Java and licensed under the GNU General Public License (GPL) version 3, which protects, among other things, the freedom to use, modify and redistribute the software.
Download:
Source code and JAR executable
Swedish model
Icelandic model (contributed by Hrafn Loftsson at Reykjavik University)
Brief article presenting Stagger
Contact: Robert Östling
Last updated: November 9, 2012
Source: Department of Linguistics
