SLME is a simple N-gram language model inspired by the Stupid Backoff model (Brants et al., 2007). It is also designed to compute very quickly the entropy of the distribution of words following a given context: H(p(word|context)).
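
To make the quantity H(p(word|context)) concrete, here is a minimal pure-Python sketch. It is not SLME's API or its actual algorithm: the function names `build_followers` and `context_entropy` are hypothetical, the probabilities are plain relative frequencies, and the back-off rule (drop the oldest context word until the context has been seen) is a simplifying assumption.

```python
import math
from collections import Counter, defaultdict


def build_followers(tokens, max_order=3):
    """Map each context tuple (0 to max_order-1 words) to a Counter of next words."""
    followers = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for n in range(max_order):
            if i - n < 0:
                break  # not enough preceding words for a context of length n
            context = tuple(tokens[i - n:i])
            followers[context][word] += 1
    return followers


def context_entropy(followers, context):
    """Entropy in bits of the relative-frequency distribution p(word | context).

    Assumption for this sketch: if the context was never observed, back off
    by dropping its oldest word until a known context (or the empty one) is found.
    """
    context = tuple(context)
    while context and context not in followers:
        context = context[1:]
    counts = followers.get(context)
    if not counts:
        return 0.0  # no data at all
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


tokens = "a b a c a b a c".split()
followers = build_followers(tokens)
print(context_entropy(followers, ["a"]))       # "a" is followed by "b" or "c" equally often
print(context_entropy(followers, ["z", "a"]))  # unseen context, backs off to ("a",)
```

Since the entropy depends only on the follower counts of a context, it could in principle be computed once per context and stored, which is one way a lookup might be made fast; whether SLME does this is not stated here.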

The software is implemented as a Python library with the low-level operations in C. It has been tested on Google's "Web 1T 5-gram, 10 European Languages" N-gram corpus. The basic idea is to keep a large part of the data cached in memory; together with some hard-coded constraints, this means that the corresponding but much larger English-only data set will not work.

This software is licensed under the GNU General Public License (GPL) version 3, protecting among other things the freedom to use, modify and redistribute the software.

Downloads: Version 0.2 (tar.gz archive, 9 KB)

Contact: Robert Östling