Pretrained Models

This site lists pretrained MarMoT models. The models are free to use for non-commercial research and education. The models can be applied by running:

java -cp marmot.jar marmot.morph.cmd.Annotator \
    --model-file de.marmot \
    --test-file form-index=0,text.txt \
    --pred-file text.out.txt
Here, text.txt is a text file containing tokenized text in a one-token-per-line format. Sentence boundaries should be marked by an empty line.
Here is an example:
This
is
a
sentence
.

This
is
another
sentence
.
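
As a minimal sketch (the file name and sentence data are just illustrations), tokenized sentences can be written in this format with a short Python script:

# Write tokenized sentences in the one-token-per-line format;
# an empty line marks a sentence boundary.
sentences = [
    ["This", "is", "a", "sentence", "."],
    ["This", "is", "another", "sentence", "."],
]
with open("text.txt", "w", encoding="utf-8") as f:
    for sentence in sentences:
        for token in sentence:
            f.write(token + "\n")
        f.write("\n")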

The input to the Arabic and Hebrew models should be in the transliterated ASCII format, described in the documentation of the respective treebank. Here are two examples:

Arabic
w >DAf " <n AlEmlyp l AlEwdp b AlbAqyn w Edd hm 193 jndyA l wDE hm fy >mAn , tqdmt kvyrA " .
Hebrew
yyQUOT THIH NQMH W BGDWL yyDOT

The output of the MULTEXT-East (MTE) models is in an intermediate format. To restore the original annotation, run remap_mte.py:

python remap_mte.py text.out.txt text.out.final.txt
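
To script the whole pipeline, a minimal sketch is to wrap the two commands above with Python's subprocess module (model and file names are illustrative; the remapping step only applies to the MULTEXT-East models):

import subprocess

# Tag text.txt with a MULTEXT-East model (sl.marmot is just an example).
subprocess.run(
    ["java", "-cp", "marmot.jar", "marmot.morph.cmd.Annotator",
     "--model-file", "sl.marmot",
     "--test-file", "form-index=0,text.txt",
     "--pred-file", "text.out.txt"],
    check=True)

# Restore the original MULTEXT-East annotation.
subprocess.run(
    ["python", "remap_mte.py", "text.out.txt", "text.out.final.txt"],
    check=True)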

The latest MarMoT binaries can be found here.

MULTEXT-East Models

lang  source                       model
bg    MULTEXT-East (2010-05-14)    bg.marmot
cs    MULTEXT-East (2010-05-14)    cs.marmot
en    MULTEXT-East (2010-05-14)    en.marmot
et    MULTEXT-East (2010-05-14)    et.marmot
fa    MULTEXT-East (2010-05-14)    fa.marmot
hu    MULTEXT-East (2010-05-14)    hu.marmot
pl    MULTEXT-East (2010-05-14)    pl.marmot
ro    MULTEXT-East (2010-05-14)    ro.marmot
sk    MULTEXT-East (2010-05-14)    sk.marmot
sl    MULTEXT-East (2010-05-14)    sl.marmot
sr    MULTEXT-East (2010-05-14)    sr.marmot

SPMRL Models

lang  source                                                model
pl    Składnica Treebank                                    pl.marmot
eu    Basque Syntactic Treebank                             eu.marmot
ar    LDC Arabic Penn Treebank / Columbia Arabic Treebank   ar.marmot
de    Tiger 2.0                                             de.marmot
ko    KAIST Treebank                                        ko.marmot
sv    Talbanken                                             sv.marmot
hu    Szeged (Dependency) Treebank                          hu.marmot
he    Modern Hebrew Treebank                                he.marmot
fr    French Treebank                                       fr.marmot


If you want to run parsing experiments on the SPMRL data sets, we also provide cross-annotated predictions for train, dev and test (marmot_spmrl.tar.bz2); a description is available here.


Acknowledgement: Thanks to Djamé Seddah and Tomaž Erjavec.
Contact: Thomas Müller (cis page)