Pretrained Models

This page lists pretrained MarMoT models. The models are free to use for non-commercial research and education. To apply a model, run:

java -cp marmot.jar marmot.morph.cmd.Annotator \
--model-file de.marmot \
--test-file form-index=0,text.txt \
--pred-file text.out.txt

Here, text.txt is a text file containing tokenized text in one-token-per-line format. Sentence boundaries should be marked by an empty line.
Here is an example:

This
is
a
sentence
.

This
is
another
sentence
.
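
A file in this format can be produced from already tokenized sentences with a few lines of Python. The following is a minimal sketch, not part of MarMoT: the function name to_marmot_input is hypothetical, and it assumes that tokenization has been done elsewhere and that no token contains whitespace.

```python
from typing import Iterable, List


def to_marmot_input(sentences: Iterable[List[str]]) -> str:
    """Render tokenized sentences as one token per line.

    Sentences are separated by a single empty line, as described above.
    (Hypothetical helper, not part of the MarMoT distribution.)
    """
    blocks = []
    for tokens in sentences:
        if not tokens:
            # Skip empty sentences so that no doubled blank lines appear.
            continue
        blocks.append("\n".join(tokens))
    # Join sentences with an empty line and end the file with a newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    example = [
        ["This", "is", "a", "sentence", "."],
        ["This", "is", "another", "sentence", "."],
    ]
    print(to_marmot_input(example), end="")
```

Writing the returned string to text.txt (for example with open("text.txt", "w", encoding="utf-8")) should give a file that can be passed to --test-file as shown above.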

The input to the Arabic and Hebrew models should be in the transliterated ASCII format described in the documentation of the respective treebank. Here are two examples:

Arabic

w >DAf " <n AlEmlyp l AlEwdp b AlbAqyn w Edd hm 193 jndyA l wDE hm fy >mAn , tqdmt kvyrA " .

Hebrew

yyQUOT THIH NQMH W BGDWL yyDOT

The output of the MTE (MultExt East) models is in an intermediate format. To restore the original annotation, run remap_mte.py:

python remap_mte.py text.out.txt text.out.final.txt
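
For batch processing, the Annotator call and the remap step can be chained from a script. The sketch below only assembles the two command lines shown on this page and runs them with subprocess. The function names (annotator_command, remap_command, tag_with_mte_model) are hypothetical, and it assumes that java, python, marmot.jar and remap_mte.py are reachable from the working directory.

```python
import subprocess
from typing import List


def annotator_command(model: str, text: str, out: str,
                      jar: str = "marmot.jar") -> List[str]:
    """Build the Annotator call from this page as an argument list."""
    return [
        "java", "-cp", jar, "marmot.morph.cmd.Annotator",
        "--model-file", model,
        "--test-file", f"form-index=0,{text}",
        "--pred-file", out,
    ]


def remap_command(out: str, final: str,
                  script: str = "remap_mte.py") -> List[str]:
    """Build the remap_mte.py call from this page as an argument list."""
    return ["python", script, out, final]


def tag_with_mte_model(model: str, text: str, out: str, final: str) -> None:
    """Run the tagger, then restore the original MTE annotation.

    check=True raises CalledProcessError if either step fails.
    """
    subprocess.run(annotator_command(model, text, out), check=True)
    subprocess.run(remap_command(out, final), check=True)
```

A call such as tag_with_mte_model("bg.marmot", "text.txt", "text.out.txt", "text.out.final.txt") would then run both steps in sequence. The remap step applies only to the MTE models, so for the other models only the first command is needed.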

The latest MarMoT binaries can be found here.

MultExt East Models

lang	source	model
bg	Multext-East (2010-05-14)	bg.marmot
cs	Multext-East (2010-05-14)	cs.marmot
en	Multext-East (2010-05-14)	en.marmot
et	Multext-East (2010-05-14)	et.marmot
fa	Multext-East (2010-05-14)	fa.marmot
hu	Multext-East (2010-05-14)	hu.marmot
pl	Multext-East (2010-05-14)	pl.marmot
ro	Multext-East (2010-05-14)	ro.marmot
sk	Multext-East (2010-05-14)	sk.marmot
sl	Multext-East (2010-05-14)	sl.marmot
sr	Multext-East (2010-05-14)	sr.marmot

SPMRL Models

lang	source	model
pl	Składnica Treebank	pl.marmot
eu	Basque Syntactic Treebank	eu.marmot
ar	LDC Arabic Penn Treebank / Columbia Arabic Treebank	ar.marmot
de	Tiger 2.0	de.marmot
ko	KAIST Treebank	ko.marmot
sv	Talbanken	sv.marmot
hu	Szeged (Dependency) Treebank	hu.marmot
he	Modern Hebrew Treebank	he.marmot
fr	French Treebank	fr.marmot

If you want to run parsing experiments on the SPMRL data sets, we also provide cross-annotated predictions for train, dev and test (marmot_spmrl.tar.bz2). A description is available here.

Acknowledgement: Thanks to Djamé Seddah and Tomaž Erjavec.
Contact: Thomas Müller (cis page)