Welcome to Cistern
Cistern is the principal repository of tools and resources released by the Center for Information and Language Processing (CIS) of the University of Munich (LMU).
The CIS conducts research on linguistically informed statistical natural language processing (NLP), including problems such as part-of-speech tagging, parsing and sentiment analysis.
On this site, we store and share tools and resources such as data sets, lexicons, binaries and models.
Currently, you can find resources for the following projects:
- SMOR - a German computational morphology
- MarMoT - a fast and accurate morphological tagger
- AutoExtend - extending word embeddings
- Lemming - a flexible and accurate lemmatizer
- CoSimRank - a fast and accurate graph-based similarity measure
- Ocrocis - a project manager for the OCR toolkit Ocropy by Thomas Breuel
- SFST - a finite state transducer toolkit
- MarLiN - a fast word clustering tool
- RFTagger - a tool for the annotation of text with fine-grained POS tags
- TreeTagger - a tool for annotating text with part-of-speech and lemma information
- BitPar - a parser for highly ambiguous probabilistic context-free grammars
- corefResources - automatically extracted coreference chains from English Gigaword data
- FIGMENT - a fine-grained embedding-based entity typer
- FIGMENT2 - fine-grained entity typing using multi-level representation of entities
- AttentionUncertainty - attention methods for uncertainty detection
- ChipMunk - a morphological segmenter and analyzer
- LatMor - a Latin computational morphology
- MED - code of the LMU system for the SIGMORPHON 2016 shared task on morphological reinflection
- SFbenchmark - a relation classification benchmark for Slot Filling