Welcome to Cistern
Cistern is the principal repository of tools and resources released by the Center for Information and Language Processing (CIS) of the University of Munich (LMU).
The CIS conducts research on linguistically informed statistical natural language processing (NLP), including problems such as part-of-speech tagging, parsing and sentiment analysis.
On this site, we store and share tools and resources such as data sets, lexicons, binaries and models.
Currently you can find resources for the following projects:
- SMOR - a German computational morphology
- LatMor - a Latin computational morphology
- MarMoT - a fast and accurate morphological tagger
- AutoExtend - extending word embeddings
- Lemming - a flexible and accurate lemmatizer
- CoSimRank - a fast and accurate graph-based similarity measure
- Ocrocis - a project manager for the OCR toolkit Ocropy by Thomas Breuel
- SFST - a finite state transducer toolkit
- MarLiN - a fast word clustering tool
- RFTagger - a tool for the annotation of text with fine-grained POS tags
- TreeTagger - a tool for annotating text with part-of-speech and lemma information
- BitPar - a parser for highly ambiguous probabilistic context-free grammars
- corefResources - automatically extracted coreference chains from English Gigaword data
- FIGMENT - a fine-grained, embeddings-based entity typer
- ChipMunk - a morphological segmenter and analyzer
- SFbenchmark - a relation classification benchmark for Slot Filling