Welcome to Cistern
Cistern is the principal repository of tools and resources released by the Center for Information and Language Processing (CIS) of the University of Munich (LMU).
The CIS conducts research on linguistically informed statistical natural language processing (NLP), including problems such as part-of-speech tagging, parsing and sentiment analysis.
On this site, we store and share tools and resources such as data sets, lexicons, binaries and models.
Currently you can find resources for the following projects:
- SMOR - a German computational morphology
- MarMoT - a fast and accurate morphological tagger
- AutoExtend - extending word embeddings
- Lemming - a flexible and accurate lemmatizer
- CoSimRank - a fast and accurate graph-based similarity measure
- Ocrocis - a project manager for the OCR toolkit Ocropy by Thomas Breuel
- SFST - a finite state transducer toolkit
- MarLiN - a fast word clustering tool
- RFTagger - a tool for the annotation of text with fine-grained POS tags
- TreeTagger - a tool for annotating text with part-of-speech and lemma information
- BitPar - a parser for highly ambiguous probabilistic context-free grammars
- corefResources - two corpora of automatically extracted coreference chains: (1) KBPchains, (2) English Gigaword data
- FIGMENT - a fine-grained embedding-based entity typer
- FIGMENT2 - fine-grained entity typing using multi-level representation of entities
- noise-mitigation - Noise Mitigation for Neural Entity Typing and Relation Extraction
- AttentionUncertainty - attention methods for uncertainty detection
- ChipMunk - a morphological segmenter and analyzer
- LatMor - a Latin computational morphology
- MED - code of the LMU system for the SIGMORPHON 2016 shared task on morphological reinflection
- SFbenchmark - relation classification benchmark for Slot Filling
- NONSYMB - the dataset used in Nonsymbolic Text Representation
- GlobalNormalization - the code, parameters and prepared dataset used for global normalization of convolutional neural networks for joint entity and relation classification
- CoMult - supplementary material and multilingual embedding spaces from the paper "Embedding Learning Through Multilingual Concept Induction"