Welcome to Cistern
Cistern is the principal repository of tools and resources released by the Center for Information and Language Processing (CIS) of the University of Munich (LMU).
The CIS conducts research on linguistically informed statistical natural language processing (NLP), including problems such as part-of-speech tagging, parsing and sentiment analysis.
On this site, we store and share tools and resources such as data sets, lexicons, binaries and models.
is a high-quality word alignment tool that uses static and contextualized embeddings and does not require parallel training data.
- DensRay - interpretable dimensions in word embedding spaces
- SherLIiC - a hard NLI evaluation benchmark using lexical inference in context (LIiC)
- Comult - embeddings for 1000+ languages
- MED - code of the LMU system for the SIGMORPHON 2016 shared task on morphological reinflection
- corefResources - two corpora of automatically extracted coreference chains: (1) KBPchains, (2) English Gigaword data
- noise-mitigation - noise mitigation for neural entity typing and relation extraction
- FIGMENT - a fine-grained embedding-based entity typer
- FIGMENT2 - fine-grained entity typing using multi-level representation of entities
- Lemming - a flexible and accurate lemmatizer
- MarMoT - a fast and accurate morphological tagger
- ChipMunk - a morphological segmenter and analyzer
- LatMor - a Latin computational morphology
- MarLiN - a fast word clustering tool
- BitPar - a parser for highly ambiguous probabilistic context-free grammars
- TreeTagger - a tool for annotating text with part-of-speech and lemma information
- RFTagger - a tool for the annotation of text with fine-grained POS tags
- Ocrocis - a project manager for the OCR toolkit Ocropy by Thomas Breuel
- SFST - a finite state transducer toolkit
- SMOR - a German computational morphology
- AttentionUncertainty - attention methods for uncertainty detection
- AutoExtend - extending word embeddings
- CoSimRank - a fast and accurate graph-based similarity measure
- GlobalNormalization - the code, parameters and prepared dataset used for global normalization of convolutional neural networks for joint entity and relation classification
- Open Relation Argument Extraction - corpus and code for extracting relation arguments of non-standard types
- SFbenchmark - relation classification benchmark for Slot Filling
- CIS_SlotFilling - the CIS slot filling system
- semiCRF - a character-based neural network with semi-Markov CRF output layer for robust multilingual part-of-speech tagging