Research

Take a look at our latest publications and awards

Progress in Artificial Intelligence (AI) has been tremendous in recent years. Almost every month we see reports announcing new breakthroughs in different technological aspects of AI.

As an organization focused on research and development, we can look back on a growing number of publications and awards.

Publications

We aim to push the state of the art for problems such as automatic text recognition (ATR), language modeling (LM), named entity recognition (NER), visual question answering (VQA) and image segmentation (IS), even beyond human performance.

Our team of experienced AI researchers is working with and improving techniques such as:

  • fully convolutional neural networks
  • attention-based, recurrence-free models, both on their own and in combination with recurrent models
  • graph neural networks
  • neural memory techniques
  • unsupervised and self-supervised pre-training strategies
  • improved learning strategies

Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used to model characters, while language models are used to compose sentences. Since scripts and linguistic contexts differ widely, the models must be specialized by training on task-dependent ground truth. However, at least for historical handwritten scripts, creating a sufficient amount of ground truth requires well-qualified people to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available at page level from another process chain, or at least transcripts from a similar linguistic context are available. In this work we present two approaches that make use of such transcripts: the first creates training data by automatically assigning page-level transcripts to text lines, while the second uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied to a very challenging collection of historical handwritten documents.

Authors: Gundram Leifert (PLANET artificial intelligence GmbH), Joan Andreu Sánchez (Pattern Recognition and Human Language Technologies Center), Roger Labahn (Computational Intelligence Technology Lab)

Series: ICFHR ’20

Pages: To appear

Note: This work was partially funded by the Generalitat Valenciana under the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025 “Sistemas de fabricación inteligente para la indústria 4.0” (Intelligent manufacturing systems for Industry 4.0).

Type: In proceedings
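The first approach in the ICFHR ’20 abstract, assigning page-level transcripts to text lines, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that every detected text line already carries a recognizer hypothesis, and it accepts a transcript line only when its normalized edit distance to that hypothesis stays below a threshold (`max_cer`, a name chosen here for illustration). This greedy matching is a simplified stand-in, not the assignment method described in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def assign_transcripts(hypotheses, transcript_lines, max_cer=0.3):
    """Greedily pair each line hypothesis with its closest unused transcript line.

    Returns (line_index, transcript) pairs whose normalized edit distance is at
    most max_cer; lines without such a match are left out of the training data.
    """
    unused = set(range(len(transcript_lines)))
    pairs = []
    for idx, hyp in enumerate(hypotheses):
        best, best_score = None, float("inf")
        for t in sorted(unused):  # sorted for deterministic tie-breaking
            ref = transcript_lines[t]
            score = levenshtein(hyp, ref) / max(len(ref), 1)
            if score < best_score:
                best, best_score = t, score
        if best is not None and best_score <= max_cer:
            unused.remove(best)
            pairs.append((idx, transcript_lines[best]))
    return pairs


hyps = ["Tbe quick brown fox", "jumps ovr the lazy dog", "xxxxx"]
page = ["jumps over the lazy dog", "The quick brown fox"]
print(assign_transcripts(hyps, page))
# [(0, 'The quick brown fox'), (1, 'jumps over the lazy dog')]
```

Lines without a confident match are simply dropped, trading the amount of generated training data for its reliability.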

Encoder-decoder models have become an effective approach for sequence learning tasks such as machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network that encodes both the visual information and the temporal context between characters in the input image, and it uses a separate recurrent neural network to decode the actual character sequence. We experimentally compare various attention mechanisms and positional encodings in order to find an appropriate alignment between the input and output sequences. The model can be trained end-to-end, and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. Without the use of a language model, we achieve results competitive with the state of the art on the IAM and ICFHR2016 READ data sets, and we significantly outperform recent sequence-to-sequence approaches.

Authors: Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition

Series: ICDAR ’19

Pages: To appear

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)

Type: In proceedings

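The ICDAR ’19 sequence-to-sequence model decodes characters while attending over the encoder outputs. The sketch below shows one common variant, plain dot-product (Luong-style) attention for a single decoding step, in dependency-free Python. It is an illustrative assumption, not one of the specific attention mechanisms compared in the paper, and it omits the CNN/RNN encoder, positional encodings and the hybrid loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, encoder_states):
    """One decoding step of dot-product attention.

    query: decoder state, a list of d floats.
    encoder_states: one list of d floats per input position (e.g. image column).
    Returns (weights, context): the attention distribution over input positions
    and the weighted average of the encoder states fed to the decoder.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


weights, context = dot_product_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(w, 3) for w in weights])  # [0.5, 0.0, 0.5]
print([round(c, 3) for c in context])  # [1.0, 0.5]
```

The weights form the soft alignment between the current output character and the input positions; other mechanisms differ mainly in how the scores are computed.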

Measuring the performance of text recognition and text line detection engines is an important step in objectively comparing systems and their configurations. Well-established measures exist for each task separately. However, there is no sophisticated evaluation scheme for measuring the quality of a combined text line detection and text recognition system. The word-level F-measure is a well-known metric that is sometimes used in this context. However, it does not take the alignment of hypothesis and ground truth text into account and can lead to misleading results. Since users of automatic information retrieval pipelines based on text recognition are mainly interested in the end-to-end performance of a given system, such a measure is strongly needed. Hence, we present a measure to evaluate the quality of an end-to-end text recognition system. It is based on the well-established and widely used character error rate, which, in its original form, is limited to aligned hypothesis and ground truth texts. The proposed measure is flexible: it can be configured to penalize different reading orders between hypothesis and ground truth, and it can take the geometric position of the text lines into account. Additionally, it can ignore over- and under-segmentation of text lines. With these parameters, the measure can be tailored to the needs of each use case.

Authors: Gundram Leifert, Roger Labahn, Tobias Grüning, Svenja Leifert

Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition

Series: ICDAR ’19

Pages: To appear

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)

Type: In proceedings

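The reading-order problem behind the end-to-end measure can be seen with the plain character error rate (CER), i.e. the edit distance divided by the number of reference characters. The sketch below computes CER on concatenated page text and compares it with a toy order-insensitive variant (`order_insensitive_cer`, a hypothetical helper) that greedily matches each ground-truth line to its closest hypothesis line. This toy variant is not the measure proposed in the paper: it ignores line geometry and does not handle over- or under-segmentation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    return levenshtein(hypothesis, reference) / max(len(reference), 1)


def order_insensitive_cer(hyp_lines, ref_lines):
    """Toy end-to-end CER that ignores reading order (hypothetical helper).

    Each reference line is greedily matched to its closest unused hypothesis
    line; missing lines count as deletions, extra lines as insertions.
    """
    unused = list(hyp_lines)
    errors = 0
    for ref in ref_lines:
        if unused:
            best = min(unused, key=lambda h: levenshtein(h, ref))
            unused.remove(best)
            errors += levenshtein(best, ref)
        else:
            errors += len(ref)  # line not found at all: every character deleted
    errors += sum(len(h) for h in unused)  # spurious lines: every character inserted
    return errors / max(sum(len(r) for r in ref_lines), 1)


ref_lines = ["abc", "def"]
hyp_lines = ["def", "abc"]  # correct text, swapped reading order
print(cer("\n".join(hyp_lines), "\n".join(ref_lines)))  # 0.8571428571428571 (6/7)
print(order_insensitive_cer(hyp_lines, ref_lines))      # 0.0
```

Both lines are recognized perfectly, yet the concatenated CER (which also counts the newline as a character) reports an error rate of 6/7; whether such a reading-order difference should be penalized is exactly the kind of choice the proposed measure makes configurable.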

We present a recognition and retrieval system for the ICDAR2017 Competition on Information Extraction in Historical Handwritten Records which successfully infers person names and other data from marriage records. The system extracts information from the line images with high accuracy and outperforms the baseline. The optical model is based on neural networks. To infer the desired information, regular expressions are used to describe the set of feasible word sequences.

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Journal: CoRR

Volume: abs/1804.09943

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)

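The idea of describing feasible word sequences with a regular expression can be illustrated with a much simplified Python sketch. It assumes, hypothetically, that the recognizer supplies a few scored word alternatives per position, and it picks the highest-scoring combination that fully matches a pattern. The pattern, words and log-probabilities are invented for illustration, and the actual system's decoding is not reproduced here.

```python
import itertools
import re

# Hypothetical pattern: capitalized first name, capitalized surname, lowercase occupation.
FEASIBLE = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+ [a-z]+")


def best_feasible(candidates, pattern=FEASIBLE):
    """Return the highest-scoring word sequence that fully matches `pattern`.

    candidates: one list per word position of (word, log_prob) alternatives.
    Returns (text, score), or None if no combination is feasible.
    Exhaustive enumeration: only suitable for toy inputs.
    """
    best = None
    for combo in itertools.product(*candidates):
        text = " ".join(word for word, _ in combo)
        if pattern.fullmatch(text):
            score = sum(lp for _, lp in combo)
            if best is None or score > best[1]:
                best = (text, score)
    return best


candidates = [
    [("anna", -0.1), ("Anna", -0.4)],
    [("Miller", -0.2)],
    [("Weaver", -0.05), ("weaver", -0.3)],
]
print(best_feasible(candidates)[0])  # Anna Miller weaver
```

Picking the best word at each position independently would yield "anna Miller Weaver", which the pattern rules out; the constraint instead selects the best sequence among the feasible ones. Enumeration grows exponentially with the number of positions, so a practical decoder would need a more efficient search.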

This dissertation is motivated by the goal of making accessible the valuable cultural heritage hidden in countless scanned historical documents. The developed (fully automatic) text line extraction methodology combines state-of-the-art machine learning techniques with modern image processing methods. It outperforms several other approaches on a couple of benchmark datasets. The method is already used by a wide range of researchers from different disciplines and thus contributes its (small) part toward this goal.

Author: Tobias Grüning

Type: PhD thesis

School: Universität Rostock