Research

Take a look at our latest publications and awards

Recent progress in the field of Artificial Intelligence (AI) is tremendous: almost every month we see reports announcing new breakthroughs in different technological aspects of AI.

As an organization focusing on research and development, we can look back on a growing number of publications and awards.

Publications

We aim to push the state of the art for problems such as automatic text recognition (ATR), language modeling (LM), named entity recognition (NER), visual question answering (VQA) and image segmentation (IS), even beyond human performance.

Our team of experienced AI researchers is working with and improving techniques such as:

  • fully convolutional neural networks
  • attention-based, recurrence-free models, both on their own and in combination with recurrent models
  • graph neural networks
  • neural memory techniques
  • unsupervised and self-supervised pre-training strategies
  • improved learning strategies

In order to apply Optical Character Recognition (OCR) to historical printings of Latin script fully automatically, we report on our efforts to construct a widely applicable polyfont recognition model yielding text with a Character Error Rate (CER) of around 2% when applied out of the box. Moreover, we show how this model can be further fine-tuned to specific classes of printings with little manual and computational effort. The mixed or polyfont model is trained on a wide variety of materials, in terms of age (from the 15th to the 19th century), typography (various types of Fraktur and Antiqua), and languages (among others, German, Latin, and French). To optimize the results we combined established techniques of OCR training like pretraining, data augmentation, and voting. In addition, we used various preprocessing methods to enrich the training data and obtain more robust models. We also implemented a two-stage approach which first trains on all available, considerably unbalanced data and then refines the output by training on a selected, more balanced subset. Evaluations on 29 previously unseen books resulted in a CER of 1.73%, outperforming a widely used standard model with a CER of 2.84% by almost 40%. Training a more specialized model for some unseen Early Modern Latin books starting from our mixed model led to a CER of 1.47%, an improvement of up to 50% compared to training from scratch and up to 30% compared to training from the aforementioned standard model. Our new mixed model is made openly available to the community.
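
The two-stage idea described above can be illustrated with a short sketch: first train on all available (unbalanced) material, then refine on a smaller, more balanced selection at a reduced learning rate. The model, data, and hyperparameters below are placeholders, not the OCR pipeline used in the paper.

    # Hedged sketch of two-stage training: the toy model and random data only
    # stand in for the actual OCR models and training material.
    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy")

    # Stage 1: train on all available data (heavily unbalanced in practice).
    x_all, y_all = np.random.rand(1000, 32), np.random.randint(0, 10, 1000)
    model.fit(x_all, y_all, epochs=5, verbose=0)

    # Stage 2: refine on a selected, more balanced subset with a lower learning rate.
    x_bal, y_bal = x_all[:200], y_all[:200]   # stand-in for a balanced selection
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy")
    model.fit(x_bal, y_bal, epochs=5, verbose=0)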

Authors: Christian Reul (University of Würzburg), Christoph Wick (PLANET AI GmbH), Maximilian Nöth, Andreas Büttner, Maximilian Wehner (all University of Würzburg), Uwe Springmann (LMU München)

Series: ICDAR 2021

Pages: 112 – 126

DOI: 10.1007/978-3-030-86334-0_8

Read the article

Most recently, Transformers – which are recurrence-free neural network architectures – have achieved tremendous performance on various Natural Language Processing (NLP) tasks. Since Transformers represent a traditional Sequence-To-Sequence (S2S) approach, they can be used for several different tasks such as Handwritten Text Recognition (HTR). In this paper, we propose a bidirectional Transformer architecture for line-based HTR that is composed of a Convolutional Neural Network (CNN) for feature extraction and a Transformer-based encoder/decoder, whereby the decoding is performed in reading-order direction and reversed. A voter combines the two predicted sequences to obtain a single result. Our network performed worse compared to a traditional Connectionist Temporal Classification (CTC) approach on the IAM dataset but reduced the state-of-the-art of Transformer-based approaches by about 25% without using additional data. On a significantly larger dataset, the proposed Transformer significantly outperformed our reference model by about 26%. In an error analysis, we show that the Transformer is able to learn a strong language model, which explains why a larger training dataset is required to outperform traditional approaches, and we discuss why Transformers should be used with caution for HTR due to several shortcomings such as repetitions in the text.

Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)

Series: ICDAR 2021

Pages: 112 – 126

Read the article

In this paper, we propose a novel method for Automatic Text Recognition (ATR) on early printed books. Our approach significantly reduces the Character Error Rates (CERs) for book-specific training when only a few lines of Ground Truth (GT) are available and considerably outperforms previous methods. An ensemble of models is trained simultaneously by optimising each one independently but also with respect to a fused output obtained by averaging the individual confidence matrices. Various experiments on five early printed books show that this approach already outperforms the current state-of-the-art by up to 20% and 10% on average. Replacing the averaging of the confidence matrices during prediction with a confidence-based voting boosts our results by an additional 8% leading to a total average improvement of about 17%.
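
One way to picture the fused objective described above is the following sketch: each ensemble member contributes its own loss, plus a loss computed on the average of all members' confidence matrices. A simple per-frame cross-entropy stands in for the CTC-style sequence loss used in actual ATR training, and all data here are random toy values.

    # Hedged NumPy sketch of the joint objective: individual losses plus a
    # loss on the fused (averaged) confidence matrices of the ensemble.
    import numpy as np

    def cross_entropy(probs, labels):
        """Mean per-frame cross-entropy; probs is (T, C), labels is (T,)."""
        return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

    rng = np.random.default_rng(0)
    members = [rng.dirichlet(np.ones(5), size=20) for _ in range(3)]  # 3 members, T=20, C=5
    labels = rng.integers(0, 5, size=20)                              # toy target sequence

    individual = [cross_entropy(p, labels) for p in members]          # one loss per member
    fused = cross_entropy(np.mean(members, axis=0), labels)           # loss on averaged confidences
    total_loss = sum(individual) + fused
    print(total_loss)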

Authors: Christoph Wick (PLANET AI GmbH), Christian Reul (University of Würzburg)

Series: ICDAR 2021

Pages: 385 – 399

DOI: 10.1007/978-3-030-86549-8_25

Read the article

tfaip is a Python-based research framework for developing, structuring, and deploying Deep Learning projects powered by TensorFlow (Abadi et al., 2015). It is intended for scientists at universities or organizations who research, develop, and optionally deploy Deep Learning models. tfaip enables both simple and complex implementation scenarios, such as image classification, object detection, text recognition, natural language processing, or speech recognition. Each scenario is highly configurable by parameters that can be modified directly via the command line or the API.

Authors: Christoph Wick, Benjamin Kühn, Gundram Leifert (all PLANET AI GmbH), Konrad Sperfeld (CITlab, University of Rostock), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)

Journal: The Journal of Open Source Software (JOSS)

DOI: 10.21105/joss.03297

Read the article

Automated text recognition is a fundamental problem in Document Image Analysis. Optical models are used for modeling characters while language models are used for composing sentences. Since the scripts and linguistic context differ widely, it is mandatory to specialize the models by training on task-dependent ground-truth. However, to create a sufficient amount of ground-truth, at least for historical handwritten scripts, well-qualified persons have to mark and transcribe text lines, which is very time-consuming. On the other hand, in many cases unassigned transcripts are already available on page-level from another process chain, or at least transcripts from similar linguistic context are available. In this work we present two approaches that make use of such transcripts: whereas the first one creates training data by automatically assigning page-dependent transcripts to text lines, the second one uses a task-specific language model to generate highly confident training data. Both approaches are successfully applied on a very challenging historical handwritten collection.

Authors: Gundram Leifert (PLANET AI GmbH), Joan Andreu Sànchez (Pattern Recognition and Human Language Technologies Center), Roger Labahn (Computational Intelligence Technology Lab)

Series: ICFHR ’20

Pages: To appear

Note: This work was partially funded by the Generalitat Valenciana under the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025 “Sistemas de fabricación inteligente para la indústria 4.0”. | in proceeding

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network to encode both the visual information and the temporal context between characters in the input image, and uses a separate recurrent neural network to decode the actual character sequence. We make experimental comparisons between various attention mechanisms and positional encodings, in order to find an appropriate alignment between the input and output sequence. The model can be trained end-to-end and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. We achieve competitive results on the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without the use of a language model, and we significantly improve over recent sequence-to-sequence approaches.
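
A minimal sketch of this architecture family is given below: a small CNN extracts features from the line image, a bidirectional recurrent encoder adds temporal context, and a recurrent decoder with additive attention predicts the character sequence under teacher forcing. Layer sizes, image dimensions, and the vocabulary size are illustrative assumptions, not the configuration evaluated in the paper.

    # Hedged Keras sketch of a CNN + RNN encoder + attention-based RNN decoder.
    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB = 80          # number of character classes (assumption)
    H, W = 32, 128      # line image height and width (assumption)

    # CNN feature extractor over the text line image.
    image = layers.Input(shape=(H, W, 1), name="line_image")
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(image)
    x = layers.MaxPooling2D(2)(x)                       # -> (16, 64, 32)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                       # -> (8, 32, 64)
    x = layers.Permute((2, 1, 3))(x)                    # time axis = image width
    feat = layers.Reshape((W // 4, (H // 4) * 64))(x)

    # Recurrent encoder capturing temporal context between characters.
    enc = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(feat)

    # Recurrent decoder with additive attention (teacher forcing during training).
    prev_chars = layers.Input(shape=(None,), dtype="int32", name="prev_chars")
    emb = layers.Embedding(VOCAB, 256)(prev_chars)
    dec = layers.LSTM(256, return_sequences=True)(emb)
    context = layers.AdditiveAttention()([dec, enc])    # query = decoder, value = encoder
    out = layers.Dense(VOCAB, activation="softmax")(layers.Concatenate()([dec, context]))

    model = tf.keras.Model([image, prev_chars], out)
    model.summary()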

Authors: Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition

Series: ICDAR ’19

Pages: To appear

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceeding

Read the article

Measuring the performance of text recognition and text line detection engines is an important step to objectively compare systems and their configuration. There exist well-established measures for both tasks separately. However, there is no sophisticated evaluation scheme to measure the quality of a combined text line detection and text recognition system. The F-measure on word level is a well-known methodology, which is sometimes used in this context. Nevertheless, it does not take into account the alignment of hypothesis and ground truth text and can lead to deceptive results. Since users of automatic information retrieval pipelines in the context of text recognition are mainly interested in the end-to-end performance of a given system, there is a strong need for such a measure. Hence, we present a measure to evaluate the quality of an end-to-end text recognition system. The basis for this measure is the well-established and widely used character error rate, which is limited — in its original form — to aligned hypothesis and ground truth texts. The proposed measure is flexible in that it can be configured to penalize different reading orders between the hypothesis and ground truth and can take into account the geometric position of the text lines. Additionally, it can ignore over- and under-segmentation of text lines. With these parameters it is possible to obtain a measure that best fits one's own needs.
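
Since the proposed measure builds on the character error rate, a small self-contained CER computation may help as a reference point: the Levenshtein distance between hypothesis and reference, divided by the reference length. The paper's extensions for reading order, geometric position, and over-/under-segmentation are not reproduced in this sketch.

    # Classical, alignment-based character error rate (CER).
    def levenshtein(a, b):
        """Edit distance between strings a and b (dynamic programming)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def cer(hypothesis, reference):
        return levenshtein(hypothesis, reference) / max(len(reference), 1)

    print(cer("Handwrlting", "Handwriting"))  # 1 substitution / 11 characters, approx. 0.09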

Authors: Gundram Leifert, Roger Labahn, Tobias Grüning, Svenja Leifert

Booktitle: Proceedings of the 2019 15th International Conference on Document Analysis and Recognition

Series: ICDAR ’19

Pages: To appear

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceeding

Read the article

We present a recognition and retrieval system for the ICDAR2017 Competition on Information Extraction in Historical Handwritten Records which successfully infers person names and other data from marriage records. The system extracts information from the line images with high accuracy and outperforms the baseline. The optical model is based on Neural Networks. To infer the desired information, regular expressions are used to describe the set of feasible word sequences.

Authors: Tobias Strauß, Max Weidemann, Johannes Michael, Gundram Leifert, Tobias Grüning, Roger Labahn

Journal: CoRR

Volume: abs/1804.09943

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)

Read the article

Accessibility of the valuable cultural heritage which is hidden in countless scanned historical documents is the motivation for the presented dissertation. The developed (fully automatic) text line extraction methodology combines state-of-the-art machine learning techniques and modern image processing methods. It demonstrates its quality by outperforming several other approaches on a couple of benchmarking datasets. The method is already being used by a wide audience of researchers from different disciplines and thus contributes its (small) part to the aforementioned goal.

Author: Tobias Grüning

Type: PhD thesis

School: Universität Rostock

Read the thesis

Authors: Tobias Grüning, Roger Labahn, Markus Diem, Florian Kleber, Stefan Fiel

Booktitle: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)

Pages: 351-356

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceeding

DOI: 10.1109/DAS.2018.38

Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Roger Labahn

Booktitle: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

Volume: 01

Pages: 351-356

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceeding

DOI: 10.1109/ICDAR.2017.47

In handwriting recognition, neural networks output sequences of per-character probabilities. The subject of this thesis is the optimization problem of converting the outputs of neural networks into machine-readable text. This is realized by means of weighted automata. As a key result, an efficient heuristic is developed that finds the most probable character sequence among all sequences constrained by regular expressions.

Author: Tobias Strauß

Type: PhD thesis

School: Universität Rostock

Read the thesis

Authors: Tobias Grüning, Gundram Leifert, Tobias Strauß, Roger Labahn

Booktitle: CLEF2016 Working Notes

Series: CEUR Workshop Proceedings

Publisher: CEUR-WS.org

Pages: 351-356

Note: Partially funded by grant no. KF2622304SS3 (Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik Deutschland (BMWi) and the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ) | in proceeding

The transcription of handwritten text on images is one task in machine learning, and one solution to it is using multi-dimensional recurrent neural networks (MDRNN) with connectionist temporal classification (CTC). The RNNs can contain special units, the long short-term memory (LSTM) cells. They are able to learn long-term dependencies but get unstable when the dimension is chosen greater than one. We define some useful and necessary properties for the one-dimensional LSTM cell and extend them to the multi-dimensional case. Thereby we introduce several new cells with better stability. We present a method to design cells using the theory of linear shift-invariant systems. The new cells are compared to the LSTM cell on the IFN/ENIT and Rimes databases, where we can improve the recognition rate compared to the LSTM cell. Thus, each application where LSTM cells are used in MDRNNs could be improved by substituting them with the newly developed cells.

Authors: Gundram Leifert, Tobias Strauß, Tobias Grüning, Welf Wustlich, Roger Labahn

Journal: Journal of Machine Learning Research

Volume: 17

Number: 97

Pages: 1-37

Read the article

This article proposes a convenient tool for decoding the output of neural networks trained by Connectionist Temporal Classification (CTC) for handwritten text recognition. We use regular expressions to describe the complex structures expected in the writing. The corresponding finite automata are employed to build a decoder. We analyze theoretically which calculations are relevant and which can be avoided. A great speed-up results from an approximation. We conclude that the approximation most likely fails if the regular expression does not match the ground truth, which is not harmful for many applications since the low probability will be even underestimated. The proposed decoder is very efficient compared to other decoding methods. The variety of applications reaches from information retrieval to full text recognition. We refer to applications where we integrated the proposed decoder successfully.
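
For readers unfamiliar with CTC outputs, the following sketch shows the simplest possible decoding of such a per-frame confidence matrix: best-path (greedy) decoding, which collapses repeated labels and removes blanks. The regex-constrained automaton decoder proposed in the article replaces exactly this step with a constrained search; the sketch only illustrates the kind of input it operates on.

    # Greedy (best-path) CTC decoding on a toy per-frame probability matrix.
    import numpy as np

    def greedy_ctc_decode(probs, alphabet, blank=0):
        """probs: (T, C) matrix of per-frame character probabilities."""
        best = probs.argmax(axis=1)                    # most probable label per frame
        collapsed = [k for k, prev in zip(best, np.r_[-1, best[:-1]]) if k != prev]
        return "".join(alphabet[k - 1] for k in collapsed if k != blank)

    alphabet = "ab"                                    # blank has index 0
    probs = np.array([[0.1, 0.8, 0.1],                 # a
                      [0.1, 0.8, 0.1],                 # a (repeat, collapsed)
                      [0.9, 0.05, 0.05],               # blank
                      [0.1, 0.8, 0.1],                 # a
                      [0.1, 0.1, 0.8],                 # b
                      [0.9, 0.05, 0.05]])              # blank
    print(greedy_ctc_decode(probs, alphabet))          # -> "aab"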

Authors: Tobias Strauß, Gundram Leifert, Tobias Grüning, Roger Labahn

Journal: Neural Networks

Volume: 79

Pages: 1 – 11

Note: Partially funded by grant no. KF2622304SS3 (Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik Deutschland (BMWi)

Read the article

We describe CITlab’s recognition system for the HTRtS competition attached to the 13th International Conference on Document Analysis and Recognition, ICDAR 2015. The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that as well as the basic utility technologies are essentially powered by PLANET’s ARGUS framework for intelligent text recognition and image processing.

Authors: Gundram Leifert, Tobias Strauß, Tobias Grüning, Roger Labahn

Journal: CoRR

Volume: abs/1605.08412

Note: Partially funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 674943 (READ)

Read the article

We describe CITlab’s recognition system for the HTRtS competition attached to the 14th International Conference on Frontiers in Handwriting Recognition, ICFHR 2014. The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that as well as the basic utility technologies are essentially powered by PLANET’s ARGUS framework for intelligent text recognition and image processing.

Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn

Journal: CoRR

Volume: abs/1412.3949

Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds

Read the article

We describe CITlab’s recognition system for the ANWRESH-2014 competition attached to the 14th International Conference on Frontiers in Handwriting Recognition, ICFHR 2014. The task comprises word recognition from segmented historical documents. The core components of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that as well as the basic utility technologies are essentially powered by PLANET’s ARGUS framework for intelligent text recognition and image processing.

Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn

Journal: CoRR

Volume: abs/1412.6012

Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds

Read the article

In recent years it has turned out that multidimensional recurrent neural networks (MDRNN) perform very well for offline handwriting recognition tasks like the OpenHaRT 2013 evaluation DIR. With suitable writing preprocessing and dictionary lookup, our ARGUS software completed this task with an error rate of 26.27% in its primary setup.

Authors: Tobias Strauß, Tobias Grüning, Gundram Leifert, Roger Labahn

Journal: CoRR

Volume: abs/1412.6061

Note: Partially funded by research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds

Read the article

This article develops approaches to generate dynamical reservoirs of echo state networks with desired properties reducing the amount of randomness. It is possible to create weight matrices with a predefined singular value spectrum. The procedure guarantees stability (echo state property). We prove the minimization of the impact of noise on the training process. The resulting reservoir types are strongly related to reservoirs already known in the literature. Our experiments show that well-chosen input weights can improve performance.
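
A minimal sketch of the central construction: build the reservoir weight matrix as W = U diag(s) V^T with random orthogonal U and V, so that s is exactly its singular value spectrum; keeping the largest singular value below 1 is a sufficient condition for the echo state property with tanh units. The reservoir size and the chosen spectrum below are illustrative assumptions.

    # Reservoir weights with a prescribed singular value spectrum (NumPy sketch).
    import numpy as np

    def reservoir_with_spectrum(singular_values, seed=None):
        rng = np.random.default_rng(seed)
        n = len(singular_values)
        u, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal U
        v, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal V
        return u @ np.diag(singular_values) @ v.T

    n = 100
    s = np.linspace(0.9, 0.1, n)                 # desired spectrum, largest value < 1
    w = reservoir_with_spectrum(s, seed=0)
    print(np.allclose(np.linalg.svd(w, compute_uv=False), s))   # True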

Authors: Tobias Strauß, Welf Wustlich, Roger Labahn

Journal: Neural Computation

Volume: 24

Number: 12

Pages: 3246-3276

Note: Partially funded by the research grant no. V220-630-08-TFMV-S/F-059 (Verbundvorhaben, Technologieförderung Land Mecklenburg-Vorpommern) in European Social / Regional Development Funds

Read the article

Awards

At PLANET AI, we create world-leading solutions mastering all components of state-of-the-art document analysis, with a strong focus on research and development.

Find out how our patented technology has proven itself at the most renowned international competitions.

Download Datasheet

Research Partners

Screening all relevant international research, extracting the essence for PlanetBrain, and at the same time realizing our own ambitious research projects would never be possible without these highly qualified and committed teams. It always feels like an adventure for all of us when we jointly organize workshops twice a year, meet at international congresses, or simply cooperate on challenging tasks.

Additionally, we have been co-funded by the European Union for several years.