Recent progress in Artificial Intelligence (AI) and Machine Learning (ML) has been tremendous. Almost every month, new reports announce breakthroughs in different technological aspects of AI.

As an organization focused on research and development, we can look back on a growing number of publications and awards.

We aim to push the state of the art on problems such as automatic text recognition (ATR), language modeling (LM), named entity recognition (NER), visual question answering (VQA), and image segmentation (IS), even beyond human performance.

Our team of experienced AI researchers is working with and improving techniques such as:

  • fully convolutional neural networks
  • attention-based, recurrence-free models, alone and in combination with recurrent models
  • graph neural networks
  • neural memory techniques
  • unsupervised and self-supervised pre-training strategies
  • improved learning strategies

In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words, which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose using the CTC-Prefix-Score during S2S decoding: during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long Short-Term Memory cells (LSTMs) as encoder, and a Transformer decoder with inserted mutual attention layers. The CTC confidences are computed on the encoder, while the Transformer is used only for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we achieve a competitive Character Error Rate (CER) of 2.95% when pretraining our model on synthetic data and including a character-based language model for contemporary English. Compared to other state-of-the-art approaches, our model requires about 10 to 20 times fewer parameters. Our implementations are shared on GitHub.
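To make the idea of a CTC prefix score more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It follows the commonly used forward recursion for CTC prefix probabilities and works in the plain probability domain for readability, whereas a practical decoder would use log probabilities. The function name `ctc_prefix_score`, the blank index `BLANK = 0`, and the toy confidence matrix are assumptions made for illustration.

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_prefix_score(probs, prefix):
    """Probability that the collapsed CTC output starts with `prefix`.

    probs:  array of shape (T, V) with per-frame CTC confidences
            (each row sums to 1).
    prefix: sequence of non-blank label indices.
    """
    num_frames = probs.shape[0]
    if len(prefix) == 0:
        return 1.0  # every output starts with the empty prefix

    # Forward variables of the parent prefix g (initially the empty prefix):
    # g_nb[t]: frames 0..t emit g and end in a non-blank label
    # g_b[t]:  frames 0..t emit g and end in a blank
    g_nb = np.zeros(num_frames)
    g_b = np.cumprod(probs[:, BLANK])

    psi = 0.0
    last = None  # last label of the parent prefix
    for n, c in enumerate(prefix):
        h_nb = np.zeros(num_frames)
        h_b = np.zeros(num_frames)
        # Only the first label can be emitted in the very first frame.
        h_nb[0] = probs[0, c] if n == 0 else 0.0
        psi = h_nb[0]
        for t in range(1, num_frames):
            # Mass of the parent prefix that can be extended by c at frame t.
            # A repeated label needs a blank in between, so the non-blank
            # part is dropped when c equals the parent's last label.
            phi = g_b[t - 1] + (0.0 if last == c else g_nb[t - 1])
            h_nb[t] = (h_nb[t - 1] + phi) * probs[t, c]
            h_b[t] = (h_b[t - 1] + h_nb[t - 1]) * probs[t, BLANK]
            # Once the prefix is complete at frame t, any continuation counts.
            psi += phi * probs[t, c]
        g_nb, g_b, last = h_nb, h_b, c
    return psi


# Toy confidence matrix: 2 frames, vocabulary {blank, label 1}.
toy = np.array([[0.6, 0.4],
                [0.5, 0.5]])
score_one = ctc_prefix_score(toy, [1])     # analytically 1 - 0.6 * 0.5 = 0.7
score_two = ctc_prefix_score(toy, [1, 1])  # needs 3 frames, so the score is 0
```

In a beam search, a score of this kind could be combined with the S2S decoder's score for each hypothesis, for example as a weighted sum of log scores, so that hypotheses the CTC confidence matrix cannot support (such as `[1, 1]` in the toy example) are pushed down or pruned. The exact combination used in the paper is not stated in the abstract.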

Authors: Christoph Wick (PLANET AI GmbH), Jochen Zöllner (PLANET AI GmbH, University of Rostock), Tobias Grüning (PLANET AI GmbH)

Series: DAS 2022 – 15th IAPR International Workshop on Document Analysis Systems

DOI: 10.1007/978-3-031-06555-2_18

To apply Optical Character Recognition (OCR) to historical printings in Latin script fully automatically, we report on our efforts to construct a widely applicable polyfont recognition model that yields text with a Character Error Rate (CER) of around 2% when applied out of the box. Moreover, we show how this model can be further fine-tuned to specific classes of printings with little manual and computational effort. The mixed or polyfont model is trained on a wide variety of materials in terms of age (from the 15th to the 19th century), typography (various types of Fraktur and Antiqua), and language (among others, German, Latin, and French). To optimize the results, we combined established techniques of OCR trai