PLANET AI’s partner IBM presented the challenge of providing a government agency with an automated solution for data extraction. The agency struggled to handle a rising number of forms due to the COVID-19 pandemic. These forms are processed as scanned PDFs and contain handwriting, poor-quality typescript, and tables. Since the documents are strictly confidential, no authentic data was given to PLANET AI beforehand.
PLANET AI’s Solution:
PLANET AI provided the agency with three features of their IDA Suite:
- The Textlayer feature lays the groundwork for further processing by making the scanned content readable and searchable.
- Based on this, IDA Classification needs only a small amount of training data to classify entire documents or single pages, which revolutionized the sorting of the agency’s forms.
- Additionally, the IDA Extraction feature extracts specific key-value pairs, such as names, addresses and other relevant values, building on the preceding document classification.
The provided IDA features achieved outstanding accuracy: 95 percent of all desired fields could be found, read and extracted correctly, even though a significant share of the text was handwritten. Hence, the agency could save thousands of working hours per year while speeding up the input process significantly. IDA could help to process 40 million pages within 60 to 90 days.
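To put the stated volume in perspective, the following minimal sketch converts the figures above (40 million pages in 60 to 90 days) into an average throughput. The function names are illustrative, and the per-second figure assumes round-the-clock processing, which the case study does not state.

```python
# Figures taken from the case study: 40 million pages in 60 to 90 days.
TOTAL_PAGES = 40_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(total_pages: int, days: int) -> float:
    """Average daily throughput needed to finish total_pages within days."""
    return total_pages / days


def pages_per_second(total_pages: int, days: int) -> float:
    """Average per-second throughput, ASSUMING processing runs 24 hours a day."""
    return total_pages / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    for days in (60, 90):
        print(
            f"{days} days: {pages_per_day(TOTAL_PAGES, days):,.0f} pages/day, "
            f"{pages_per_second(TOTAL_PAGES, days):.1f} pages/s"
        )
```

Under these assumptions, the workload corresponds to roughly 440,000 to 670,000 pages per day on average.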
Classification and extraction no longer rely on a manual rulebook, which dramatically reduces the potential for mistakes. PLANET AI’s project study was built entirely on synthetic data and is therefore highly adaptive. Additionally, on-premises deployment guarantees data security.