ASCEND – Development of a new type of Transformer for the specific image preparation of scanned documents
The aim of the “ASCEND” project is to develop a new method for improving the automatic processing of scanned documents by preparing the image information beforehand. The first technical innovation is a novel transformer-based AI system, together with the transfer and application of methodologies for (parameter-)efficient training from other domains, such as speech and text processing, to vision transformer models. The second innovative core of the project is a new type of loss function that also incorporates the result of text recognition into the evaluation of the image processing.
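To illustrate the second core idea, the sketch below shows one way a text-recognition term could be combined with a pixel-level reconstruction term in a single loss. This is only a minimal sketch under assumed interfaces (a frozen CTC-based OCR model and a differentiable restoration network); the class and parameter names are hypothetical and do not describe the project's actual implementation.

```python
# Minimal sketch of an OCR-aware loss (hypothetical names, not the project's code).
import torch
import torch.nn as nn

class OCRAwareLoss(nn.Module):
    """Combines a pixel-level reconstruction term with a text-recognition term."""

    def __init__(self, ocr_model: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.ocr_model = ocr_model            # assumed frozen, CTC-based recognizer
        self.ocr_model.requires_grad_(False)
        self.pixel_loss = nn.L1Loss()         # standard image-restoration term
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
        self.alpha = alpha                    # weight of the text-recognition term

    def forward(self, restored, target, text_targets, target_lengths):
        # 1) How close is the restored image to the clean reference image?
        l_pixel = self.pixel_loss(restored, target)

        # 2) How well can the frozen OCR model read the restored image?
        #    log_probs: (T, N, C) per-frame class log-probabilities, as CTCLoss expects.
        log_probs = self.ocr_model(restored).log_softmax(-1)
        input_lengths = torch.full(
            (restored.size(0),), log_probs.size(0), dtype=torch.long
        )
        l_text = self.ctc_loss(log_probs, text_targets, input_lengths, target_lengths)

        return l_pixel + self.alpha * l_text
```

Because the OCR model stays frozen, gradients from the text term flow only into the restoration network, steering it toward outputs that are not just visually clean but also easier to read automatically.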
Project partner: regrapes GmbH, Köln
Funding: BMWK 02/2024 – 01/2026
Project team: Prof. Dr. Michael Munz. Focus: project lead, machine learning / AI
Research assistants / doctoral students: David Kreuzer, M.Sc.
News
Project funding “ASCEND” by Michael Munz
22 March 2023
We are happy to have received funding for the project “ASCEND”, where we will develop a new type of Transformer for the specific image preparation of scanned documents. More details can be found on our project website ASCEND.
Related Publications
Kreuzer, D., & Munz, M. (2023). Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents (arXiv:2306.02815). arXiv. https://doi.org/10.48550/arXiv.2306.02815