Research

Research Project DOKIQ

The AI research project DOKIQ aims to explore the use of AI for the automated detection of document forgeries and to integrate it into police work. In addition to HdM Stuttgart and the LKA BW, the project partners are the Bundesdruckerei Berlin (BDR) and the State Criminal Police Offices of Bavaria and Hesse.

Scientific Expertise

Our close collaboration with academic partners enables us to develop solutions based on the latest research. At the same time, our scientists actively contribute to advancing the field through their publications and maintain a strong network of leading experts.

With the strong scientific background of our staff and our AI developers pursuing doctorates, we combine research depth with industry experience.

Origin

The Hochschule der Medien Stuttgart stands for strong scientific expertise at the intersection of media, computer science, and technology.

It combines well-founded research with practice-oriented teaching and is nationally and internationally recognized, particularly in the fields of digitalization, artificial intelligence, and innovative media technologies.

Research Partnerships with Companies

We also offer companies the opportunity to identify, evaluate, and implement innovative AI applications together with us as a research partner. Together, we can develop tailored, effective solutions for your company-specific challenges. If you are interested in a collaborative partnership with us, please get in touch!

Publications by Our Scientists

Evaluating the Impact of Data Anonymization on Image Retrieval

Marvin Chen, Manuel Eberhardinger, Johannes Maucher

With the growing importance of privacy regulations such as the General Data Protection Regulation, anonymizing visual data is becoming increasingly relevant across institutions. However, anonymization can negatively affect the performance of computer vision systems that rely on visual features, such as Content-Based Image Retrieval (CBIR). Despite this, the impact of anonymization on CBIR has not been systematically studied. This work addresses this gap, motivated by the DOKIQ project, an artificial intelligence-based system for document verification actively used by the State Criminal Police Office Baden-Württemberg. We propose a simple evaluation framework: retrieval results after anonymization should match those obtained before anonymization as closely as possible. To this end, we systematically assess the impact of anonymization using two public datasets and the internal DOKIQ dataset. Our experiments span three anonymization methods, four anonymization degrees, and four training strategies, all based on the state-of-the-art backbone Self-Distillation with No Labels (DINOv2). Our results reveal a pronounced retrieval bias in favor of models trained on original data, which produce the most similar retrievals after anonymization. The findings of this paper offer practical insights for developing privacy-compliant CBIR systems while preserving performance.
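The evaluation criterion described in the abstract (retrieval lists before and after anonymization should agree as closely as possible) can be sketched as a top-k overlap score between the two rankings. The sketch below is a minimal illustration using random stand-in embeddings; the paper's actual DINOv2 features, datasets, and anonymization methods are not reproduced here.

```python
import numpy as np

def topk_indices(query_emb, gallery_embs, k):
    """Rank the gallery by cosine similarity to the query; return top-k indices."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:k]

def retrieval_consistency(orig_embs, anon_embs, k=5):
    """Mean Jaccard overlap between top-k retrievals computed on original
    embeddings and on anonymized embeddings, averaged over all queries
    (leave-one-out: each image queries the rest of the set)."""
    n = len(orig_embs)
    overlaps = []
    for i in range(n):
        mask = np.arange(n) != i
        top_orig = set(topk_indices(orig_embs[i], orig_embs[mask], k))
        top_anon = set(topk_indices(anon_embs[i], anon_embs[mask], k))
        overlaps.append(len(top_orig & top_anon) / len(top_orig | top_anon))
    return float(np.mean(overlaps))

# Random stand-ins: "anonymization" modeled as a mild embedding perturbation.
rng = np.random.default_rng(0)
orig = rng.normal(size=(50, 128))
anon = orig + rng.normal(scale=0.05, size=orig.shape)
score = retrieval_consistency(orig, anon, k=5)  # 1.0 means identical retrievals
```

A score of 1.0 corresponds to perfectly preserved retrieval behavior; the stronger the anonymization perturbs the embeddings, the lower the overlap.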

Classification of Inkjet Printers based on Droplet Statistics

Patrick Takenaka, Manuel Eberhardinger, Daniel Grießhaber, Johannes Maucher

Knowing the printer model used to print a given document may provide a crucial lead towards identifying counterfeits or, conversely, verifying the validity of a real document. Inkjet printers produce probabilistic droplet patterns that appear to be distinct for each printer model, and as such we investigate the utilization of droplet characteristics, including frequency-domain features extracted from printed document scans, for the classification of the underlying printer model. We collect and publish a dataset of high-resolution document scans and show that our extracted features are informative enough to enable a neural network to distinguish not only the printer manufacturer, but also individual printer models.
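One plausible frequency-domain descriptor of the kind the abstract mentions is a radially averaged magnitude spectrum of a scanned patch, which summarizes the periodicity of a droplet texture in a compact vector. This is a generic illustration under that assumption, not the paper's actual feature set, and the random patch merely stands in for a real high-resolution droplet scan.

```python
import numpy as np

def radial_spectrum(patch, n_bins=16):
    """Radially averaged magnitude spectrum of a grayscale patch:
    a compact frequency-domain descriptor of a printed texture."""
    f = np.fft.fftshift(np.fft.fft2(patch - patch.mean()))
    mag = np.abs(f)
    h, w = patch.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)          # radius of each frequency bin
    bins = np.linspace(0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    profile = np.bincount(which, weights=mag.ravel(), minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    return profile / np.maximum(counts, 1)        # mean magnitude per radius band

rng = np.random.default_rng(1)
patch = rng.random((64, 64))   # stand-in for a scanned droplet patch
feat = radial_spectrum(patch)  # 16-dimensional feature vector
```

Feature vectors of this shape could then be fed to a classifier over printer models, as the abstract describes for the neural network.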

Anonymization of Documents for Law Enforcement with Machine Learning

Manuel Eberhardinger, Patrick Takenaka, Daniel Grießhaber, Johannes Maucher

The steadily increasing utilization of data-driven methods in areas that handle sensitive personal information, such as law enforcement, demands an ever-increasing effort from these institutions to comply with data protection guidelines. In this work, we present a system for automatically anonymizing images of scanned documents, reducing manual effort while ensuring data protection compliance. Our method preserves the viability of further forensic processing after anonymization by minimizing the automatically redacted areas, combining automatic detection of sensitive regions with knowledge from a manually anonymized reference document. Using a self-supervised image model for instance retrieval of the reference document, our approach requires only one anonymized example to efficiently redact all documents of the same type, significantly reducing processing time. We show that our approach outperforms both a purely automatic redaction system and a naive copy-paste scheme that transfers the reference anonymization to other documents, evaluated on a hand-crafted dataset of ground-truth redactions.
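The core transfer step (carrying redactions from one annotated reference over to every matched document of the same type) can be sketched as mapping the reference redaction boxes through an alignment transform. The affine alignment below is assumed to come from some prior feature-matching step; the function name and box format are illustrative, not the paper's implementation.

```python
import numpy as np

def transfer_redactions(ref_boxes, affine):
    """Map redaction boxes (x0, y0, x1, y1), annotated once on the reference
    document, into the coordinate frame of a newly matched document via a
    2x3 affine alignment (assumed to be estimated by feature matching)."""
    out = []
    for x0, y0, x1, y1 in ref_boxes:
        corners = np.array([[x0, y0, 1], [x1, y0, 1],
                            [x0, y1, 1], [x1, y1, 1]], dtype=float)
        mapped = corners @ affine.T               # (4, 2) transformed corners
        xs, ys = mapped[:, 0], mapped[:, 1]
        # Axis-aligned hull of the mapped corners keeps the redaction minimal
        # while still covering the whole transformed region.
        out.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return out

# An identity alignment leaves the reference boxes unchanged.
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
boxes = transfer_redactions([(10, 20, 110, 40)], identity)
```

Because only the reference document carries manual annotations, one anonymized example suffices for every document the instance-retrieval step matches to it, which is where the processing-time reduction described in the abstract comes from.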

Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models

Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher

Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. One approach involves the use of plausibility checks, rule-based procedures that assess the correctness and internal consistency of data, to detect anomalies or signs of manipulation. Although these verification procedures are essential for ensuring data integrity, existing plausibility checks are manually implemented by software engineers, which is time-consuming. Recent advances in code generation with large language models (LLMs) offer new potential for automating and scaling the generation of these checks. However, adapting LLMs to the specific requirements of an unknown domain remains a significant challenge. This work investigates the extent to which LLMs, adapted on domain-specific code and data through different fine-tuning strategies, can generate rule-based plausibility checks for forgery detection on constrained hardware resources. We fine-tune open-source LLMs, Llama 3.1 8B and OpenCoder 8B, on structured datasets derived from real-world application scenarios and evaluate the generated plausibility checks on previously unseen forgery patterns. The results demonstrate that the models are capable of generating executable and effective verification procedures. This also highlights the potential of LLMs as scalable tools to support human decision-making in security-sensitive contexts where comprehensibility is required.
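To make the notion of a "plausibility check" concrete, here is a hypothetical example of the kind of rule-based procedure such a fine-tuned model might generate: an internal date-consistency check on an identity document. The field names and rules are invented for illustration and do not come from the paper's datasets.

```python
from datetime import date

def check_date_plausibility(doc):
    """Hypothetical plausibility check: the dates on an identity document
    must be internally consistent. Returns a list of detected anomalies
    (empty list means the document passes this check)."""
    issues = []
    if doc["birth_date"] >= doc["issue_date"]:
        issues.append("birth date is not before issue date")
    if doc["issue_date"] >= doc["expiry_date"]:
        issues.append("issue date is not before expiry date")
    return issues

valid = {"birth_date": date(1990, 5, 1),
         "issue_date": date(2020, 3, 12),
         "expiry_date": date(2030, 3, 11)}
forged = dict(valid, expiry_date=date(2019, 1, 1))  # expiry before issue
```

Checks of this form are executable and human-readable, which matches the comprehensibility requirement the abstract highlights for security-sensitive contexts.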
