Blog

AI Revolution in Data Extraction: Structuring Emails, PDFs, and Videos

Klaudia Wachnio


March 1, 2026
7 minutes

For years, the promise of artificial intelligence in business has been appealing, but sadly out of reach for many. Visions of fully automated systems ran into the hard reality of brittle rules, inconsistent data formats, and the immense cost of training custom models. Data extraction, i.e., the critical task of pulling structured information from unstructured sources like emails, documents, and images, remained a labor-intensive bottleneck. 

In our latest video, "AI in Practice: Claims Handling, Video Labeling & Scalable AI Use Cases", we observe how a fundamental shift is underway. The advent of powerful, multimodal large language models (LLMs) is turning data extraction from a constrained, technical challenge into a flexible and revolutionary capability. We’re swiftly moving from a world of manual entry and rigid templates to one where AI can understand context, interpret layout, and reason about content. This way, we can transform chaos into clarity. 

Let us explore some of the real-world use cases emerging from this conversation that show how generative AI is redefining what's possible. 

The old frontier: rules, templates, and tedium 

Traditionally, extracting data meant going down one of two paths. For high-volume, predictable documents, companies could invest in complex rule-based systems or optical character recognition (OCR) tuned to specific form layouts. For everything else, that is, the often-unpredictable narrative-driven content, the solution was simple: humans working their fingers to the bone, like an insurance employee manually reading through a customer's lengthy accident-description email, or an analyst combing through a 300-page financial report to find a specific metric. 

As Jeroen notes, there were some techniques, such as Named Entity Recognition (NER), but they often required custom training on domain-specific datasets, a process that was time-consuming, expensive, and narrow in scope. A model built to find client names in insurance claims couldn't be reused to parse a supplier invoice. This situation persisted for several decades. 

The new paradigm: prompt as configuration, LLM as interpreter

The core change with generative AI, Jeroen explains, is that "we don't pre-train... we can consume [the model] as an API." The domain knowledge is no longer baked into the model’s weights through lengthy training; instead, it’s directly injected in the prompt. This means a single, powerful model can be adapted through instruction to extract occupation, accident type, and severity from an insurance email. That same model, with a different prompt, can then extract financial KPIs from an annual report. 
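To make "prompt as configuration" concrete, here is a minimal sketch: the domain knowledge lives entirely in the instruction text, and the model is consumed as an API. The `call_llm` function is a placeholder stubbed with a canned response for illustration; in practice it would be a request to any hosted chat-completion endpoint.

```python
import json

# The "configuration" is just the instruction text: swap this prompt and the
# same model extracts financial KPIs instead of claim details.
EXTRACTION_PROMPT = """Extract the following fields from the email below
and answer with JSON only: occupation, accident_type, severity
(one of: light, medium, heavy).

Email:
{email}
"""

def call_llm(prompt: str) -> str:
    # Placeholder for a hosted LLM call; stubbed here for illustration.
    return '{"occupation": "carpenter", "accident_type": "fall", "severity": "medium"}'

def extract_claim_fields(email: str) -> dict:
    raw = call_llm(EXTRACTION_PROMPT.format(email=email))
    return json.loads(raw)

fields = extract_claim_fields("I am a carpenter and fell off a ladder last week...")
print(fields["severity"])  # medium
```

Changing the extraction target means editing a string, not retraining a model, which is exactly why development time collapses compared to the NER-era approach.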

Of course, this is not about magic. "You will need to take into account setting up proper evaluation pipeline[s]," cautions Jeroen. But the development time is drastically shorter, the flexibility is greater, and the quality of extraction can surpass older methods, especially for complex, narrative data. 
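An evaluation pipeline need not be elaborate to be useful. A minimal sketch, assuming a small hand-labelled gold set (the field names and records below are illustrative): score the model's extracted fields against the labels and track per-field accuracy over time.

```python
# Per-field exact-match accuracy of extracted fields against a gold set.
def field_accuracy(predictions: list[dict], gold: list[dict]) -> dict:
    fields = gold[0].keys()
    totals = {f: 0 for f in fields}
    for pred, ref in zip(predictions, gold):
        for f in fields:
            if pred.get(f) == ref[f]:
                totals[f] += 1
    return {f: totals[f] / len(gold) for f in fields}

# Illustrative hand-labelled examples and model outputs.
gold = [{"severity": "medium", "accident_type": "fall"},
        {"severity": "heavy", "accident_type": "collision"}]
preds = [{"severity": "medium", "accident_type": "fall"},
         {"severity": "light", "accident_type": "collision"}]

print(field_accuracy(preds, gold))  # {'severity': 0.5, 'accident_type': 1.0}
```

Running a check like this on every prompt change is what turns a demo into a dependable extraction system.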

Let’s look at some practical use cases.

Use case #1 - transforming insurance claims with emails and vector search

The team worked with an insurance company to automate part of the claims process. Previously, an employee would receive a long, unstructured email from a customer describing an accident. They would then manually read it, understand all the details, and painstakingly search a database for similar historical claims to determine what would be a fair payout. 

The new AI-driven process is two-fold, showcasing both extraction and enhancement: 

  1. Structured extraction: An LLM reads the customer’s email narrative and extracts key structured fields such as the customer’s occupation, age, type of accident, and severity (e.g., light, medium, heavy). This transforms a block of text into a queryable data point. 
  2. Intelligent retrieval: This structured data is then used to filter and find the most similar past claims in the database. Furthermore, the entire email text is converted into a vector embedding to perform a semantic similarity search, finding cases that are contextually alike, not just keyword matches. 
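The two-step flow above can be sketched in a few lines: a hard filter on the extracted structured fields, then a semantic ranking by cosine similarity of embeddings. The claim records and vectors below are toy assumptions; real embeddings would come from an embedding model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similar_claims(new_claim, history, top_k=3):
    # Step 1: hard filter on the structured fields the LLM extracted.
    candidates = [c for c in history
                  if c["accident_type"] == new_claim["accident_type"]
                  and c["severity"] == new_claim["severity"]]
    # Step 2: semantic ranking on the embedded email text.
    candidates.sort(key=lambda c: cosine(c["embedding"], new_claim["embedding"]),
                    reverse=True)
    return candidates[:top_k]

# Toy historical claims with illustrative 2-d "embeddings".
history = [
    {"accident_type": "fall", "severity": "medium", "embedding": [1.0, 0.0]},
    {"accident_type": "fall", "severity": "medium", "embedding": [0.6, 0.8]},
    {"accident_type": "collision", "severity": "heavy", "embedding": [1.0, 0.0]},
]
new = {"accident_type": "fall", "severity": "medium", "embedding": [1.0, 0.1]}
matches = similar_claims(new, history, top_k=1)
```

In production the filter would be a database query and the ranking a vector-store lookup, but the division of labor is the same: structured fields narrow the field, embeddings find the contextually closest cases.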

The result isn't full automation but massive acceleration. The human adjuster is now assisted, presented with structured data and similar cases instantly, allowing them to focus on final judgment and complex exceptions. 

Use case #2: auditing massive documents with multimodal understanding

The second use case pushes into true multimodal territory. A client needed to audit long, complex annual reports, running to several thousand pages, to answer specific questions about performance. 

Here’s the critical insight: when a model like Google’s Gemini processes a PDF, it often treats each page as an image. This unlocks a profound capability. As Jeroen points out, "since PDFs have no predictable structure, they might as well be treated like an image." Modern multimodal LLMs have exceptional OCR and layout understanding baked in, allowing them to comprehend tables, charts, and flowing text across diverse document formats. 

Here the AI acts as a supercharged research assistant. Give it the 3,000-page PDF and a list of audit questions. It will scan the entire document and provide draft answers. Crucially, it also provides citations, pointing to the exact page or section where the information was found. "This is an example where we do not leave all the responsibility to the LLM," Jeroen emphasizes. The human auditor gets a complete, time-saving draft with verifiable sources, moving the process from manual searching to intelligent validation. 
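A minimal sketch of this flow, assuming a multimodal API that accepts page images alongside a question (the `ask_multimodal_llm` function is a placeholder stubbed with a canned answer, and the `[page N]` citation convention is an illustrative choice): the deterministic part worth showing is how cited pages are parsed out of the draft so a human auditor can verify each claim.

```python
import re

def ask_multimodal_llm(pages, question: str) -> str:
    # Placeholder for a multimodal API call with page images attached.
    # Stubbed response for illustration.
    return "Revenue grew 12% year over year [page 412]."

def cited_pages(answer: str) -> list[int]:
    # Pull '[page N]' citations out of a draft answer so the human
    # auditor knows exactly where to verify each statement.
    return [int(n) for n in re.findall(r"\[page (\d+)\]", answer)]

draft = ask_multimodal_llm(pages=[], question="What was the revenue growth?")
print(cited_pages(draft))  # [412]
```

The citations are the key design choice: they keep the human in the loop, turning the auditor's job from searching thousands of pages into validating a handful of referenced ones.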

Use case #3: deconstructing video content with AI

The most complex example reveals how generative AI orchestrates multiple techniques. For a media company (AVRO), the goal was to automatically analyze talk show episodes to understand which topics and speakers drove engagement and attracted viewers. 

This wasn't a single LLM call. It was a whole pipeline: 

  1. Audio to text: The video’s audio was transcribed with timestamps. 
  2. Chapter creation: An LLM analyzed the transcript to create logical chapters, segmenting the show into distinct topics with start/end times. 
  3. Visual analysis: A traditional, specialized face detection model identified every person in every frame. Clever logic (like measuring the proportion of the screen a face occupied) filtered out background people to focus on main speakers. 
  4. Face clustering: Embeddings of detected faces were grouped to identify unique individuals across the episode. 
  5. Synthesis: All of this data was combined with viewership logs in a dashboard. The result? The media company could see, for example, that a segment on renewable energy featuring a particular guest spiked viewer interest at the 22-minute mark. 
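The "clever logic" in step 3 is simple to sketch: keep only faces whose bounding box covers a meaningful fraction of the frame. The box format and the 2% threshold below are illustrative assumptions, not the project's actual parameters.

```python
def is_main_speaker(box, frame_w, frame_h, min_fraction=0.02):
    # box = (x, y, width, height) in pixels; a face is treated as a main
    # speaker when it covers at least `min_fraction` of the frame area.
    _, _, w, h = box
    return (w * h) / (frame_w * frame_h) >= min_fraction

faces = [(100, 80, 300, 300),   # large foreground face
         (900, 40, 40, 40)]     # small background face
main = [b for b in faces if is_main_speaker(b, 1920, 1080)]
print(len(main))  # 1
```

A plain geometric heuristic like this is cheap, explainable, and easy to tune, which is why it pairs so well with the heavier detection and clustering models around it.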

This use case is a masterclass in pragmatic AI: the right tool for each job. It combines the narrative understanding of an LLM for chapters, the precision of a dedicated computer vision model for faces, and the analytical power of embeddings for clustering, all orchestrated into a single, automated workflow that runs on a schedule.

The path forward: assisted, autopilot, or autonomous?

Throughout these examples, a consistent theme emerges from the conversation in the video: the importance of the "human in the loop." The industry has had a "reality check around hallucinations," and smart implementation builds in checks and balances. 

The discussion offers a pragmatic framework of four automation levels: 

  • Level 0: Manual (Human does everything) 
  • Level 1: Assisted (Human in control, AI helps) 
  • Level 2: Autopilot (AI in control, human supervises) 
  • Level 3: Autonomous (Fully automated) 

Most of the real value today, as seen in these use cases, comes from moving from Manual to Assisted or Autopilot. The leap to full autonomy often requires disproportionate effort and yields diminishing returns. The goal is to be pragmatic: use generative AI to shoulder the bulk of the cognitive load (reading, extracting, synthesizing), while leaving critical validation, complex judgment, and final approval in human hands. This builds trust, ensures quality, and delivers immense time savings without betting the business on flawless AI. 

The era of generative AI is not about replacing humans with robots. It's about giving professionals a powerful new lens to see structure in the unstructured, to find signal in the noise, and to automate the tedious so they can focus on what is meaningful. From emails to video feeds, data extraction is just the beginning. 

Ready to see these concepts in action and hear the full, detailed discussion? Watch the complete video here for an in-depth conversation about AI use cases, where we break down these implementations, discuss guardrails and evaluation, and explore the future of AI in production. 
