
Beyond RAG: AI Agents with a Real-Time Context

Marek Maj

October 28, 2025
9 minutes

We talk a lot about the benefits of using AI in customer service. It can be a true game-changer, helping to drive revenue growth while reducing operational costs. However, it’s important to remember that AI isn’t a magic wand that will solve every problem without the right implementation.

Picture this common scene in e-commerce customer service:

A loyal customer, annoyed by a delayed delivery, asks a chatbot, "My order is late, can I get a refund?" The chatbot checks its database, which was last updated hours ago, and confidently replies, "Your order is marked 'Out for Delivery' and is on schedule".

But here’s the problem: Just ten minutes before this interaction, the logistics partner logged a "delivery_exception" event. The AI system, however, was completely unaware of this crucial update.

This is a frequent issue in today's AI customer service. It points to a major design flaw: contextual blindness. The system's answer was technically correct based on its outdated information, but that information was completely disconnected from reality.

This problem of contextual blindness leads to a negative customer experience:

• Customers often encounter ineffective agents that simply don't have the right information to solve their issues.

• This results in what's called the "agent carousel" or conversational loop - customers get transferred multiple times, having to repeat their problem over and over again.

The Achilles' Heel of Conventional Agents: Stale Context

Crucially, this isn't a minor issue that can be fixed by simply using a slightly better Large Language Model (LLM). The real hallmark of truly advanced AI agents is not just their reasoning abilities, but their deep, up-to-the-millisecond awareness of the business environment they operate within.

The effectiveness of any intelligent agent is fundamentally limited by the quality and timeliness of the information it can access. A shipping update from yesterday is not just less valuable; for a customer inquiring about their delivery today, it is actively misleading and counterproductive. This is the challenge of data latency and context drift - the Achilles' heel of conventional AI agent architectures.

Traditional systems, often built on batch processing paradigms, are inherently unsuited for the demands of real-time agentic AI. In a batch-driven architecture, data is collected, processed, and loaded into analytical systems on a periodic schedule - typically nightly - which means the agent may be reasoning over data that is hours old.

Architecture Blueprint for Context-Aware Agentic AI

The key to advancing agentic AI lies in building a contextual system - a real-time, event-driven architecture that feeds the agent's "brain" (the LLM) with a continuous stream of information about the state of the business. We will provide an architectural deep dive into the core components and their synergistic roles:

  • Apache Kafka & Apache Flink: This duo forms the sensory network of the system, capturing and processing every business event - from order placement to shipping updates - in real time, ensuring the agent is never operating on stale information.   
  • LangGraph & Google Gemini: This is the cognitive core. LangGraph provides the framework for stateful, cyclic reasoning, while a powerful external LLM like Gemini serves as the decision-making engine, planning and orchestrating actions based on the rich context it receives.   

  • pgvector & Semantic Search: This constitutes the agent's long-term memory - a knowledge base of past interactions and resolutions that provides deep historical context to inform and improve current decision-making.

This architecture ensures the AI agent has access to real-time customer context, enabling up-to-date and personalized customer service responses.

Breaking Down the Stack

The Real-Time Backbone: Kafka and Flink

At the heart of this architecture lies Apache Kafka, which serves as the central, durable, and highly scalable log for all business events. Kafka acts as the single source of truth for data-in-motion, decoupling the systems that produce data (e.g., the e-commerce backend, shipping provider APIs, CRM) from the systems that consume it (such as our Flink application). This decoupling is crucial for building a scalable and resilient architecture where components can evolve independently.
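
For illustration, here is a minimal sketch of a producer publishing a shipping event to Kafka using the kafka-python client. The topic name and event schema are our own assumptions, not a prescribed format:

import json

from kafka import KafkaProducer  # kafka-python client

# Serialize event payloads as JSON
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Hypothetical shipping event; topic name and fields are illustrative
producer.send("shipping_events", {
    "order_id": "123",
    "event_type": "delivery_exception",
    "event_time": "2025-10-28T10:42:00Z",
})
producer.flush()

Because every producer writes to the same durable log, the Flink application (and anything else) can consume these events without ever coupling to the e-commerce backend directly.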

While Kafka provides the event stream, Apache Flink provides the framework to process it. Flink is not merely a data transformation tool. It is a powerful stateful stream processing engine. This statefulness is the architectural keystone. Flink consumes event streams from Kafka to build and continuously maintain a rich, in-memory "state" for every active entity within the business, such as each customer's current session or each open order. This state represents the most current, comprehensive context available at any given moment, updated with millisecond latency as new events arrive.

CREATE TEMPORARY VIEW enriched_clicks AS
SELECT
    c.event_time,
    c.user_id,
    c.url,
    c.event_type,
    c.event_properties,
    cust.tier
FROM clickstream_events AS c
JOIN customers FOR SYSTEM_TIME AS OF c.event_time AS cust
    ON c.user_id = cust.customer_id;

Enriching the clickstream with customer data using a Temporal Join. This creates a continuous view of clicks, now with customer tier information.

Flink isn't just pre-processing data for the agent. Its rich API allows you to build real-time applications with cross-stream joins, complex event-driven pipelines, and stateful analytics - whether through a high-level SQL interface or a programmatic API. When the agent needs to act, it doesn't have to query multiple slow, disparate source systems. Instead, it can be instantly furnished with a snapshot of this pre-computed state.

INSERT INTO customer_session_summary
SELECT
    window_start,
    window_end,
    user_id,
    MAX(tier) AS tier,            -- Tier is constant within the session
    COUNT(*) AS clicks_in_session -- Example calculated aggregate
FROM
    TABLE(
        SESSION(
            TABLE enriched_clicks PARTITION BY user_id, -- sessions are per user
            DESCRIPTOR(event_time),
            INTERVAL '1' MINUTES
        )
    )
GROUP BY
    window_start,
    window_end,
    user_id;

Aggregating user events into sessions defined by a period of user inactivity. Alongside its SQL syntax, Apache Flink also offers a rich programmatic API.
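
As a taste of that programmatic API, here is a minimal PyFlink sketch that keeps per-user state as events arrive. The event schema and the running click count are our own illustration, not part of the architecture described above:

from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

class ClickCounter(KeyedProcessFunction):
    """Maintains a running click count per user in Flink managed state."""

    def open(self, runtime_context: RuntimeContext):
        self.count = runtime_context.get_state(
            ValueStateDescriptor("clicks", Types.LONG()))

    def process_element(self, value, ctx):
        current = (self.count.value() or 0) + 1  # update keyed state
        self.count.update(current)
        yield ctx.get_current_key(), current

env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection([("u1", "click"), ("u1", "click"), ("u2", "click")]) \
    .key_by(lambda event: event[0]) \
    .process(ClickCounter()) \
    .print()
env.execute("stateful_click_count")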

The Agent's Cognitive Core: Orchestrating with LangGraph and Reasoning with Gemini

With a real-time backbone in place to provide continuous context, the next challenge is to build a cognitive core capable of using that context to reason, plan, and act. Customer service conversations are rarely linear - they involve loops (e.g., "Is there anything else I can help you with?"), conditional logic (e.g., "If the item is damaged, initiate a return; otherwise, check the warranty"), and persistent state changes. A simple, linear chain of LLM calls is insufficient for managing this complexity.

This is where a framework like LangGraph becomes essential. LangGraph is designed for building stateful applications, modeling them as graphs rather than chains. The LangGraph framework is built around a few key concepts:

  • State (AgentState): This is the central data structure that persists throughout the execution of the graph. It holds all information relevant to the current task, including the conversation history, the outputs of any tools that have been called, and - critically in our architecture - the initial real-time context provided by the Flink application.   
  • Nodes: Nodes are functions or callable objects that receive the current AgentState as input, perform some logic, and return an update to the state.
    • reasoning_node: This node encapsulates the call to the LLM (Gemini). It takes the entire current state, formats it into a prompt, and calls the model to decide on the next action.
    • tool_node: This node is responsible for executing the action chosen by the reasoning node.
  • Conditional Edges: Edges define the flow of control between nodes. A conditional edge is a decision point that routes the execution to different nodes based on the current state. 

The graph pictures a cyclic workflow for a single AI agent. At its core is a decision node that evaluates whether a tool should be used and, if so, selects the appropriate one from a set of registered tools. The workflow is stateful, with access to the AgentState.
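
To make these concepts concrete, here is a minimal sketch of such a graph using LangGraph and the langchain-google-genai integration. The check_order_status tool and its hard-coded reply are our own illustrative stand-ins for a lookup against the Flink-maintained state:

from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

@tool
def check_order_status(order_id: str) -> str:
    """Return the latest status for an order."""
    # Illustrative stand-in for querying the real-time context
    return "Order #123 is DELAYED due to WAREHOUSE_SCAN_MISSED"

class AgentState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]
    realtime_context: dict  # snapshot supplied by the Flink application

model = ChatGoogleGenerativeAI(model="gemini-2.5-pro")
llm = model.bind_tools([check_order_status])

def reasoning_node(state: AgentState) -> dict:
    # Gemini sees the full conversation history plus the real-time context
    return {"messages": [llm.invoke(state["messages"])]}

def route(state: AgentState) -> str:
    # Conditional edge: run tools if the model requested any, else finish
    return "tools" if state["messages"][-1].tool_calls else END

graph = StateGraph(AgentState)
graph.add_node("reason", reasoning_node)
graph.add_node("tools", ToolNode([check_order_status]))
graph.set_entry_point("reason")
graph.add_conditional_edges("reason", route, {"tools": "tools", END: END})
graph.add_edge("tools", "reason")  # loop back: cyclic, stateful reasoning
agent = graph.compile()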

Gemini as the Reasoning Engine

Within the reasoning_node, a powerful LLM like Google's Gemini (gemini-2.5-pro or similar) serves as the agent's "brain." It receives the entire AgentState as its input context. This includes the initial, real-time data from Flink (e.g., "Order #123 is DELAYED due to WAREHOUSE_SCAN_MISSED") and the full history of the conversation. Gemini's advanced reasoning and function-calling capabilities allow it to analyze this rich context and formulate a precise plan of action, such as invoking a specific tool with the correct parameters.   
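
Continuing the sketch above, one simple way to hand the Flink snapshot to Gemini is to fold it into the initial messages. The field names here are illustrative:

from langchain_core.messages import HumanMessage, SystemMessage

# Illustrative snapshot, as the Flink application might supply it
context = {"order_id": "123", "status": "DELAYED", "reason": "WAREHOUSE_SCAN_MISSED"}

result = agent.invoke({
    "messages": [
        SystemMessage(content=f"Real-time context: {context}"),
        HumanMessage(content="My order is late, can I get a refund?"),
    ],
    "realtime_context": context,
})
print(result["messages"][-1].content)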

A unique advantage of this architecture is the ability to perform dynamic tool building. The real-time context supplied by Flink can be used to dynamically construct or modify the set of tools available to the agent for a specific interaction. For example, if the context reveals that the customer is a 'VIP' and their order is 'delayed', the system can dynamically generate and add a grant_service_credit tool to the list of functions presented to the Gemini model. This makes the agent highly adaptive, equipping it with the exact capabilities needed to resolve a situation without exposing unnecessary or irrelevant functions.
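
Sketching this idea on top of the example above - the grant_service_credit tool and the tier and status context fields are hypothetical:

from langchain_core.tools import tool

@tool
def grant_service_credit(customer_id: str, amount_usd: float) -> str:
    """Grant a goodwill service credit to the customer's account."""
    return f"Credited ${amount_usd:.2f} to customer {customer_id}"

def build_tools(context: dict) -> list:
    """Assemble the toolset for this interaction from the real-time context."""
    tools = [check_order_status]  # always available
    # Only VIP customers with a delayed order are offered compensation
    if context.get("tier") == "VIP" and context.get("status") == "DELAYED":
        tools.append(grant_service_credit)
    return tools

# Re-bind the model with the dynamically assembled toolset
llm = model.bind_tools(build_tools(context))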

The Agent's Memory: Hybrid Retrieval with pgvector

While the Flink and Kafka backbone provides the agent with its real-time, short-term context, an effective agent also requires long-term memory (LTM). It needs to learn from the past to solve problems in the present. In our architecture, this LTM is implemented using a PostgreSQL database enhanced with the pgvector extension. This database stores vectorized representations (embeddings) of historical support tickets, their resolutions and company FAQs.
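
As a minimal sketch of how such a store might be stood up with LangChain's langchain-postgres integration - the connection string, collection name, and sample data are all illustrative:

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_postgres import PGVector

# The Gemini embedding model referenced below
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

vector_store = PGVector(
    embeddings=embeddings,
    collection_name="support_tickets",  # illustrative name
    connection="postgresql+psycopg://user:pass@localhost:5432/support",
    use_jsonb=True,
)

# Populate the long-term memory: embed past tickets with their resolutions
vector_store.add_texts(
    ["Package marked delivered but customer never received it..."],
    metadatas=[{"resolution": "Reshipped the order", "category": "delivery"}],
)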

The process of populating this LTM involves a straightforward embedding generation pipeline like the one sketched above: using the LangChain framework, one can leverage the GoogleGenerativeAIEmbeddings class with a Gemini embedding model such as gemini-embedding-001. Vectors stored in a vector column within a PostgreSQL table are then ready for querying with the similarity_search_with_score method of the vector store:

# Search for similar problems with similarity scores
results = self.vector_store.similarity_search_with_score(
    problem_description,
    k=1  # Only return the most similar result
)

if not results:
    logger.info("No similar problems found in database")
    return None

# Get the best match
document, distance = results[0]

# Convert cosine distance to similarity (distance: 0 = identical, 2 = opposite)
similarity_score = 1 - (distance / 2)

# Check if similarity meets threshold
if similarity_score < self.similarity_threshold:
    logger.info(f"Best match similarity {similarity_score:.3f} below threshold {self.similarity_threshold}")
    return None

return {
    'problem_description': document.metadata.get('problem_description', document.page_content),
    'resolution': document.metadata.get('resolution', ''),
    'category': document.metadata.get('category', ''),
    'similarity_score': similarity_score
}

In Summary

Understanding and avoiding contextual blindness can save a lot of time if you are planning to productionize a customer service system. We have explored the architecture of a real-time, context-aware AI agent built on three pillars:

  1. Event-driven backbone - leveraging Kafka and Flink to provide real-time awareness.
  2. Stateful, cyclic cognitive core - built with LangGraph and Gemini for intelligent orchestration.
  3. Hybrid long-term memory - powered by pgvector to enable context-rich historical retrieval.

Together, these components form a system that continuously refreshes and contextualizes information, dramatically reducing the risk of AI hallucinations. We have described a few example technologies that fulfill this role. But there is a wide variety of technologies available on the market to choose from - choose wisely and don’t miss the most important takeaway from this exploration: 

building effective AI customer service requires a real-time backbone.

Would you like to see AI agents working in real time? Sign up for a free demo!
