
Multi-agent workflow with Roo Code

Olena Borzenko & Janek Fellien

November 24, 2025

At one of our Innovation Days, my colleague Janek Fellien pitched an idea that, to me, sounded like: “Let’s build a multi-everything system!” Of course, that’s not exactly what he said, but that’s how it landed in my head — and it was enough to get me to spend the entire day experimenting with him to see how far we could take it.


I’ll start this article with the introduction written by Janek, so you can see where we started.

Introduction


The idea of a multi-agent system, in which each agent receives its own model, came to me when I was talking with a friend about his software project, a jump-and-run game. I’m not really into the AI topic, and in my opinion we’re right in the middle of the hype phase, so it didn’t affect me that much. But after I saw what’s possible with a multi-agent system, I fell for the hype myself.


Why? Quite simple: my friend essentially showed me a virtual development department that performs the work he describes in a prompt, in human-understandable language. One agent takes on the role of software architect and decides which frameworks and software structures to use. Another writes the code. The results are then checked by a validator, tests are written, and — particularly exciting — documentation is generated. In addition, he has an agent that can editorially prepare YouTube videos to promote the individual features of the game. How brilliant is that?


All of this arose out of necessity, because he is developing the game alone and doesn’t have the time to handle every detail. So he began creating little helpers to take over certain steps. Of course, it didn’t work smoothly from the beginning. It took him two months to build the instruction files — the files you need to control an agent. The fascinating thing is that with these files, you can define your own way of developing software. I find that especially exciting, because it lets you implement rules and principles according to your own wishes.


For his multi-agent setup, my friend uses Claude in the CLI. As magnificent as the results are, you are equally restricted with it, because with Claude you can indeed use multiple models, but only its own — Sonnet, Opus, and Haiku. But what if I want to use GPT or another model? That’s precisely why I submitted the topic “Multi-Agent, Multi-Model” to InnoDay in Zürich. What happened next can best be told by Olena, because she ultimately found the solution.
by Janek Fellien

As much as I would have liked to ultimately find the solution, I’d say we simply got to the point where something was working. And that’s exactly what I’ll be sharing in this article — how far we got, and what the journey looked like.

Starting Point: Personas in Custom Chat Modes

Our starting point that day was a conversation about custom chat modes in VS Code and how you can use different personas as part of your workflow. This approach is very well described in an article by our Swiss colleague Emanuele Bartolesi, “GitHub Copilot: A Persona-Based Approach to Real-World Development.” Here is a link: https://dev.to/this-is-learning/github-copilot-a-persona-based-approach-to-real-world-development-56ee

In that article, he explains that you should treat Copilot like a team instead of a single helper. Writing strong custom instructions can significantly improve the results you get from your AI-powered assistant. However, as Emanuele also points out, Copilot will keep making the same mistakes unless you shift toward a more team-like approach when working with it. I highly recommend giving the article a read, but I’ll briefly summarize the core idea here so the rest of this article makes more sense.

I’ll skip the details on prompt engineering and focus on the main concept: having a whole team in your IDE, ready to work for you. The core of this approach is to provide custom chat mode files with descriptions of different personas you might need for your project. Just like in real life, it’s important to clearly define each person’s role and responsibilities — the more structured your persona prompts are, the better they’ll work as part of your workflow. And just as real teammates bring different skill sets and expertise, custom chat modes can use different tools and even different AI models.
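To make this more concrete, here is roughly what such a persona can look like as a custom chat mode file in VS Code (typically a *.chatmode.md file under .github/chatmodes/). The persona, tool list, and model below are made up for illustration, so double-check the supported front matter fields in the current VS Code documentation:

---
description: 'Reviewer persona that checks changes for readability, tests, and security issues'
tools: ['codebase', 'search']
model: GPT-4o
---

You are a strict but constructive code reviewer.
Focus on readability, missing tests, and obvious security problems.
Do not rewrite code yourself; instead, list concrete suggestions the developer can apply.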

This is powerful not only because it’s cost-effective (you can use the right model for the right task), but also because switching models helps you get better results depending on the use case. Having a team of experts is great, but they’re only effective if they work together. In real life we write guidelines and standards for projects; in the world of AI-powered assistants, that consistency comes from custom instruction files. No matter which persona is active, some shared “common sense” carries across all responses, reducing the mess or randomness that can creep in when working with AI.
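As a small illustration, a shared instruction file (for Copilot, typically .github/copilot-instructions.md) could carry project-wide rules along these lines; the rules themselves are only examples:

# Project guidelines for AI assistants

- Use TypeScript for all new frontend code and keep components small.
- Every new function needs a unit test in the same pull request.
- Prefer descriptive names over clever abbreviations.
- Never hard-code secrets; configuration values come from environment variables.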

The result is a more intuitive and faster workflow. You don’t waste time explaining to Copilot what went wrong after it drifts off-topic, because you can switch to a persona that’s more focused on the specific task. Iterations become quicker, and even things we often forget to ask Copilot — like reviewing its own code — can be handled by a dedicated persona. As your project scales, your team of personas can scale with it.

These personas worked really well and matched the spirit of our initial idea. But we started to wonder: what if we could automate the switching itself? Instead of manually selecting a chat mode in VS Code, could the system pick the right one automatically when needed? That question quickly grew into a bigger one — could we orchestrate multiple agents, or even multiple models, to handle different parts of a workflow? We weren’t aiming for parallel execution or anything too futuristic; just a prototype with automatic agent selection and model switching.

After some investigation, we realized that we couldn’t easily achieve this with GitHub Copilot alone. That’s when Janek suggested trying out a tool called Roo Code.

Roo Code: Orchestrating Agents and Models

Roo Code is an open-source tool that brings the idea of a whole dev team of AI agents right into your code editor. At first, we misunderstood how it worked — we hoped it would simply pick up our custom chat mode files, combine them with a main instruction file (describing dependencies and relationships between roles and personas), and magically give us an automated multi-agent, multi-model workflow. Sounds too good to be true, right?

As it turned out, we weren’t completely wrong. Roo Code didn’t work exactly that way, but with a bit more setup and configuration, we were able to achieve what we expected from it — a tool capable of orchestrating AI agents.

At first glance, it feels similar to GitHub Copilot: you can define custom modes (personas) with their own instructions and even associate them with different models. The real difference, though, is that Roo can act as an orchestrator. Instead of being just one assistant with multiple personalities, it coordinates complex tasks by delegating them to the right specialized mode at the right time.

To make this work, you need to decide which AI models Roo should use. This is part of the setup and it gives you a lot of flexibility. You can connect different providers — OpenAI, Anthropic, or even local models through something like LiteLLM — and assign them to modes depending on what each mode is meant to do. That way, lightweight models can handle quick or repetitive tasks, while more advanced ones can focus on planning or deeper reasoning.

Another strength of Roo Code is that it keeps track of context across modes. Even when work is passed between them, it uses indexing and smart prompt management to make sure the conversation stays consistent. In practice, this means you can create something that feels like a coordinated team: each agent has a role, they all know how to talk to each other, and together they can handle more complex workflows than a single Copilot-style assistant.

Configuration Options

I already mentioned LiteLLM, which was our choice — but Roo Code is flexible. You’re not locked into one model or provider. In fact, depending on your needs (speed, cost, privacy), there are several ways to wire up different LLMs to Roo Code.

According to the Roo Code docs and community write-ups, you can configure Roo to use various AI providers (OpenAI, Anthropic, Google Gemini, etc.) or even local/self-hosted models. Some tutorials walk you through using OpenRouter as a bridge — Roo sends API calls through OpenRouter, which then routes to many available models. There are also more advanced setups where you integrate enterprise tooling like Portkey to handle governance, cost tracking, or routing logic.

In practice, you pick a “provider” in Roo’s settings (or via a configuration profile), supply the API keys or endpoints, and assign particular models per mode. The flexibility means you can mix and match: maybe your “Architect” mode uses a high-powered reasoning model, while your “Code” mode uses a fast, lightweight one.

That said, each approach comes with tradeoffs. Remote providers bring latency and cost per token; local/self-hosted models give you more control and privacy but tend to be slower or demand more hardware. Some community testers have tried hybrid setups or local deployments via Ollama to reduce costs while keeping control.

LiteLLM setup and configuration

From here, I’ll walk you through our LiteLLM setup — how we configured it and the steps we took.

For our setup, I created a separate directory to keep things clean and ran the commands below to download the docker-compose file and the other configuration files.

# Get the code
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/docker-compose.yml
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/prometheus.yml

# Add the master key - this can be changed after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt LLM API Key credentials
# I used a random hash instead of sk-1234
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker-compose up

Once that was in place, I added a litellm_config.yaml file with the definitions of the models I wanted to use. This configuration file is where you tell LiteLLM which providers and models it should connect to. You can mix and match here — for example, one entry could point to an OpenAI GPT-4 endpoint, another to Anthropic’s Claude, and yet another to a local model running on your machine. Roo Code will then be able to use those models through LiteLLM as if they were all part of the same system.

litellm_settings:
  drop_params: true

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-01-01-preview"

  - model_name: gpt-5-nano
    litellm_params:
      model: azure/gpt-5-nano
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2024-12-01-preview"

In my configuration file, I set up OpenAI GPT-4o and GPT-5-nano as the models. That meant I needed an AI project on Azure with those models already deployed. I used them mainly for testing purposes, but the structure of the file makes it clear that I could just as easily have added models from other providers as well.
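For illustration, entries for other providers follow the same pattern and would simply go under the same model_list key. The model names and keys below are placeholders rather than part of our actual setup:

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: local-llama
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434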

Once the configuration was ready, I started the Docker container and tested whether the local setup was working correctly. For the AI_SERVICE_API_KEY and AI_SERVICE_BASE placeholders, I used the credentials from my deployed Azure resources.

docker run \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -e AZURE_API_KEY=AI_SERVICE_API_KEY \
    -e AZURE_API_BASE=https://AI_SERVICE_BASE.services.ai.azure.com/ \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-stable \
    --config /app/config.yaml --detailed_debug

After that, I went through a few iterations to make sure the models were configured correctly. At one point, I ran into an issue with the api_version value for one of the models. Fortunately, LiteLLM made it easy to troubleshoot — by running the command below I was able to perform a health check on my proxy server and quickly identify the problem.

curl --location 'http://0.0.0.0:4000/health' -H "Authorization: Bearer sk-1234"

Once the health check confirmed that all models were connected properly, I made a quick completion request to double-check that everything was working.
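For reference, such a test request goes against the proxy’s OpenAI-compatible chat completions endpoint. The model value has to match one of the model_name entries from litellm_config.yaml, and the key is the LITELLM_MASTER_KEY from the .env file:

curl --location 'http://0.0.0.0:4000/chat/completions' \
    -H "Authorization: Bearer sk-1234" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4o",
      "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
    }'

With that done, I moved on to the next step: configuring profiles and providers in Roo Code.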

Providers and Modes Configurations

Roo Code is a VS Code extension, and the interface looks like this:


To use the models connected to your local proxy server, you need to go into the Roo Code settings and add a Providers Configuration. I created two profiles called Reasoning and Completion so I could easily distinguish which profile should be assigned to a specific chat mode.

Here’s an example of how the profile configuration looks: you enter the profile name, then select the API provider from the dropdown. In my case, it was LiteLLM. From there, I just pointed it to the URL of my locally running container and added the same API key I used during proxy server configuration. After filling in these details, my models appeared in the dropdown.


At this stage, I could create a dedicated profile for each deployed model. The last step before testing was connecting those profiles to custom chat modes. Roo Code comes with four default chat modes, which you can see below:


To finish the setup, I went back into the settings tab and, for each chat mode, selected one of the provider profiles in the API configuration section. After repeating this for every chat mode, the setup was complete and ready for testing.


Note: In Roo Code, chat modes can be configured directly within the IDE. Alternatively, you can take a similar approach to GitHub Copilot by creating dedicated chat mode files for each configuration.
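If you prefer the file-based route, Roo Code supports project-level custom mode definitions in a .roomodes file in the project root. A rough sketch is shown below; the slug, role description, and groups are invented for illustration, so check the Roo Code documentation for the exact schema your version expects:

customModes:
  - slug: docs-writer
    name: Docs Writer
    roleDefinition: You are a technical writer who documents features after they have been implemented.
    whenToUse: Use this mode for tasks that only touch documentation.
    groups:
      - read
      - edit
    customInstructions: Keep explanations short and link to the relevant code instead of duplicating it.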

How the Orchestrator Role Works

This was the most interesting part for me — seeing how the orchestration actually works in practice. For context, here’s the description of the Orchestrator role that comes by default in Roo Code:

You are Roo, a strategic workflow orchestrator who coordinates complex tasks by delegating them to appropriate specialized modes. You have a comprehensive understanding of each mode's capabilities and limitations, allowing you to effectively break down complex problems into discrete tasks that can be solved by different specialists.

This is where the magic happens. Whenever coordination is required, Roo Code switches into the Orchestrator role, which then decides what to do next and which mode should handle it.


Here’s the When To Use description from Roo Code for the same role:


Use this mode for complex, multi-step projects that require coordination across different specialties. Ideal when you need to break down large tasks into subtasks, manage workflows, or coordinate work that spans multiple domains or expertise areas.

Final result

For testing, I asked Roo Code to build a simple calculator app in React that supports basic operations. After submitting my request, I could simply sit back and observe the process. The only times it paused were when it required confirmation to run a command—for example, to install packages.

Here are a few screenshots from the same session, showing how Roo Code switches between roles and handles requests in those roles.


As you can see, it first processed my request in the Architect role. After a short while, it switched to Code mode:


Conclusion

Digging into Roo Code gave me a better picture of how it actually works under the hood. The role switching isn’t just for show — each mode has its own responsibilities, and watching how it transitions between them makes the whole process much easier to follow.

What I also found interesting is how it connects to the LLM provider while pulling in tools and commands when needed. That mix of reasoning and action is what makes the workflow feel structured, but still flexible. You can see when it needs a confirmation, when it’s reasoning at a higher level, and when it’s just running code.

For me, the most valuable part was simply observing the mechanics: how requests get broken down, how roles change, and how external integrations are handled. It’s less about the end result and more about understanding the process — and that’s where Roo Code really shows what it can do.

Written by

Olena Borzenko
