
Securing (personal) AI Agents

Michael van Rooijen

June 15, 2026


This article is part of the XPRT. Magazine #21

Ever since I started working with AI agents like Claude Code and the newer models like Opus 4.5 late last year, and noticing how much better they'd become, I've had a voice in the back of my head: Is there a way I could have an agent help me manage my personal and professional life? Being a father of 5, working 36 hours a week, and 40+ years old, I often notice I could use a hand to make sure I don't forget to do things. I also regularly remember to do something at the exact moment I can't do anything about it. Should I just become better at remembering things, or is there something that could help me? Wouldn't it be great if everyone could have that Executive Assistant (EA) that those management types on TV shows seem to have? The EA that doesn't forget anything and makes sure you are well-prepared for everything?

Then OpenClaw came out a few months ago, and for the first time, I thought: "Wow, this might just be exactly what I need. OpenClaw could be my personal assistant!" But I kept running into articles about security concerns. These security concerns have been holding me back from actually using it.

So I asked myself: How do the other agents approach security? What do I find important? Do I think OpenClaw is indeed insecure, and could I do anything about it?

In this article, I will look at the options available and how they could be applied to (personal) AI agents.

Considering security in the context of AI agents

To be able to determine what kind of additional security measures could be taken, we first need to consider the risks that are inherent to using LLMs and the agent harnesses (without any protections):

Prompt injection. An agent that processes untrusted content (a web page, an email, a document someone sent you) can be manipulated by instructions embedded in that content. The LLM can't reliably distinguish between "instructions from the user" and "instructions embedded in the thing I'm reading." A malicious page can tell your agent to exfiltrate your notes or delete files, and if the agent has the capability to do that and there is no layer below the model to stop it, it will.

Uncontrolled blast radius. An agent inherits your user's permissions. That means it can read your SSH keys, write to your home directory, delete files, send emails on your behalf, and interact with any local service your user account can reach. The agent isn't malicious, but it can make mistakes, misunderstand instructions, or be manipulated. Anything it does has the same effect as if you had done it yourself.

Data exfiltration. An agent with read access to your files and unrestricted network access has, in principle, the ability to send your files anywhere. This is the intersection of two capabilities that often coexist in default agent setups: broad read access (the agent needs to see your files to help with them) and unconstrained outbound traffic (the agent needs to call APIs).

Credential exposure. Most developer environments accumulate credentials over time: API keys in shell profiles, cloud credentials in ~/.azure, SSH keys, .env files scattered through project directories. An agent process running in your shell inherits all of this. If the agent is compromised or misbehaves, those credentials are in scope. Most developers aren't aware of all the secrets that live in their environment.

Irreversibility. When an agent modifies or deletes a file using its built-in tools, that change lands on your real filesystem immediately. There's no undo buffer, no preview step, no trash bin. A misunderstood instruction or a bad model output results in real data loss. For a tool that's operating autonomously on your behalf across a long session, this matters more than it does for a tool that's making one change at a time with your eyes on it.

These risks don't sit in isolation. Prompt injection is more dangerous when the blast radius is large. A large blast radius matters more when there's no rollback. Credential exposure is more consequential when there's no network egress control. You can't fully address any one of them without thinking about how it connects to the others.

What protections do the agent harnesses provide

When working with agents, you quickly notice that they (seem to) provide protection, because they ask for your approval before taking certain actions. You also notice differences between agents: some require far fewer approvals than others. The fact that you can approve an action means that the process the agent runs in already has (or assumes it has) this permission, so, in that sense, the protections are soft. The agent process decides to either allow or disallow the use of the pre-existing permission.

You can also see that these protections are usually defined at the harness level, not at the underlying resource level. A good example is web access: some agents allow doing a web search without asking for approval first, but if you explicitly ask it to open a specific website, it will first ask for permission. Technically, both actions involve connecting to websites. The difference is that the harness treats "search the web" and "open this site" as two different capabilities with different trust levels and approval rules.

Because all of these protections are soft protections anyway, and because these protections differ per AI agent, we are not going to go into the details of what each agent does or could do. We will look at this from the perspective of: how can we secure any agent? That's what we'll be looking at next.

Technical control surfaces

To think clearly about how to secure an agent, it helps to break the problem down into concrete control surfaces:

Filesystem access. What files and directories the agent can read, write, delete, or modify on the machine.
Network access. What network destinations the agent can reach, including internet endpoints and internal services, and whether outbound traffic is filtered or unrestricted.
Environment/secrets access. Sensitive values the agent can access from its runtime context, such as environment variables, .env files, cloud credentials, SSH keys, and token stores.
Process/tool execution. What commands, scripts, shells, or built-in tools the agent is allowed to run, and whether those executions are gated by approval or policy.
Local services / privileged interfaces. What powerful local interfaces the agent can talk to besides ordinary files or internet endpoints, such as Docker sockets, SSH agents, databases on localhost, browser debug ports, or other helper processes.

With these control surfaces in mind, the next question is where to apply defences in practice. In other words, if these are the capabilities that make agent failures dangerous, what technical controls can we put in place to constrain them?

Foundation: Process Isolation

The first thing you want is a boundary between the agent and the rest of your machine.

Without that boundary, the agent runs as a normal process in your normal user environment. It sees what you can see, reaches what you can reach, and uses the same permissions you use. At that point, there is very little to control. The agent may request approval, but the underlying process continues to operate with broad host-level access. Process isolation changes that. It gives you a place to enforce rules.

Once the agent is isolated, you can start making decisions about what it should be able to see, what it should be able to modify, what network access it should have, and which local interfaces it should be allowed to touch. In other words, process isolation does not solve the entire problem, but it creates a technical boundary that enables later controls.

Process Isolation Options Across Operating Systems

The exact way you can do this depends on the operating system.

On Linux, containers are the practical default. Docker and Podman are usually the easiest way to stop the agent from running directly on the host while still keeping a familiar workflow. If you want stronger isolation, gVisor adds another barrier before the host kernel, and Firecracker goes further by running the workload in a microVM.

On macOS, the usual choice is either a native sandbox around the process or a Linux environment provided by Docker or another VM-based tool. Anthropic's srt is interesting here because it gives you a lightweight way to run a process inside native sandbox boundaries without having to build a whole VM workflow first.

The important point is not which specific tool you pick first. The important point is that the agent stops running as an unrestricted host process.

For most people, the right starting point is the simplest isolation mechanism that they will actually use consistently. Stronger isolation exists, but having a real boundary today is more valuable than planning a perfect one and never setting it up.

With the foundation in place, we can start applying controls to the five surfaces we defined earlier.

Filesystem Controls

For the filesystem, we should, by default, choose not to share any directories or files from the host with the agent. For the files or directories the agent does need to interact with to do the job you've asked it to do, we define three filesystem classes:

Read-only inputs. Documentation, reference material, notes, and source trees the agent needs to inspect but should not be able to modify directly.
Writable workspaces. The project directories where you want the agent to make changes, but with a clear recovery path if those changes are wrong.
Ephemeral scratch space. Temporary files, caches, and intermediate outputs that do not need to survive the session.

Read-only inputs

The default should be that anything the agent only needs to inspect is technically read-only. For containers, this could be done by mounting host directories as read-only:

-v /home/user/docs:/docs:ro
-v /home/user/notes:/notes:ro

In this way, we can make sure these files can never be altered.
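If you start containers from a script, the read-only mounts can be generated rather than typed by hand. The following is a minimal sketch; the function name read_only_mount_args is hypothetical, and the paths are the illustrative ones used above.

```python
from pathlib import PurePosixPath


def read_only_mount_args(mounts: dict) -> list:
    """Build `-v host:container:ro` arguments for `docker run`.

    `mounts` maps a host path to the path it should get inside the container.
    """
    args = []
    for host_path, container_path in mounts.items():
        for path in (host_path, container_path):
            # Relative paths are easy to get wrong, so refuse them outright.
            if not PurePosixPath(path).is_absolute():
                raise ValueError(f"mount path must be absolute: {path!r}")
        # The trailing :ro is what makes the mount read-only.
        args += ["-v", f"{host_path}:{container_path}:ro"]
    return args


print(read_only_mount_args({
    "/home/user/docs": "/docs",
    "/home/user/notes": "/notes",
}))
```

Generating the flags in one place makes it harder to forget the :ro suffix on a single mount.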

Writable workspaces

For the files and directories the agent needs to make changes to, we'll look at two options:

Git worktrees

If the workspace is a Git repository, a git worktree is a clean and simple approach. A worktree creates a separate working directory linked to the same repository, checked out on its own branch. Crucially, the actual .git folder stays in the main repository. The worktree directory only contains a .git file (a pointer), not the git data itself. When you mount only the worktree directory into the container, the agent has no path to the main .git folder and cannot destroy your history, even intentionally. The flip side is that Git commands inside the container cannot find the repository either, so the agent edits files and you do the committing from the host.

# Create a dedicated worktree on a new branch (-b) for the agent to work in
git -C /home/user/project worktree add -b agent/session-1 /home/user/agent-workspaces/project

# Mount only the worktree directory (not the parent repo) into the container
# --rm removes the container automatically when it exits, so no state persists between runs
docker run --rm -it \
  -v /home/user/agent-workspaces/project:/workspace \
  my-agent-image

After the session, review the agent's changes from the host, for example with git status and git diff inside the worktree, and commit what you want to keep to the agent/session-1 branch. From there, merge or discard the branch as you see fit. If something went wrong, remove the worktree and delete the branch; the main repository is untouched.
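Before mounting a directory, it can be worth verifying that it really is a linked worktree whose Git data lives elsewhere. The sketch below reads the .git pointer file directly, so it does not need the git binary; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def gitdir_pointer(worktree: str) -> Optional[Path]:
    """Return the path a linked worktree's .git file points to.

    Returns None when .git is a real directory, which means the directory
    is a main repository and carries its full history with it.
    """
    dot_git = Path(worktree) / ".git"
    if dot_git.is_dir():
        return None
    text = dot_git.read_text().strip()
    prefix = "gitdir:"
    if not text.startswith(prefix):
        raise ValueError(f"unexpected .git file content: {text!r}")
    return Path(text[len(prefix):].strip())


def git_data_outside_mount(worktree: str) -> bool:
    """True if the Git data lives outside the directory we plan to mount."""
    root = Path(worktree).resolve()
    pointer = gitdir_pointer(worktree)
    if pointer is None:
        return False
    if not pointer.is_absolute():
        # Git can also write the pointer relative to the worktree.
        pointer = root / pointer
    target = pointer.resolve()
    return target != root and root not in target.parents
```

A launcher script could refuse to start the container when git_data_outside_mount returns False.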

OverlayFS

Another Linux-specific option is to use OverlayFS. OverlayFS provides a copy-on-write layer on top of a real directory. The agent sees a normal writable directory, but writes, edits, and deletions go to a temporary upper layer rather than to the original files underneath. That means the agent can behave as if it has full write access, but you can still inspect what it did and decide whether to apply the changes. On macOS, the practical alternatives are usually to work on a copied workspace or to run the agent inside a Linux VM or Docker-managed Linux environment and use OverlayFS there.

Conceptually, there are four parts:

lowerdir: the real original project, treated as the base layer
upperdir: the place where new and changed files are stored
workdir: internal working space OverlayFS needs to operate
merged: the directory view you hand to the container; it looks like one normal writable directory

With containers, this could be approached as follows:

# Prepare the writable, temporary OverlayFS directories
mkdir -p /tmp/agent-overlay/{upper,work,merged}

# Create a merged view:
# - the real project is the read-only base layer (lowerdir)
# - all agent changes go into /tmp/agent-overlay/upper
# - /tmp/agent-overlay/merged is the directory the container will see
sudo mount -t overlay overlay \
  -o lowerdir=/home/user/project,upperdir=/tmp/agent-overlay/upper,workdir=/tmp/agent-overlay/work \
  /tmp/agent-overlay/merged
# Note: mounting overlayfs requires root. If root is not available in your setup,
# fuse-overlayfs (https://github.com/containers/fuse-overlayfs) provides a rootless alternative.

# Give the container access only to the merged view
docker run --rm -it \
  -v /tmp/agent-overlay/merged:/workspace \
  my-agent-image

After the session, you can inspect /tmp/agent-overlay/upper to see exactly what changed. The original files in /home/user/project remain untouched unless you later decide to copy the changes back. If the result is not good, you can unmount the overlay and discard the temporary upper layer.
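Inspecting the upper layer by hand gets tedious for larger sessions. The following sketch summarises it, assuming the kernel OverlayFS convention that a deleted file is represented in the upper layer by a character device with device number 0 (a "whiteout"). The function name is hypothetical, and opaque directories and the different on-disk format used by fuse-overlayfs are not handled.

```python
import os
import stat
from pathlib import Path


def summarize_overlay(lower: str, upper: str) -> dict:
    """Classify entries in the upper layer as added, modified, or deleted."""
    changes = {"added": [], "modified": [], "deleted": []}
    for dirpath, _dirnames, filenames in os.walk(upper):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(upper)
            info = path.lstat()
            if stat.S_ISCHR(info.st_mode) and info.st_rdev == 0:
                # Whiteout: the file exists in lowerdir but was deleted.
                changes["deleted"].append(rel.as_posix())
            elif (Path(lower) / rel).exists():
                # Present in both layers: the upper copy shadows the original.
                changes["modified"].append(rel.as_posix())
            else:
                changes["added"].append(rel.as_posix())
    for entries in changes.values():
        entries.sort()
    return changes


if __name__ == "__main__":
    print(summarize_overlay("/home/user/project", "/tmp/agent-overlay/upper"))
```

Depending on file ownership, reading the upper layer may require the same privileges that were used to create the mount.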

Ephemeral Scratch Space

Agents also need a place to store temporary files: downloads, build artefacts, parser output, caches, and other session-local state.

That space should usually be tmpfs or another temporary directory that disappears when the session ends.

docker run --rm -it \
  -v /home/user/project:/workspace \
  --tmpfs /tmp:rw,noexec,nosuid,size=512m \
  my-agent-image

In this example, the agent still gets its normal writable workspace at /workspace, but /tmp is backed by in-memory storage rather than a persistent directory on disk. Downloads, build artefacts, caches, and parser output can be written there during the session, but they disappear automatically when the container exits. This does two things. It reduces the clutter each agent run leaves behind, which also means less leftover data that could be exposed later. And it makes it much easier to tell what the agent intended to keep from what it only needed while working.

Network Egress Controls

This control addresses the risk of data exfiltration. An agent with read access to your files and network access to the internet can, in principle, send anything to an arbitrary endpoint. You need to control what the agent can reach on the network.

Firewall Rules

The first option would be a firewall that blocks all outbound traffic by default and allows only what you explicitly permit. If you're running the agent in Docker, enforce firewall rules on the host that apply to the Docker network created for the agent.

This is how you would remove network access completely, but in practice, this will not work for agents whose models and inference are cloud-based:

docker run --rm --network none your-prebuilt-agent-image

To allow certain IP addresses through, in Linux, you would:

Put the agent on a dedicated Docker bridge network.
Identify that bridge interface on the host.
Apply outbound rules for that interface with iptables or nftables.

# Create a dedicated network for the agent
docker network create agent_net

# Run the pre-built image on that network
docker run --rm --network agent_net --name agent your-prebuilt-agent-image

# On the host: find the bridge interface Docker created for that network
docker network inspect agent_net

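# Then apply egress rules to that bridge in the DOCKER-USER chain
# Replace br-xxxxxxxxxxxx with the actual bridge name
# -I inserts at the top of the chain, so add the DROP first: every rule inserted
# after it lands above it and is evaluated before it. Appending the DROP with -A
# can leave it unreachable if the chain already ends in a RETURN rule, which
# some Docker versions create.
iptables -I DOCKER-USER -i br-xxxxxxxxxxxx -j DROP
iptables -I DOCKER-USER -i br-xxxxxxxxxxxx -d 160.79.104.0/23 -p tcp --dport 443 -j ACCEPT
iptables -I DOCKER-USER -i br-xxxxxxxxxxxx -p tcp --dport 53 -j ACCEPT
iptables -I DOCKER-USER -i br-xxxxxxxxxxxx -p udp --dport 53 -j ACCEPT
iptables -I DOCKER-USER -i br-xxxxxxxxxxxx -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT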
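Because the allowlist tends to change, it can help to generate the rules from a list of CIDR ranges instead of editing them by hand. This is a sketch only: the function name egress_rules is hypothetical, it prints the commands rather than running them, and it inserts the DROP rule first so that every later -I lands above it.

```python
import ipaddress
import shlex


def egress_rules(bridge: str, allowed_cidrs: list, port: int = 443) -> list:
    """Return iptables commands (as argv lists) for a default-deny egress policy."""
    base = ["iptables", "-I", "DOCKER-USER", "-i", bridge]
    # Inserted first, so it ends up below every rule inserted afterwards.
    rules = [base + ["-j", "DROP"]]
    for cidr in allowed_cidrs:
        network = ipaddress.ip_network(cidr)  # raises ValueError on bad input
        if network.version != 4:
            raise ValueError(f"iptables only handles IPv4: {cidr}")
        rules.append(base + ["-d", str(network), "-p", "tcp",
                             "--dport", str(port), "-j", "ACCEPT"])
    for proto in ("tcp", "udp"):
        # DNS lookups must still be possible.
        rules.append(base + ["-p", proto, "--dport", "53", "-j", "ACCEPT"])
    rules.append(base + ["-m", "conntrack", "--ctstate",
                         "ESTABLISHED,RELATED", "-j", "ACCEPT"])
    return rules


for rule in egress_rules("br-xxxxxxxxxxxx", ["160.79.104.0/23"]):
    print(shlex.join(rule))
```

Validating each range with the ipaddress module catches typos before they turn into a rule that silently matches nothing.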

The tradeoff to understand: this is workable when the destinations are stable. But IP addresses often change. Anthropic's API could be served from different IPs at different times, making it hard to maintain an accurate allowlist of IPs.

If you're not using containers: on Linux, you'd usually solve this with host-level iptables or nftables. On macOS, the equivalent is typically the built-in Packet Filter (pf).

DNS Filtering

Going one step further: block DNS resolution for domains you haven't approved. If the agent can't resolve an unauthorised domain, it can't reach it regardless of IP.

This works well as a first line of defence. Pi-hole or Unbound configured to resolve only an allowlist of specific domains is a reasonable setup. But DNS filtering alone isn't sufficient: an attacker (or a malfunctioning agent) can hardcode IP addresses and bypass DNS entirely.
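The decision such a resolver makes, and the bypass described above, can be expressed in a few lines. This is an illustrative sketch, not how Pi-hole or Unbound implement it; both function names are hypothetical.

```python
import ipaddress


def is_domain_allowed(hostname: str, allowed: set) -> bool:
    """Allow a hostname if it equals an approved domain or is a subdomain of one."""
    host = hostname.rstrip(".").lower()
    # The leading dot prevents "evilgithub.com" from matching "github.com".
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def is_ip_literal(host: str) -> bool:
    """True if the destination is a raw IP address, i.e. no DNS lookup happens."""
    try:
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False


approved = {"api.anthropic.com", "github.com"}
print(is_domain_allowed("api.github.com", approved))
print(is_domain_allowed("evilgithub.com", approved))
print(is_ip_literal("160.79.104.10"))
```

A destination for which is_ip_literal returns True never touches the resolver, which is why DNS filtering needs to be paired with a firewall or proxy.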

HTTP Proxy with Allowlist

An HTTP proxy that the agent must route all traffic through gives you more control than an IP-level firewall. The simplest useful version is host allowlisting: only permit requests to a small set of approved domains.

mitmproxy is what I've used more often for different scenarios where I needed a proxy. You can write a Python script that implements your allowlist:

# mitmproxy_filter.py
from mitmproxy import http

ALLOWED_DOMAINS = {
    "api.anthropic.com",
    "api.github.com",
    # add only what your agent actually needs
}

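def block_if_not_allowed(flow: http.HTTPFlow) -> None:
    # Answer with a 403 instead of forwarding when the host is not allowlisted.
    if flow.request.host not in ALLOWED_DOMAINS:
        flow.response = http.Response.make(
            403,
            f"Blocked: {flow.request.host} is not in the allowlist",
            {"Content-Type": "text/plain"},
        )

def http_connect(flow: http.HTTPFlow) -> None:
    # Fires on the CONNECT request, before TLS, so HTTPS destinations can be
    # blocked by hostname without decrypting any traffic.
    block_if_not_allowed(flow)

def request(flow: http.HTTPFlow) -> None:
    # Plain HTTP requests never send CONNECT, so apply the same check here.
    block_if_not_allowed(flow)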

Run it:

mitmdump --listen-host 0.0.0.0 --listen-port 8080 -s mitmproxy_filter.py

I'm using mitmdump here because it's the simplest non-interactive way to run the filter script. If you want to watch connections in real time while the agent runs, mitmproxy provides a terminal UI, and mitmweb provides a browser UI.

Then route all traffic from your agent container through the proxy. With Docker, you can set the proxy via environment variables:

docker run --rm -it \
  -e HTTP_PROXY=http://proxy-host:8080 \
  -e HTTPS_PROXY=http://proxy-host:8080 \
  my-agent-image

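The allowlist decision is made at the hostname level. For HTTPS, the client sends a CONNECT request that names the host it wants to reach, so the proxy can allow or block that destination before any encrypted content is exchanged. Plain HTTP requests do not use CONNECT, so they need the same check in mitmproxy's request hook. Also keep in mind that mitmproxy intercepts TLS by default: for allowed HTTPS destinations, the client either has to trust mitmproxy's CA certificate, or those hosts have to be passed through untouched, for example with the ignore_hosts option.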

If you want to do more than simple host allowlisting, things get more involved. Transparent proxying lets you intercept traffic even when the application doesn't honour proxy settings. Inspecting HTTPS paths, headers, or bodies requires TLS interception, which means the client must trust the proxy's certificate authority. That can be useful in certain scenarios, but it is much more complex to set up.

Secrets Controls

If the filesystem and network controls define what the agent can reach, secrets controls ensure the agent can't read actual secrets such as API keys and tokens. If your agent needs to call the Anthropic API, GitHub, or your calendar API, the keys for those services should not be in the agent's environment or filesystem. Instead, route those calls through a controlled gateway that holds the credentials and makes the calls on the agent's behalf, after validating that the request is within policy.

In practice, this is a small proxy or sidecar process that:

Receives the agent's API request
Validates it against a policy (this endpoint is allowed, this request shape is allowed)
Injects the real credentials
Makes the actual API call
Returns the response

If the agent is compromised, the attacker can only make policy-compliant API calls through your gateway. They don't get the raw API keys.
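The validate-and-inject steps can be sketched as a single function. This is a simplified illustration, not a complete gateway: the policy table and function name are hypothetical, and it assumes the upstream is the Anthropic API, which authenticates with an x-api-key header.

```python
# Hypothetical policy: the only request shape the agent may send upstream.
ALLOWED_REQUESTS = {("POST", "/v1/messages")}

# Credential headers the agent is never allowed to set itself.
CREDENTIAL_HEADERS = {"authorization", "x-api-key"}


def prepare_upstream_request(method: str, path: str, headers: dict, api_key: str) -> dict:
    """Validate the agent's request and return the headers to send upstream."""
    if (method.upper(), path) not in ALLOWED_REQUESTS:
        raise PermissionError(f"{method} {path} is not allowed by policy")
    # Drop whatever credentials the agent supplied, then inject the real one.
    forwarded = {
        name: value
        for name, value in headers.items()
        if name.lower() not in CREDENTIAL_HEADERS
    }
    forwarded["x-api-key"] = api_key
    return forwarded


headers = prepare_upstream_request(
    "POST", "/v1/messages", {"content-type": "application/json"}, "real-key"
)
print(sorted(headers))
```

The agent only ever sees the response; the real key exists solely inside the gateway process.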

agentgateway is a CNCF (Cloud Native Computing Foundation) sandbox project that puts a policy and control layer in front of agent traffic. With its simplified LLM configuration, the interesting part for this use case looks like this:

# agentgateway.yaml
llm:
  models:
    - name: claude-haiku
      provider: anthropic
      params:
        model: claude-3-5-haiku-20241022
        apiKey: "$ANTHROPIC_API_KEY"

Then your agent points at agentgateway instead of Anthropic directly. The key lives in the gateway's environment, not the agent's. The agent gets model access, but it never gets to read or exfiltrate the raw credentials.

How the gateway gets that credential is a separate choice. In a simple personal setup, you can provide it from the host when you start the gateway container. If you also want to ensure the host doesn't contain secrets, the gateway can fetch them from a secret manager instead.

Execution and Local Interface Controls

So far, the focus has been on what the agent can read, where it can send data, and what secrets it carries. There is one more category of power to account for: what tools and helper interfaces it can invoke while doing its work.

This matters because an agent can have more effective power than its filesystem and network controls suggest if it can trigger privileged helpers on the same machine. A shell command that can talk to Docker, an SSH agent loaded with keys, a database listening on localhost, a browser debug port, or an MCP server with broad authority can all become ways around an otherwise careful setup.

For personal agent setups, I would treat this as a combined control area with two practical rules.

First: keep execution narrow. The agent should only be able to run a small set of commands that are actually useful for the task: search, diff, build, test, maybe lint. The more general-purpose shell access you allow, the more ways the agent has to turn a mistake or prompt injection into real side effects.

Second: remove the ambient local authority. If the agent does not need the Docker socket, your SSH agent, local admin tools, browser automation ports, or database access, those interfaces should simply not be reachable from its environment. Otherwise, you may think you have constrained the agent because its filesystem and network are controlled, while in reality, it still has a side door into much more powerful capabilities.

These controls are usually less elaborate than filesystem or network controls in a personal setup, but they matter for the same reason: they define what the agent can actually do, not just what files it can see.
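The first rule can be approximated with a small command allowlist in whatever wrapper executes the agent's tool calls. The sketch below is an assumption about how such a wrapper might look; the command set and function name are hypothetical. Note that allowlisting by program name is coarse: a tool like make can still run arbitrary recipes from a Makefile.

```python
import shlex

# Hypothetical set of programs the agent may start.
ALLOWED_COMMANDS = {"grep", "rg", "diff", "make", "pytest"}

# Characters that would let one command line chain or redirect into another.
SHELL_METACHARACTERS = set(";|&`$><\n")


def is_command_allowed(command_line: str) -> bool:
    """Accept only a single, plain invocation of an allowlisted program."""
    if any(char in SHELL_METACHARACTERS for char in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS


print(is_command_allowed("grep -rn TODO src"))
print(is_command_allowed("grep foo; curl http://example.com"))
```

Running the accepted argv directly, without a shell in between, keeps the check and the execution consistent.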

In a Docker setup, both rules translate directly to flags on the docker run command:

# Partial example: filesystem and network controls omitted for brevity
docker run --rm -it \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --user agent:agent \
  my-agent-image
  # Note: /var/run/docker.sock and $SSH_AUTH_SOCK are intentionally not mounted

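It is important to realise that a default docker run is not minimal. Docker starts containers with a default set of Linux capabilities, and many of them are not needed for code and text tasks but could be abused by a compromised or misbehaving agent. You have to remove them explicitly, which is what --cap-drop ALL does. A few concrete examples from Docker's default set:

CAP_NET_RAW lets a process open raw sockets, enough to craft arbitrary packets and spoof traffic on the container's network.
CAP_DAC_OVERRIDE lets root inside the container bypass file permission checks, including on files in mounted directories.
CAP_SETUID and CAP_SETGID let a process switch to arbitrary user and group IDs inside the container.

The more dangerous capabilities are not part of the default set, but they come back as soon as a container is started with --privileged or --cap-add, so they are worth knowing:

CAP_NET_ADMIN would let the agent reconfigure network interfaces or rewrite firewall rules in its network namespace, enough to bypass egress controls enforced there.
CAP_SYS_PTRACE would allow it to attach to other processes it can see and read their memory; if the container shares the host's PID namespace, that includes other agents, your terminal session, and credential helpers.
CAP_SYS_ADMIN is broad enough to allow mounting filesystems or escaping the container namespace entirely under certain conditions.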
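If the docker run command is assembled by a script, a small pre-flight check can catch a missing hardening flag or an accidentally mounted Docker socket. This is a rough sketch with a hypothetical function name; it only looks at the flags discussed in this section and is no substitute for reviewing the command.

```python
def audit_docker_args(argv: list) -> list:
    """Return a list of findings for a `docker run` argument list."""
    findings = []
    joined = " ".join(argv)
    if "--privileged" in argv:
        findings.append("--privileged grants every capability")
    if "--cap-drop ALL" not in joined and "--cap-drop=ALL" not in joined:
        findings.append("capabilities are not dropped with --cap-drop ALL")
    if "no-new-privileges" not in joined:
        findings.append("--security-opt no-new-privileges is missing")
    for index, arg in enumerate(argv):
        # Collect the value of each mount option, in both accepted spellings.
        if arg in ("-v", "--volume", "--mount") and index + 1 < len(argv):
            value = argv[index + 1]
        elif arg.startswith(("--volume=", "--mount=")):
            value = arg.split("=", 1)[1]
        else:
            continue
        if "docker.sock" in value:
            findings.append(f"Docker socket is mounted: {value}")
    return findings


hardened = ["docker", "run", "--rm", "--cap-drop", "ALL",
            "--security-opt", "no-new-privileges", "my-agent-image"]
print(audit_docker_args(hardened))
```

An empty list means none of these specific problems were found, not that the container is safe.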

Logging at Trust Boundaries

All of these controls help because they limit what can go wrong. But they don't tell you what happened. Each control point, the container filesystem, the network egress layer, and the API gateway, is also a place to log: what the agent tried to do, what was allowed, and what was blocked. This serves two purposes.

First, understanding what actually happened. Agents don't always do what you expect, and when the result is wrong or surprising, you want to be able to look back and see what the agent actually tried. Without that, you're left guessing whether it was a bad prompt, a tool misbehaving, or something more concerning.

Second, building confidence over time. When you can see that the agent did exactly what you'd expect across a dozen sessions, you start to trust it more. And when something looks off, like a network request you don't recognise, you notice it instead of missing it entirely.

GitHub's engineering blog on their agentic workflow security is worth reading here. Their architecture instruments every trust boundary: they treat the question "what did the agent try to do at this boundary?" as a first-class logging concern, not an afterthought. The same principle applies to a personal agent, even if the scale is different.

In practice, for a personal setup, this means:

Track filesystem changes during or after the session. In a plain Docker setup, docker diff <container> lists every file the agent added, modified, or deleted in the container's own filesystem (bind-mounted directories and volumes are not included), and works while the container is still running. With an explicit OverlayFS mount, you can inspect the upper layer directly at any point to see exactly what changed and decide whether to keep it:

# Docker-native: works on a running or stopped container
docker diff my-agent-container

# OverlayFS: inspect the writable layer directly
ls /tmp/agent-overlay/upper

Log every DNS resolution attempt and every outbound connection attempt at the egress layer
Log every API call made through your gateway proxy, including which endpoint and approximately how much data was transferred
Keep these logs somewhere the agent can't reach: a separate volume or a remote sink
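Whatever component sits at a boundary, the log record itself can stay simple. The sketch below writes one JSON object per line; the field names and the function name are assumptions chosen for illustration, not a standard format.

```python
import io
import json
import time
from typing import TextIO


def log_boundary_event(sink: TextIO, boundary: str, action: str,
                       target: str, allowed: bool) -> None:
    """Append one JSON line describing what the agent tried at a boundary."""
    record = {
        "ts": time.time(),     # when the attempt happened
        "boundary": boundary,  # e.g. "egress", "filesystem", "gateway"
        "action": action,      # e.g. "connect", "write", "api_call"
        "target": target,      # hostname, path, or endpoint
        "allowed": allowed,    # blocked attempts are logged as well
    }
    sink.write(json.dumps(record) + "\n")
    sink.flush()


buffer = io.StringIO()
log_boundary_event(buffer, "egress", "connect", "api.anthropic.com:443", True)
log_boundary_event(buffer, "egress", "connect", "203.0.113.7:443", False)
print(buffer.getvalue(), end="")
```

In a real setup the sink would be a file on a volume the agent cannot reach, or a remote collector, rather than an in-memory buffer.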

Putting everything together

Researching all of this for this article has taught me a lot about agents in general and their approach to security. It also made me realise that all the fuss about OpenClaw also applies to any agent you run on your host. I guess the big difference there is the trust you place in the companies that build those agents versus the person who built OpenClaw. Of course, OpenClaw provides features that other AI agents don't, which inherently make it less secure. But it's still good to be aware that any agent you run has high-privilege access to your host, even though it doesn't seem that way.

For my personal setup, I'm going to use OpenClaw in a container and apply all the controls we've seen, then see how far we get. At least that will give me the confidence that the agent isn't doing things I haven't asked it to do. And with logging in place, I'll also be able to look back at what it actually tried to do, what went through the network, and what files it touched. That visibility is what makes the setup feel manageable rather than just hopeful.
