Fun with AI agents when you need to deploy locally

I’ve been looking for an excuse to get hands-on with AI agents (versus the function calling I’ve played with before). However, I have concerns about the security issues involved, so I’ve been having fun with LLMs, ADK & MCP to create an agentic AI demo that works locally without calling out to the internet.

Why would I do this? Well, not everyone can use the public cloud, and readers of my past posts will know I do a lot of stuff locally. Plus there are many organisations that want the power of agentic AI but can’t risk leaking confidential data or using the public cloud (although they can use the sovereign solutions out there).

I had to come up with an example use case, and I settled on a system that extracts action items from meeting notes and automatically creates tasks from them. As I am rubbish at naming, I just called it local-agentic.

I’m going to start by discussing the moving parts in the context of the local caveats. These may not be the official descriptions, but I want to get across what the moving parts actually do within the limitations of the environment I am targeting, rather than parroting whatever the current definition for a thing may or may not be today! (It’s a thing; see vibe coding as an example.) I include links later in this post where you can find more formal definitions.

The moving parts

1. AI Agents

Agents are intelligent software entities that can perceive, reason, and act. Focusing on them doing this on-premises or in a cloud provider’s sovereign solution such as GDC, agents are:

  • Autonomous: Automate work with minimal human intervention, reducing manual effort and errors.
  • Modular: Each agent has a clear, auditable role (e.g., task manager, meeting assistant), making the system easy to extend or secure.
  • Transparent: Local code, local configuration so you know exactly what’s running.

2. MCP Tool Calling

The Model Context Protocol (MCP) provides a standardized, well-documented way for agents to expose and consume tools (APIs). Why is this crucial when deploying locally?

  • Open: MCP is open and inspectable.
  • Granular Control: Ability to limit which tools are exposed, to whom, and when.
  • Auditability: Ability to track every action.
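To make the auditability point concrete, here is a rough sketch, stdlib only, of building an MCP tool-call request and pulling out the fields worth logging. The message shape follows MCP’s JSON-RPC `tools/call` method but is simplified; treat it as illustrative, not a complete client.

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a simplified MCP tools/call request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

def audit_record(agent: str, raw_request: str) -> dict:
    """Extract the fields worth logging: who called what, with which args."""
    req = json.loads(raw_request)
    return {
        "agent": agent,
        "tool": req["params"]["name"],
        "arguments": req["params"]["arguments"],
    }

msg = build_tool_call(1, "add_task", {"description": "buy groceries"})
print(audit_record("task_manager_agent", msg))
```

Because every action flows through one well-defined message shape, the audit log falls out of the protocol rather than needing per-tool instrumentation.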

3. A2A (Agent-to-Agent) Collaboration

A2A protocols let agents exchange information and delegate tasks without needing external orchestrators or cloud services.

  • Resilience: Agents keep working even if external networks are down.
  • Integration: Agents can talk to each other and your internal systems without risky firewall exceptions.
  • Privacy: Sensitive data stays within your trusted zone.
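A minimal sketch of the “trusted zone” idea: the agent will only delegate to A2A endpoints that are explicitly configured. The allowlist and endpoint names here are hypothetical, loosely mirroring the demo’s agent cards.

```python
# Hypothetical allowlist of A2A endpoints this agent may talk to.
ALLOWED_A2A_ENDPOINTS = {
    "task_manager_a2a": "http://localhost:8001",
}

def delegate(endpoint_name: str, task: str) -> dict:
    """Refuse delegation to any endpoint not in the local allowlist."""
    if endpoint_name not in ALLOWED_A2A_ENDPOINTS:
        raise PermissionError(f"unknown A2A endpoint: {endpoint_name}")
    # In a real system this would POST the task to the endpoint URL;
    # here we just return the message we would send.
    return {"to": ALLOWED_A2A_ENDPOINTS[endpoint_name], "task": task}
```

Anything not on the list is rejected before a single byte leaves the agent, which is the whole point of keeping the configuration local and explicit.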

4. LLMs

Using a local LLM via Ollama:

  • Reduces the chances of data leakage: No prompts, notes, or results should leave your secure boundary.
  • Compliance: Helps meet regulatory requirements for restricted data.
  • Customizable: You control which models are used.

Tool calling support in the LLM is essential: it allows agents not just to provide an NLP interface, but to trigger secure, auditable actions based on that understanding.
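As a sketch of what that means in practice, here is a hypothetical dispatch table mapping the structured call an LLM emits onto real functions. The tool names echo the demo’s agent card, and the in-memory task list stands in for the MCP server.

```python
tasks: list[dict] = []  # stand-in for the MCP task store

def add_task_tool(description: str) -> dict:
    task = {"id": len(tasks) + 1, "description": description, "done": False}
    tasks.append(task)
    return task

def list_tasks_tool() -> list[dict]:
    return tasks

TOOLS = {"add_task": add_task_tool, "list_tasks": list_tasks_tool}

def dispatch(tool_call: dict):
    """Execute exactly the tool the model asked for, nothing else."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        raise ValueError(f"model requested unknown tool: {tool_call['name']}")
    return fn(**tool_call.get("arguments", {}))

dispatch({"name": "add_task", "arguments": {"description": "buy groceries"}})
print(dispatch({"name": "list_tasks"}))
```

The model only ever produces the structured call; the surrounding code decides whether and how it executes, which is what makes the action auditable.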

Security Considerations with Autonomous AI Agents

The benefits of local multi-agent automation are compelling; however, introducing autonomous AI agents into sensitive or regulated environments requires a careful approach to security. Unlike traditional systems, autonomous agents can make decisions, trigger actions, and interact with other agents or systems, sometimes without direct human oversight. This opens up new risks that must be evaluated and mitigated.

Key Security Concerns

  • Unintended Actions: Autonomous agents may take actions that, while logical to the agent, are not authorized or desired from a security or compliance perspective.
  • Delegation Loops: Without strict controls, agents could delegate tasks recursively, leading to runaway processes or privilege escalation.
  • Data Leakage: If agents interact with external services or unsecured internal components, sensitive information could be exposed.
  • Auditability: Traditional logging may not capture the intent, rationale, or full context behind an agent’s autonomous decisions.
  • Spoofing and Impersonation: Without robust authentication, a malicious actor could pose as an agent, injecting harmful actions or data into the system.
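One cheap mitigation for delegation loops is a hop counter carried in every inter-agent message: any agent receiving a message over the limit drops it. This is an illustrative sketch of the idea, not something the A2A protocol mandates.

```python
MAX_HOPS = 3  # illustrative limit on delegation depth

def forward(message: dict) -> dict:
    """Increment the hop count, refusing runaway delegation chains."""
    hops = message.get("hops", 0)
    if hops >= MAX_HOPS:
        raise RuntimeError("delegation limit reached; dropping task")
    return {**message, "hops": hops + 1}

m = {"task": "book meeting room"}
for _ in range(MAX_HOPS):
    m = forward(m)
# a fourth forward would raise RuntimeError
```

Because the counter travels with the message rather than living in any one agent, the cap holds even when several agents pass the task between themselves.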

There have been a number of articles on the security issues related to agents and how they are implemented. Simon Willison’s post The lethal trifecta for AI agents: private data, untrusted content, and external communication succinctly describes the concerns with MCP. Google published Design Patterns for Securing LLM Agents against Prompt Injections.
There are more articles on this buried in the GAI is going well collection.

It’s not all doom & gloom, as A2A protocols can help address security issues

Agent-to-Agent (A2A) protocols, as used in local-agentic, are designed with these risks in mind:

  • Defined Endpoints and Access Controls: Each agent is configured to communicate only with explicitly listed A2A endpoints, reducing the risk of rogue or unauthorized agent interactions.
  • Authentication and Authorization: A2A protocols can enforce mutual authentication, ensuring that only trusted agents exchange information or delegate tasks.
  • Structured, Auditable Communication: All A2A messages are standardized and can be logged for later review, enabling detailed auditing of who delegated what, to whom, and why.
  • Scoping and Limits: Task delegation and tool invocation can be scoped per agent, preventing accidental or malicious privilege escalation and infinite delegation loops.
  • Local Containment: All communication takes place within the local controlled environment, with no exposure to external networks unless explicitly permitted.
For a full overview of the approach to security, read Enterprise-Ready Features - Agent2Agent (A2A) Protocol.
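To illustrate the mutual-authentication point, here is a toy sketch using a shared secret and HMAC signatures: a spoofed agent that doesn’t hold the key can’t produce a valid signature. A2A itself supports richer authentication schemes; this only shows the principle.

```python
import hashlib
import hmac

# Shared secret distributed out-of-band within the local network.
# Purely illustrative; real deployments would use proper key management.
SHARED_KEY = b"local-demo-secret"

def sign(body: bytes) -> str:
    """Sign an A2A message body with the shared key."""
    return hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    """Constant-time check that the message came from a trusted agent."""
    return hmac.compare_digest(sign(body), signature)

body = b'{"task": "add_task", "description": "send minutes"}'
sig = sign(body)
assert verify(body, sig)
assert not verify(b'{"task": "delete everything"}', sig)
```

Note the use of `hmac.compare_digest` rather than `==`, which avoids leaking the signature through timing differences.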

Agent Cards: Transparent, Versioned Configuration

In local-agentic, agents are defined by YAML “agent cards.” This means:

  • Configuration as Code: Know exactly what each agent does and how it’s wired.
  • Easy Auditing & Change Management: Review, diff, or roll back agent definitions just like any codebase.
  • No Surprises: You can verify configurations before deployment.

The agent cards for the agents in the demo are:

Task Manager Agent Card

name: task_manager_agent
description: Task Manager Agent for managing tasks via MCP protocol
version: 1.0.0

# Agent configuration
agent:
  name: TaskManagerAgent
  description: Manages tasks using MCP protocol and A2A communication
  model: mistral:latest
  backend: ollama
  
# MCP server configuration
mcp_servers:
  - name: task_mcp_server
    url: http://localhost:8002
    description: Local MCP server for task management tools

# A2A server configuration  
a2a_servers:
  - name: task_manager_a2a
    url: http://localhost:8001
    description: Task Manager A2A server for inter-agent communication

# Tools and capabilities
tools:
  - name: add_task
    description: Add a new task to the system
    mcp_server: task_mcp_server
  - name: list_tasks
    description: List all tasks in the system
    mcp_server: task_mcp_server
  - name: mark_task_complete
    description: Mark a task as complete
    mcp_server: task_mcp_server
  - name: delete_task
    description: Delete a task from the system
    mcp_server: task_mcp_server   
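Because cards are plain YAML, they are easy to lint before deployment. Here is a hypothetical validation sketch; to stay dependency-free it takes the card as an already-parsed dict (in practice you would load the file with a YAML library such as PyYAML), and the required keys mirror the cards shown here.

```python
# Top-level keys every agent card in this demo is assumed to carry.
REQUIRED_TOP_LEVEL = {"name", "description", "version", "agent", "tools"}

def validate_card(card: dict) -> list[str]:
    """Return a list of problems; an empty list means the card passes."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_TOP_LEVEL - card.keys())]
    for tool in card.get("tools", []):
        # Every tool must be bound to either an MCP or an A2A server.
        if "mcp_server" not in tool and "a2a_server" not in tool:
            problems.append(f"tool {tool.get('name')} has no server binding")
    return problems

card = {
    "name": "task_manager_agent",
    "description": "Task Manager Agent",
    "version": "1.0.0",
    "agent": {"name": "TaskManagerAgent", "model": "mistral:latest"},
    "tools": [{"name": "add_task", "mcp_server": "task_mcp_server"}],
}
print(validate_card(card))  # → []
```

Run against every card in CI and you get the “no surprises” property for free: a card that drifts from the expected shape never reaches deployment.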

Meeting Assistant Agent Card

name: meeting_assistant_agent
description: Meeting Assistant Agent for processing meeting notes and delegating tasks
version: 1.0.0

# Agent configuration
agent:
  name: MeetingAssistantAgent
  description: Processes meeting notes and delegates tasks via A2A protocol
  model: mistral:latest
  backend: ollama
  
# A2A server configuration for task delegation
a2a_servers:
  - name: task_manager_a2a
    url: http://localhost:8001
    description: Task Manager A2A server for delegating tasks

# Tools and capabilities
tools:
  - name: process_meeting_notes
    description: Process meeting notes and extract action items
    a2a_server: task_manager_a2a
  - name: extract_action_items
    description: Extract action items from text without delegating
    a2a_server: task_manager_a2a
  - name: delegate_task
    description: Delegate a task to the Task Manager
    a2a_server: task_manager_a2a  

Issues Encountered: LLMs, Testing, and Troubleshooting

Developing local-agentic was not all giggles though, despite (or maybe in some cases because of) my vibe coding mate! A few of the problems I encountered are listed below. The proliferation of test scripts in the repo speaks for itself 🫠

1. LLM Hallucinations and Prompt Engineering

The LLM would sometimes “hallucinate”, inventing tasks. The lack of grounding led to unreliable results. A detailed and specific prompt was needed to keep it grounded.

The prompt for the Task Manager agent:

    instruction="""You are a task management assistant. You MUST call the appropriate tool function for every user request.

AVAILABLE TOOLS:
- list_tasks_tool() - Returns the current list of tasks
- add_task_tool(description) - Adds a new task
- mark_task_complete_tool(task_id) - Marks a task as complete
- delete_task_tool(task_id) - Deletes a task
- clear_all_tasks_tool() - Deletes all tasks

CRITICAL INSTRUCTIONS:
1. You MUST call the actual tool functions. Do NOT describe what you would do.
2. Do NOT write code examples or describe actions.
3. Do NOT say "I will call" or "I would call" - just CALL the tools directly.
4. When asked to list tasks, call list_tasks_tool() and show the result.
5. When asked to add a task, call add_task_tool(description) with the task description.
6. When asked to mark a task complete, call mark_task_complete_tool(task_id).
7. When asked to delete a task, call delete_task_tool(task_id).

EXAMPLES OF CORRECT BEHAVIOR:
- User: "show my tasks" → Call list_tasks_tool() and display the result
- User: "list tasks" → Call list_tasks_tool() and display the result
- User: "what tasks do I have" → Call list_tasks_tool() and display the result
- User: "add task buy groceries" → Call add_task_tool("buy groceries")
- User: "mark task 1 complete" → Call mark_task_complete_tool("1")

NEVER describe what you would do. ALWAYS call the actual tool function and show the real results.
DO NOT write code examples or describe actions - just execute the tools directly.""",
    tools=[add_task_tool, list_tasks_tool, mark_task_complete_tool, delete_task_tool, clear_all_tasks_tool]

Iterative prompt tuning and providing explicit examples greatly improved extraction accuracy, but it remained a challenge to ensure the LLM didn’t overstep or make things up.

2. Troubleshooting and Validating Tool Calling

Testing tool calling via the ADK Web UI often proved tricky. While it’s great for validating LLM response quality and prompt effectiveness, it struggled with the actual tool invocation flow, making troubleshooting harder. Hence the suite of test scripts that directly exercise the MCP tool endpoints and A2A flows, ensuring the agents behaved as expected.

3. Model Capability Discovery and Selection

A key early error was not verifying whether the chosen LLM actually supported tool calling. Using Ollama for model management, I quickly discovered that not all local models could handle the required tool invocation patterns. I ultimately selected Mistral because:

  • It supports tool calling natively. Use `ollama show mistral` if you already have it downloaded locally, or https://ollama.com/search?c=tools to select a model of your choosing.
  • It performed well on my under-powered laptop, balancing speed, memory, and accuracy.

4. Port Conflicts and Environment Management

Running multiple agents and servers locally led to port conflicts, especially when restarting or reconfiguring components. Configuring a `.env` file to centralize and manage all port assignments and environment variables made it easy to resolve conflicts and share configurations.

Example .env entries:

TASK_MANAGER_MCP_PORT=8002  
TASK_MANAGER_A2A_PORT=8001  
MEETING_ASSISTANT_A2A_PORT=8003  
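As a small sketch, a `.env` like the one above can be parsed and checked for duplicate port assignments before anything tries to bind. The parsing here is deliberately simplified, not a full dotenv implementation.

```python
from collections import defaultdict

ENV_TEXT = """\
TASK_MANAGER_MCP_PORT=8002
TASK_MANAGER_A2A_PORT=8001
MEETING_ASSISTANT_A2A_PORT=8003
"""

def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def find_port_conflicts(env: dict[str, str]) -> dict[str, list[str]]:
    """Map each port to the variables claiming it, keeping clashes only."""
    by_port = defaultdict(list)
    for key, value in env.items():
        if key.endswith("_PORT"):
            by_port[value].append(key)
    return {port: keys for port, keys in by_port.items() if len(keys) > 1}

print(find_port_conflicts(parse_env(ENV_TEXT)))  # → {} (no clashes in the sample)
```

Running a check like this at startup turns the usual “address already in use” stack trace into a one-line message naming the two variables that collide.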

There are, however, limitations when there is no external agent communication

While on-premises agentic systems like local-agentic offer robust privacy, security, and control, they come with an important limitation: agents are typically restricted to communicating only within the local or private network. This inability to call out to external (cloud-based or third-party) agents has several implications:

  • No Access to Specialized Cloud Services: Agents cannot leverage leading foundation cloud AI models, SaaS APIs, or external data sources that require outbound internet access.
  • Reduced Collaboration: Cross-organization or ecosystem-wide agent coordination is not possible without an explicit, secure bridge, which may be prohibited in high-security environments.
  • Limited Real-Time Updates: Agents can’t subscribe to or react to real-time events from the broader internet (e.g. pulling live weather feeds).
  • Manual Integration Required: Any communication with external systems must happen via tightly controlled, auditable gateways or through manual intervention, increasing administrative overhead.

Why Not the Cloud?

  • Security: Running locally means you set the rules, e.g. no third-party access.
  • Regulatory Compliance: Essential where there is a requirement for strict data residency or handling requirements.
  • Future-Proofing: As AI capabilities grow, you can adopt or adapt at your pace, with your own risk controls.

For some use cases, especially in compliance-driven or air-gapped environments, these limitations are the price of non-negotiable requirements. But for others, hybrid or cloud-based agentic systems may offer greater flexibility.

Local vs. the Cloud: a comparison

Feature/Aspect | Local Agentic System | Cloud Agentic System
--- | --- | ---
Data Privacy | Full local control, no external exposure | Data may transit or reside outside organization
Compliance | Easier to meet strict, local requirements | Must vet and trust cloud provider’s compliance
Performance | Consistent, LAN-speed, no external latency | Scales elastically, but depends on internet
Integration | Direct, deep integration with local assets | Easier integration with SaaS/cloud APIs
Resilience | Operates offline, not reliant on internet | Dependent on network/cloud uptime
Security | All surface area is within your control | Security shared with/depends on cloud provider
Cost | One-time hardware, ongoing maintenance | Pay-as-you-go, but can scale with usage
Scalability | Limited by local hardware | Near-infinite, easy to scale up/down
LLM/AI Model Choice | Full choice of local models (e.g. Mistral) | Access to latest proprietary models
External Collaboration | Restricted unless explicitly allowed | Easily federate and collaborate externally
Setup & Maintenance | Requires in-house IT expertise | Cloud provider manages much of the stack
Customizability | Fully customizable, even at OS level | Limited by provider’s platform constraints

The demo

When you run local-agentic:

  1. A user (or process) provides meeting notes.
  2. The Meeting Assistant Agent analyzes the notes, extracts tasks, and delegates them to the Task Manager Agent, all using local AI and private protocols.
  3. The Task Manager Agent records, tracks, and manages tasks, accessible through CLI, web UI, or direct API calls.
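To show the overall flow without needing a model running, here is a deliberately crude stand-in for step 2: a regex pulls action items out of the notes in place of the LLM. In the real system that extraction is done by Mistral with the prompt shown earlier; the pattern and sample notes here are illustrative.

```python
import re

# Lines beginning with "TODO:" or "ACTION:" count as action items.
ACTION_PATTERN = re.compile(
    r"^\s*(?:TODO|ACTION)\s*:\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_action_items(notes: str) -> list[str]:
    """Regex stand-in for the Meeting Assistant's LLM extraction step."""
    return [item.strip() for item in ACTION_PATTERN.findall(notes)]

notes = """Discussed Q3 roadmap.
ACTION: Sam to draft the security review
TODO: book a room for the retro
No other business."""

# Each extracted item would be delegated to the Task Manager over A2A.
for item in extract_action_items(notes):
    print("delegating:", item)
```

Swapping the regex for the LLM call is exactly the boundary the demo draws: extraction can be fuzzy, but everything downstream of it is deterministic and auditable.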

The repo is here if you want to try it out. It’s a demo, so I am not accepting PRs. Fork away and have fun.

Stuff to read