Mathijs Boezer

AI system architecture
for product teams

I design and build LLM, agent, and voice AI systems for real products.

Lead AI Engineer at Trengo · Previously at Pandora Intelligence · MSc in AI, TU/e

Based in Utrecht, Netherlands

When to Bring Me In

Most teams reach out when one of these things starts happening:

An AI feature works in demos but becomes unpredictable in production.
Latency or cost starts affecting the user experience.
Agent workflows become harder to reason about and maintain.
A product decision depends on getting the architecture right.
You want to validate an AI product idea quickly before committing months of engineering.
You need to prototype an AI workflow to understand what actually works.

At that point, a focused architecture review or prototype sprint can save months of iteration.

What I Help With

Architecture Reviews

A focused deep dive into your AI system architecture, tradeoffs, and technical risks before scaling.

Prototype Sprints

Fast prototypes to validate AI product ideas, workflows, and feasibility before committing a full team.

AI Platform Design

Multi-tenant AI platforms, model routing, cost controls, evaluation, and observability.

Agent and Voice Systems

LLM workflows, tool use, orchestration, and real-time voice infrastructure for production.

Selected Production Systems I've Designed

A few examples of the kinds of systems I've designed and shipped.

Voice AI · 1000s of calls / day

Large-scale agentic voice AI system

Voice-to-voice AI agent handling thousands of calls per day.


Production voice-to-voice AI agent for customer support, designed to operate like a digital employee: handling customer conversations, retrieving operational data, and escalating to human agents when needed.

Unlike traditional voice assistants built around separate speech-to-text (STT) and text-to-speech (TTS) components, this system uses a true voice-to-voice model pipeline. This lets the assistant reason directly over streaming audio while interacting with tools and business systems.

The system operates inside real customer workflows, retrieving data such as order information, account details, and knowledge base content during live conversations.

Architecture

  • Real-time voice-to-voice model pipeline enabling direct reasoning over streaming audio
  • Agentic orchestration layer allowing the assistant to call tools and execute workflows during conversations
  • Predictive tool execution triggered while the user is still speaking to reduce response latency
  • Built-in human handover capability, allowing the AI to escalate to a live agent when appropriate

Outcome

Deployed in production and handling thousands of calls per day, with sub-second time to first response and seamless escalation to human agents when required.

Latency · ~4s → <500ms latency

Agentic pipeline latency optimisation

Reduced response time from ~4 seconds to under 500ms in a production agentic pipeline.


A production AI assistant used a multi-step agentic pipeline to retrieve data, call tools, and generate responses. The architecture worked functionally, but responses took several seconds, which was a hard blocker for real user-facing deployment.

The core challenge was that most of the latency did not come from the model itself, but from orchestration overhead: provider latency, tool execution, and repeated context preparation.

Architecture

  • Predictive processing to start context construction and tool preparation before the model call is triggered
  • Provider-aware model routing selecting the fastest suitable model based on real-time latency signals
  • Aggressive prompt and context caching to avoid repeated computation
  • Streaming responses to reduce perceived latency for users

Outcome

Reduced end-to-end response time from ~4 seconds to under 500ms, enabling the system to move from prototype into production deployment.

Internal Tooling · Daily use · multi-team

Slack AI copilot

Internal AI assistant used daily by engineering, support, and product teams to answer technical and product questions from live codebases and documentation.


Engineering and support teams were spending significant time answering the same technical and product questions across Slack channels.

An internal AI assistant was built directly into Slack, able to answer questions by actively exploring the product's codebases and support documentation during each request, rather than relying on a static index.

Architecture

  • Agent with direct read access to four production codebases, allowing it to explore source code to answer questions
  • Separate knowledge base for support documentation, queried alongside codebase retrieval
  • Channel-specific prompt configurations for different audiences: engineering, support, and customer-facing
  • Full thread context passed on every request for coherent multi-turn conversations

Outcome

Adopted across engineering, support, and product teams and used daily, significantly reducing repeated questions and improving access to internal knowledge.

Explainable company intelligence

AI systems that gather and analyse company information across multiple domains to produce traceable, expert-informed risk assessments.

  • Multi-source data ingestion and synthesis
  • Evidence-linked reasoning chains for explainability
  • Outputs designed for real business decision-making

AI onboarding assistant

Embedded SaaS onboarding assistant that teaches users how to use and configure complex software.

  • DOM-aware navigation of product interfaces
  • Screenshot-based fallback for visual context
  • Low-latency image analysis pipeline

Most of my work is confidential, so I don't publish full case studies or client details publicly. I'm happy to walk through relevant examples and technical decisions in a conversation.

How I Work

01

Focused Intake

We look at your current product, architecture, constraints, failure modes, and the biggest technical risks.

02

Review or Prototype

I analyse the system or build a focused prototype to test the assumptions that matter most.

03

Clear Recommendations

You get concrete guidance on tradeoffs, technical risks, architecture decisions, and what to change next.

04

Hands-On Support

For select teams, I stay involved through implementation support, technical guidance, or both, especially on the parts of the system that matter most for reliability, latency, and product behavior.

How engagements usually start

Architecture Review

The quickest way to find out which architecture will work, what will break at scale, and what the shortest path to a real product looks like. A structured deep dive delivered within 5–7 business days.

Some teams start with an existing system. Others, often founders or product teams, start with an idea and need help figuring out how the product should actually be built, from designing the architecture to building the first prototype.

  • 45-minute technical intake call
  • Written review of your architecture, tradeoffs, and risk areas
  • Prioritised recommendations with rationale and next steps
  • 45-minute follow-up Q&A session

I've worked with teams building multi-tenant agent platforms, real-time voice pipelines, and RAG-heavy products, as well as teams still figuring out where to start.

After the review, some teams bring me in to prototype or build alongside them. If you have something specific in mind, just reach out and we can figure out if there's a fit.

€2,800
Flat fee · 5–7 business days
Get in touch →

Happy to discuss fit before committing.

Experience

2025 — Present

Lead AI Engineer

Trengo

Designing and building AI agents and voice infrastructure used by thousands of businesses.

2022 — 2025

Lead AI Engineer

Pandora Intelligence

Designed LLM-powered systems for regulatory and risk data, from data pipelines to production decision-support systems.

2021 — 2022

Full-stack Developer

Prodrive Technologies

Built software for high-tech embedded and manufacturing systems.

2017 — 2020

Full-stack Developer

de Jong DUKE

Built software for connected products and manufacturing workflows.

About

I've been designing and building software since I was eleven. What started with game mods and automation projects eventually became a career in software engineering and AI systems.

I studied Artificial Intelligence at Eindhoven University of Technology and have since worked across software engineering, machine learning systems, and production AI architecture.

At Pandora Intelligence, I designed LLM-powered systems for complex regulatory and risk datasets. At Trengo, I work on AI agents and real-time voice systems for large-scale customer workflows.

Independently, I work with a small number of product teams on AI architecture, prototyping, and implementation support. I like staying close to the code, especially on the parts of the system that most affect reliability, latency, and product behavior.

Work With Me

I work with a small number of teams each year on focused AI architecture collaborations.

Most collaborations start with an architecture review or a short prototype sprint. Some extend into hands-on implementation support for the parts of the system that matter most.

If you're building an AI product and want to move faster or avoid expensive mistakes, send a short note about what you're building and where you're stuck.