Mathijs Boezer
AI system architecture
for product teams
I design and build LLM, agent, and voice AI systems for real products.
Lead AI Engineer at Trengo · Previously at Pandora Intelligence · MSc in AI, TU/e
- Lead AI Engineer at Trengo
- Previously Pandora Intelligence
- MSc in AI, TU/e
- Based in Utrecht, Netherlands
When to Bring Me In
Most teams reach out when one of these things starts happening:
At that point, a focused architecture review or prototype sprint can save months of iteration.
What I Help With
Architecture Reviews
A focused deep dive into your AI system architecture, tradeoffs, and technical risks before scaling.
Prototype Sprints
Fast prototypes to validate AI product ideas, workflows, and feasibility before committing a full team.
AI Platform Design
Multi-tenant AI platforms, model routing, cost controls, evaluation, and observability.
Agent and Voice Systems
LLM workflows, tool use, orchestration, and real-time voice infrastructure for production.
Selected Production Systems I've Designed
A few examples of the kinds of systems I've designed and shipped.
Large-scale agentic voice AI system
Voice-to-voice AI agent handling thousands of calls per day.
Production voice-to-voice AI agent for customer support, designed to operate like a digital employee: handling customer conversations, retrieving operational data, and escalating to human agents when needed.
Unlike traditional voice assistants built around separate STT and TTS components, this system uses a true voice-to-voice model pipeline, allowing the assistant to reason directly over streaming audio while interacting with tools and business systems.
The system operates inside real customer workflows, retrieving data such as order information, account details, and knowledge base content during live conversations.
Architecture
- ·Real-time voice-to-voice model pipeline enabling direct reasoning over streaming audio
- ·Agentic orchestration layer allowing the assistant to call tools and execute workflows during conversations
- ·Predictive tool execution triggered while the user is still speaking to reduce response latency
- ·Built-in human handover capability, allowing the AI to escalate to a live agent when appropriate
Outcome
Deployed in production supporting thousands of calls per day, with sub-second first response latency and seamless escalation to human agents when required.
Agentic pipeline latency optimisation
Reduced response time from ~4 seconds to under 500ms in a production agentic pipeline.
A production AI assistant used a multi-step agentic pipeline to retrieve data, call tools, and generate responses. While the architecture worked functionally, response times were several seconds. A hard blocker for real user-facing deployment.
The core challenge was that most of the latency did not come from the model itself, but from orchestration overhead: provider latency, tool execution, and repeated context preparation.
Architecture
- ·Predictive processing to start context construction and tool preparation before the model call is triggered
- ·Provider-aware model routing selecting the fastest suitable model based on real-time latency signals
- ·Aggressive prompt and context caching to avoid repeated computation
- ·Streaming responses to reduce perceived latency for users
Outcome
Reduced end-to-end response time from ~4 seconds to under 500ms, enabling the system to move from prototype into production deployment.
Slack AI copilot
Internal AI assistant used daily by engineering, support, and product teams to answer technical and product questions from live codebases and documentation.
Engineering and support teams were spending significant time answering the same technical and product questions across Slack channels.
An internal AI assistant was built directly into Slack, able to answer questions by actively exploring the product's codebases and support documentation during each request, rather than relying on a static index.
Architecture
- ·Agent with direct read access to four production codebases, allowing it to explore source code to answer questions
- ·Separate knowledge base for support documentation, queried alongside codebase retrieval
- ·Channel-specific prompt configurations for different audiences: engineering, support, and customer-facing
- ·Full thread context passed on every request for coherent multi-turn conversations
Outcome
Adopted across engineering, support, and product teams and used daily, significantly reducing repeated questions and improving access to internal knowledge.
Explainable company intelligence
AI systems that gather and analyse company information across multiple domains to produce traceable, expert-informed risk assessments.
- ·Multi-source data ingestion and synthesis
- ·Evidence-linked reasoning chains for explainability
- ·Outputs designed for real business decision-making
AI onboarding assistant
Embedded SaaS onboarding assistant that teaches users how to use and configure complex software.
- ·DOM-aware navigation of product interfaces
- ·Screenshot-based fallback for visual context
- ·Low-latency image analysis pipeline
Most of my work is confidential, so I don't publish full case studies or client details publicly. I'm happy to walk through relevant examples and technical decisions in a conversation.
How I Work
Focused Intake
We look at your current product, architecture, constraints, failure modes, and the biggest technical risks.
Review or Prototype
I analyze the system or build a focused prototype to test the assumptions that matter most.
Clear Recommendations
You get concrete guidance on tradeoffs, technical risks, architecture decisions, and what to change next.
Hands-On Support
For select teams, I stay involved through implementation support, technical guidance, or both, especially on the parts of the system that matter most for reliability, latency, and product behavior.
Architecture Review
The fastest way to identify what architecture will work, what will break at scale, and what the fastest path to a real product looks like. A structured deep dive delivered in a week.
Sometimes this starts with an existing system. Other times it's a founder or product team trying to figure out how an AI product should actually be built.
Some teams start with an existing system. Others start with an idea and need help designing the architecture or building the first prototype.
- 45-minute technical intake call
- Written review of your architecture, tradeoffs, and risk areas
- Prioritised recommendations with rationale and next steps
- 45-minute follow-up Q&A session
I've worked with teams building multi-tenant agent platforms, real-time voice pipelines, RAG-heavy products, and teams still figuring out where to start.
After the review, some teams bring me in to prototype or build alongside them. If you have something specific in mind, just reach out and we can figure out if there's a fit.
Experience
Lead AI Engineer
Trengo
Designing and building AI agents and voice infrastructure used by thousands of businesses.
Lead AI Engineer
Pandora Intelligence
Designed LLM-powered systems for regulatory and risk data, from data pipelines to production decision-support systems.
Full-stack Developer
Prodrive Technologies
Software for high-tech embedded and manufacturing systems.
Full-stack Developer
de Jong DUKE
Built software for connected products and manufacturing workflows.
About
I've been designing and building software since I was eleven. What started with game mods and automation projects eventually became a career in software engineering and AI systems.
I studied Artificial Intelligence at Eindhoven University of Technology and have since worked across software engineering, machine learning systems, and production AI architecture.
At Pandora Intelligence, I designed LLM-powered systems for complex regulatory and risk datasets. At Trengo, I work on AI agents and real-time voice systems for large-scale customer workflows.
Independently, I work with a small number of product teams on AI architecture, prototyping, and implementation support. I like staying close to the code, especially on the parts of the system that most affect reliability, latency, and product behavior.
Work With Me
I work with a small number of teams each year on focused AI architecture collaborations.
Most collaborations start with an architecture review or a short prototype sprint. Some extend into hands-on implementation support for the parts of the system that matter most.
If you're building an AI product and want to move faster or avoid expensive mistakes, send a short note about what you're building and where you're stuck.