Our Thoughts on Agentic Architecture: Building a Production-Ready AI Harness

There is a widening chasm in the AI industry. On one side, we have incredible Large Language Models (LLMs) that can reason, code, and analyze. On the other side, we have enterprises struggling to get these models to perform reliable, multi-step business tasks in production. This gap is what we call “Pilot Purgatory.”

The bridge across this chasm is not a better prompt; it is a Production-Grade Agent Harness.

In this deep dive, we outline our architectural philosophy for building agents that don’t just “chat,” but actually work.

The Core Philosophy: The Thin Harness

A common mistake is building a “thick” harness—a system that tries to micromanage the LLM’s reasoning through complex, hardcoded logic. Our approach is different: Build a thin, custom harness that trusts the model for reasoning but enforces deterministic boundaries for execution.

As models like Claude improve, they internalize more planning and reasoning capabilities. A thin harness is designed to shrink over time, allowing the model to do more of the “heavy lifting” while the harness provides the infrastructure for safety, persistence, and tool execution.

The Pillars of Production Architecture

1. The Separation of Concerns: the Claude Agent SDK on AWS

One of the most critical decisions in agentic architecture is how to handle state. We split the world into two:

Interactive Turns (Orchestration): For the “back-and-forth” reasoning of a conversation, we use the Claude Agent SDK. It allows for cyclical reasoning loops and deep memory management, modeling the specialist agent pattern naturally.
Durable Workflows (Execution): For long-running, multi-step business tasks (e.g., “process 500 records and report back in an hour”), we use AWS Step Functions. This provides durable execution, retries, and heartbeats, ensuring a task finishes even if a server restarts.

2. Specialized Agents vs. The “God” Agent

While it’s tempting to build one “Global Assistant,” we find that a Router + Specialized Agents pattern is superior. By splitting an AI into specialized personas (e.g., a Data Analyst agent, a Workflow Operator agent, a Configuration agent), you reduce “context pollution.” An agent with 8 focused tools is significantly more reliable and faster than an agent with 50 tools. A lightweight router call classifies the user’s intent and hands off to the specialist.

3. Tiered Memory Architecture

Memory is what makes an agent feel “intelligent” over time. We implement a three-tier system:

Tier 1: Session Memory: Short-term conversation history, managed via persistent checkpoints.
Tier 2: User Memory: Long-term preferences and patterns (e.g., “User prefers tables over charts”) stored in a vector database (like PostgreSQL with pgvector) and retrieved per-turn via semantic search.
Tier 3: Domain Memory: Static business rules, entity catalogs, and technical schemas provided as “context engineering” in the system prompt.

4. API-First: Agents Consume Endpoints, Not SQL

A production agent should never write raw SQL. Direct database access bypasses validation, permissions, and business logic. Instead, the agent is treated as another user of your existing API layer. Its “tools” are thin wrappers around your REST or GraphQL endpoints. This preserves your security model and ensures that the agent follows the same rules as your web or mobile applications.

Permission and Safety: Layered Defense

In a production environment, you cannot leave safety to the LLM’s “discretion.” Safety must be enforced at the infrastructure level:

Read vs. Write Separation: All read-only tools can auto-execute. All mutation/write tools (deletions, approvals, emails) require an explicit user confirmation via a structured UI gate.
Deterministic Scoping: Tenant and user isolation are injected by the harness code, not the LLM. The agent literally doesn’t have the parameters in its tools to “cross-pollinate” data between different users or organizations.
Content Spotlighting: We use randomized delimiters to wrap user-generated data. This prevents “indirect prompt injection” where a malicious piece of text inside a database record tries to hijack the agent’s instructions.

Conclusion: From Advice to Execution

The market is moving rapidly from “AI advice” (chat) to “AI execution” (agents). To win in this new era, companies must move beyond the “wrapper” mindset and invest in robust, production-grade architectures.

By focusing on thin harnesses, tiered memory, and API-first tool design, we can build agents that deliver measurable ROI and scale alongside the rapidly evolving capabilities of foundation models.

Is your team ready to build a production-grade agent harness? Contact the Nugawi Intelligence team to discuss your architecture.

Our Thoughts on Agentic Architecture: Building a Production-Ready AI Harness

The Core Philosophy: The Thin Harness

The Pillars of Production Architecture

1. The Separation of Concerns: the Claude Agent SDK on AWS

2. Specialized Agents vs. The “God” Agent

3. Tiered Memory Architecture

4. API-First: Agents Consume Endpoints, Not SQL

Permission and Safety: Layered Defense

Conclusion: From Advice to Execution

Related Posts

Les Nostres Reflexions sobre l'Arquitectura Agèntica: Construint un Harness d'IA Llest per a Producció

Nuestras Reflexiones sobre la Arquitectura Agéntica: Construyendo un Harness de IA Listo para Producción

The Master Guide to Agentic AI Client Onboarding: Scaling B2B Operations with Production-Grade Automation

La Guia Mestra per a l'Onboarding de Clients amb IA Agèntica: Escalant Operacions B2B amb Automatització de Grau de Producció