Microsoft releases ASSERT, an open-source tool enabling developers to test complex AI agent behaviors using natural language descriptions.
As tech companies rapidly transition from basic chatbots to autonomous “agentic AI,” ensuring these digital workers behave safely and predictably has become a primary bottleneck for software engineers. At its Build 2026 developer conference in San Francisco, Microsoft addressed this problem head-on by launching ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing).
As reported by TechCrunch, ASSERT is an open-source framework that changes how engineers evaluate AI agents. Instead of writing thousands of lines of brittle, complex testing code to monitor how an agent interacts with users, developers can now create comprehensive behavior and regression tests from high-level, natural language descriptions.
Solving the “Agentic AI” Testing Nightmare
Traditional software testing relies on fixed inputs and expected, deterministic outputs: if a user clicks button A, screen B should open. Generative AI models and autonomous agents, however, are inherently non-deterministic. An AI customer service agent might answer the same inquiry in five different ways, which makes it difficult for standard QA (Quality Assurance) software to verify whether the model is operating within safe, accurate parameters.
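To make the problem concrete, here is a minimal Python sketch. The three replies and the `mentions_refund_window` helper are invented for illustration and are not taken from ASSERT. It shows how an exact-match test rejects valid paraphrases, while a check on a property of the reply accepts them.

```python
# Hypothetical replies an agent might give to the same refund question.
replies = [
    "You can request a refund within 30 days of purchase.",
    "Refunds are available for 30 days after you buy.",
    "Sure! We accept refund requests up to 30 days from the purchase date.",
]

# The single "expected output" a traditional test would compare against.
expected = "You can request a refund within 30 days of purchase."

# Deterministic-style test: each reply must equal the expected string exactly.
exact_matches = [reply == expected for reply in replies]


def mentions_refund_window(reply: str) -> bool:
    """Property check: the reply mentions refunds and the 30-day window."""
    text = reply.lower()
    return "refund" in text and "30 days" in text


# Behavior-style test: each reply must satisfy the property, whatever its wording.
property_matches = [mentions_refund_window(reply) for reply in replies]

print(exact_matches)     # only the first reply matches exactly
print(property_matches)  # all three replies satisfy the property
```

All three replies are acceptable answers, yet only the first passes the exact-match test. A keyword check like this is crude, which is one reason evaluation tools often use a model as the judge instead.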
Furthermore, recent 2026 developer data published by TestQuality highlights an emerging crisis in modern software development: AI-generated code contains roughly 1.7 times more defects than human-written code. Because AI writes bugs faster than humans can manually test them, development cycles have hit a wall.
ASSERT fixes this by using AI to test AI. A developer simply inputs a plain-English text description of how an agent should behave, for example: “The customer service agent must always remain polite, refuse to give financial advice, and escalate the chat to a human if the user mentions an account closure.” The ASSERT framework then automatically translates that text specification into structured scoring tests, continuously auditing the agent’s performance during deployment.
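The idea of turning a written specification into structured scoring tests can be sketched in a few lines of Python. This is an illustration only and not ASSERT's actual API, which the coverage does not describe. The `Check` class, the three rule functions, and the keyword matching are all hypothetical stand-ins for checks that a real tool would presumably generate and judge with a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Check:
    """One structured test derived from a sentence of the spec (hypothetical)."""
    name: str
    passed: Callable[[str, str], bool]  # (user_message, agent_reply) -> bool


def is_polite(user: str, reply: str) -> bool:
    # Toy rule: the reply contains none of a few rude phrases.
    rude_words = ("stupid", "idiot", "shut up")
    return not any(word in reply.lower() for word in rude_words)


def refuses_financial_advice(user: str, reply: str) -> bool:
    # Toy rule: if the user asks about investing, the agent must decline.
    asked = "invest" in user.lower()
    refused = "can't give financial advice" in reply.lower()
    return refused if asked else True


def escalates_on_closure(user: str, reply: str) -> bool:
    # Toy rule: if the user mentions closing the account, hand off to a human.
    asked = "close my account" in user.lower()
    escalated = "human" in reply.lower()
    return escalated if asked else True


# The three clauses of the example spec, written out by hand as checks.
CHECKS: List[Check] = [
    Check("is_polite", is_polite),
    Check("refuses_financial_advice", refuses_financial_advice),
    Check("escalates_on_closure", escalates_on_closure),
]


def score(user: str, reply: str) -> Tuple[Dict[str, bool], float]:
    """Run every check on one exchange and return per-check results and a score."""
    results = {check.name: check.passed(user, reply) for check in CHECKS}
    return results, sum(results.values()) / len(CHECKS)


good_results, good_score = score(
    "I want to close my account.",
    "I'm sorry to hear that. Let me connect you with a human agent.",
)
bad_results, bad_score = score(
    "Should I invest in tech stocks?",
    "Yes, buy as much as you can.",
)
print(good_score, good_results)
print(bad_score, bad_results)
```

The first exchange passes all three checks. The second fails only the financial-advice check, so it scores two out of three. Because the output is a per-rule score and not a single pass or fail, the same checks can be rerun after every change to catch regressions.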
Part of a Broader Play for “Stack Control”
ASSERT did not arrive in a vacuum. It is part of a coordinated push by Microsoft to establish full architectural control over the next era of computing. Alongside ASSERT, Microsoft launched the Agent Control Specification (ACS), a new open-source standard and SDK bundled with plugins for LangChain, OpenAI, and Anthropic. ACS acts as a unified policy file that travels with an AI agent across different environments and restricts what data it can access.
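A portable policy file of this kind might look something like the Python sketch below. The JSON field names (`agent`, `allowed_data`, `denied_data`) and the `can_access` helper are assumptions made for illustration. The ACS schema is not described in the coverage, so none of this should be read as its real format.

```python
import json

# Hypothetical policy document; the field names are invented, not the ACS schema.
POLICY_JSON = """
{
  "agent": "support-bot",
  "allowed_data": ["orders", "shipping_status"],
  "denied_data": ["payment_cards"]
}
"""


def load_policy(text: str) -> dict:
    """Parse the policy text that ships alongside the agent."""
    return json.loads(text)


def can_access(policy: dict, resource: str) -> bool:
    """Deny by default: a resource must be explicitly allowed and not denied."""
    if resource in policy["denied_data"]:
        return False
    return resource in policy["allowed_data"]


policy = load_policy(POLICY_JSON)
for resource in ("orders", "payment_cards", "hr_records"):
    print(resource, can_access(policy, resource))
```

Because the rules live in data, not in the agent's code, the same file can be evaluated unchanged in any environment the agent runs in. Anything the file does not list, such as `hr_records` here, is refused.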
According to coverage by Techmeme, this two-layer approach to security and testing works hand in hand with Microsoft’s newly announced MXC (Microsoft Execution Containers). MXC acts as a secure, Windows-level sandbox that isolates autonomous AI agents while they execute tasks, so that a malfunctioning or compromised agent cannot corrupt the host operating system or leak private corporate data.
Democratizing Quality Assurance
By shifting AI testing away from complex, low-level coding and into the realm of natural language, Microsoft is effectively democratizing the software evaluation process. Product managers, compliance officers, and security teams can now write or update behavioral policies in plain text. The ASSERT framework can immediately generate corresponding test cases to verify compliance, dramatically compressing development timelines.
As tech giants chase greater autonomy for AI systems, the tools required to govern, test, and restrict those systems will prove just as valuable as the underlying models themselves. With ASSERT, Microsoft is giving developers a concrete, accessible way to ensure that the autonomous agents of tomorrow don’t go rogue today.

