Blog

AI Software Testing in practice.

Essays on testing AI-generated code: maturity models, adversarial scenario generation, hook-based enforcement, real bugs found, lessons from the build loop. From the team building tailtest.

Pillar essays

Why Testing AI-Generated Code Is Fundamentally Different

May 6, 2026

Testing human-written code and testing AI-generated code share a name but very little else. Five differences that matter, and what they imply for which testing strategies actually work in 2026.

The 5 Levels of AI Testing Maturity

April 28, 2026

Most teams shipping with AI coding agents are at Level 1 even when they think they're at Level 3. A maturity ladder for testing AI-built software: from manual catch-up to fully autonomous coverage.

AI Code QA in CI: Where Your Tests Actually Belong

April 20, 2026

For AI-generated software, code QA does not belong primarily in CI. The catch-net needs to live inside the build loop, at the agent's edit boundary.

Recent posts

Cursor: run tests automatically after every agent file edit

Fix: [features].codex_hooks is deprecated in Codex CLI

Building dev tools from Pune: distributed teams, timezone math

From 47 OSS repos to 16 real bugs: testing Python with AI

AI test failure classification: real_bug vs test_bug

Inside the Claude Code PostToolUse hook: what fires on edit

AI software testing for non-developers (vibe coders)

Inside the Cline plugin: clinerules plus an MCP server

R15 adversarial mode: 8 edge cases AI agents miss

Inside Codex CLI PostToolUse: what fires on apply_patch

Hook-based testing: enforcing the test cycle outside the LLM

Cursor afterFileEdit hook: why saves and Tab never trigger it

Why we open-sourced tailtest (and why MIT, not BSL)