Blog
AI Software Testing in practice.
Essays on testing AI-generated code: maturity models, adversarial scenario generation, hook-based enforcement, real bugs found, lessons from the build loop. From the team building tailtest.
Pillar essays
Why Testing AI-Generated Code Is Fundamentally Different
May 6, 2026
Testing human-written code and testing AI-generated code share a name but very little else. Five differences that matter, and what they imply for which testing strategies actually work in 2026.
The 5 Levels of AI Testing Maturity
April 28, 2026
Most teams shipping with AI coding agents are at Level 1 even when they think they're at Level 3. A maturity ladder for testing AI-built software: from manual catch-up to fully autonomous coverage.
AI Code QA in CI: Where Your Tests Actually Belong
April 20, 2026
QA for AI-generated code does not belong primarily in CI. The catch-net needs to live inside the build loop, at the agent's edit boundary.
Recent posts
Building dev tools from Pune: distributed teams, timezone math
May 20, 2026
Building dev tools from Pune in 2026 with a US co-founder and a distributed team. What we learned about timezones, hiring, and the India dev ecosystem.
From 47 OSS repos to 16 real bugs: testing Python with AI
May 13, 2026
We ran tailtest's adversarial test generation against 47 open source Python repos and filed 16 real bugs upstream. Methodology, categories, and what it implies.
AI test failure classification: real_bug vs test_bug
April 13, 2026
AI test failure classification routes broken tests by cause: real_bug, test_bug, or environment. Why three labels, the heuristics, and how R12 makes triage work.
Inside the Claude Code PostToolUse hook: what fires on edit
March 30, 2026
What the Claude Code PostToolUse hook fires when Claude edits a file: event payload, matchers, exit codes, and how tailtest hooks the lifecycle.
AI software testing for non-developers (vibe coders)
March 17, 2026
AI software testing if you do not write code yourself: what the tools do, why your Claude Code app needs them, and how to set one up in one command.
Inside the Cline plugin: clinerules plus an MCP server
March 11, 2026
Cline has no runtime hook. Tailtest ships a .clinerules baseline plus a 5-tool MCP server, and bridges the gap with Manual and Auto modes. Here is the build.
R15 adversarial mode: 8 edge cases AI agents miss
March 4, 2026
Adversarial test generation against 47 OSS Python repos found 16 real bugs across 8 categories of edge cases AI agents systematically miss. The full taxonomy.
Inside Codex CLI PostToolUse: what fires on apply_patch
February 25, 2026
The Codex CLI PostToolUse hook fires after apply_patch and shell calls. Event payload, patch parser, additionalContext envelope, and how tailtest hooks it.
Hook-based testing: enforcing the test cycle outside the LLM
February 18, 2026
Hook-based testing fires the test loop at the agent's event boundary, not from the prompt. Why this raises test compliance from 70 percent to 100 percent in practice.
Inside the Cursor afterFileEdit hook: what fires on save
February 11, 2026
The Cursor afterFileEdit hook fires on agent writes, not Tab autocomplete. Event payload, hooks.json shape, exit semantics, and how tailtest hooks it.
Why we open-sourced tailtest (and why MIT, not BSL)
February 4, 2026
An open source AI testing tool only earns trust if you can read the code. Why we picked MIT over BSL, and what we will never gate behind a paid tier.
Subscribe via RSS:
tailtest.com/rss.xml