Blog
AI Software Testing in practice.
Essays on testing AI-generated code: maturity models, adversarial scenario generation, hook-based enforcement, real bugs found, lessons from the build loop. From the team building tailtest.
Pillar essays
Why Testing AI-Generated Code Is Fundamentally Different
May 6, 2026
Testing human-written code and testing AI-generated code share a name but very little else. Five differences that matter, and what they imply for which testing strategies actually work in 2026.
The 5 Levels of AI Testing Maturity
April 28, 2026
Most teams shipping with AI coding agents are at Level 1 even when they think they're at Level 3. A maturity ladder for testing AI-built software: from manual catch-up to fully autonomous coverage.
Recent posts
Building dev tools from Pune: distributed teams, timezone math
May 20, 2026
Building dev tools from Pune in 2026 with a US co-founder and a distributed team. What we learned about timezones, hiring, and the India dev ecosystem.
From 47 OSS repos to 16 real bugs: testing Python with AI
May 13, 2026
We ran tailtest's adversarial test generation against 47 open source Python repos and filed 16 real bugs upstream. Methodology, bug categories, and what the results imply.
AI test failure classification: real_bug vs test_bug
April 13, 2026
AI test failure classification routes broken tests by cause: real_bug, test_bug, or environment. Why three labels, the heuristics, and how R12 makes triage work.
Inside the Claude Code PostToolUse hook: what fires on edit
March 30, 2026
What happens in the Claude Code PostToolUse hook when Claude edits a file: event payload, matchers, exit codes, and how tailtest hooks the lifecycle.
AI software testing for non-developers (vibe coders)
March 17, 2026
AI software testing if you do not write code yourself: what the tools do, why your Claude Code app needs them, and how to set one up in one command.
R15 adversarial mode: 8 edge cases AI agents miss
March 4, 2026
Adversarial test generation against 47 OSS Python repos found 16 real bugs across 8 categories of edge cases AI agents systematically miss. The full taxonomy.
Hook-based testing: enforcing the test cycle outside the LLM
February 18, 2026
Hook-based testing fires the test loop at the agent's event boundary, not from the prompt. Why this jumps test compliance from 70 percent to 100 percent in practice.
Why we open-sourced tailtest (and why MIT, not BSL)
February 4, 2026
An open source AI testing tool only earns trust if you can read the code. Why we picked MIT over BSL, and what we will never gate behind a paid tier.
Subscribe via RSS:
tailtest.com/rss.xml