Claude Code · stable

Claude Code Testing.
PostToolUse hook fires per edit.

The tailtest plugin for Claude Code runs an automated test cycle after every file edit. Tests get written. Tests get run. Failures surface back to Claude within the same turn. You see the green diff, not the broken intermediate ones. Hook-based, deterministic, MIT-licensed.

Install

$ claude plugin marketplace add avansaber/tailtest

$ claude plugin install tailtest@avansaber-tailtest

# Restart Claude Code. That is the install.

One marketplace command, one install command. No config files to write. No setup wizard. tailtest registers a PostToolUse hook on installation; the hook fires automatically the next time Claude edits a file in your project.

How tailtest hooks into Claude Code

Claude Code exposes a PostToolUse hook that fires after every tool call (Edit, Write, Bash, etc.). tailtest registers against this hook with a matcher for the file-mutating tools (Edit, Write, MultiEdit, NotebookEdit).
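For orientation, a PostToolUse registration with a matcher for those four tools has roughly the shape below in Claude Code's hook settings. This is a sketch, not tailtest's actual registration: the command name `tailtest-hook` is a hypothetical placeholder, and the helper `build_hook_settings` exists only for this example.

```python
import json

# The file-mutating tools named in the text above.
FILE_MUTATING_TOOLS = ["Edit", "Write", "MultiEdit", "NotebookEdit"]


def build_hook_settings(command):
    """Return a hooks settings fragment that runs `command` after matching tools."""
    return {
        "hooks": {
            "PostToolUse": [
                {
                    # The matcher is a pattern over tool names; "|" separates alternatives.
                    "matcher": "|".join(FILE_MUTATING_TOOLS),
                    "hooks": [{"type": "command", "command": command}],
                }
            ]
        }
    }


# "tailtest-hook" is a placeholder, not the plugin's real entry point.
settings_json = json.dumps(build_hook_settings("tailtest-hook"), indent=2)
```

Tools outside the matcher, such as Bash, never reach the hook command, which keeps the hook from running on calls that cannot have changed a source file.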

When Claude edits a file, the PostToolUse hook fires as soon as the tool call completes. tailtest reads the JSON payload that Claude Code passes to the hook on stdin, extracts the file path, runs it through the intelligence filter (skip test files, skip generated code, skip vendored dependencies), checks the language and runner, and queues it in .tailtest/session.json.
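The steps in that paragraph can be sketched as three small functions. Everything specific here is an assumption made for illustration: the skip rules, the runner table, and the `queued` key in session.json are hypothetical, and the payload field names (`tool_input.file_path`, `tool_input.notebook_path`) reflect our reading of the Claude Code hook input rather than tailtest's source.

```python
import json
from pathlib import Path, PurePosixPath
from typing import List, Optional

# Hypothetical filter rules; the real intelligence filter is more involved.
SKIP_DIRS = {"node_modules", "vendor", "dist", "build", "__pycache__"}
# Illustrative subset of extension-to-runner detection.
RUNNERS = {".py": "pytest", ".go": "go test", ".rb": "rspec"}


def extract_path(payload: dict) -> Optional[str]:
    """Pull the edited file path out of a PostToolUse payload."""
    tool_input = payload.get("tool_input", {})
    # Assumed field names: file_path for Edit/Write, notebook_path for NotebookEdit.
    return tool_input.get("file_path") or tool_input.get("notebook_path")


def should_queue(path: str) -> bool:
    """Skip vendored/generated directories, test files, and unknown languages."""
    p = PurePosixPath(path)
    if SKIP_DIRS & set(p.parts):
        return False
    name = p.name
    if name.startswith("test_") or "_test." in name or ".test." in name or ".spec." in name:
        return False
    return p.suffix in RUNNERS


def queue_file(path: str, session_file: Path) -> List[str]:
    """Append `path` to the session queue once, creating the file if needed."""
    session_file.parent.mkdir(parents=True, exist_ok=True)
    state = json.loads(session_file.read_text()) if session_file.exists() else {}
    queued = state.setdefault("queued", [])
    if path not in queued:
        queued.append(path)
    session_file.write_text(json.dumps(state, indent=2))
    return queued
```

A hook entry point would call `json.load(sys.stdin)`, pass the result through `extract_path` and `should_queue`, and only then touch `.tailtest/session.json`, so filtered files cost no disk writes.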

Claude sees the queued files as additionalContext mid-turn. The agent writes the test file, runs it with your existing test runner, applies R12 classification to any failures, and reports the result. All inside the same turn. No CI dependency.
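A minimal sketch of how a hook can hand the queue back to Claude: the hook prints a JSON object on stdout whose `hookSpecificOutput.additionalContext` field carries the message. The JSON shape follows our understanding of Claude Code's hook output format, and the function name `build_hook_output` is hypothetical; the message text mirrors the status line shown in the sample session.

```python
import json


def build_hook_output(queued):
    """Build the JSON a PostToolUse hook prints to surface context mid-turn."""
    files = ", ".join(queued)
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            # Free-form text that is added to Claude's context for this turn.
            "additionalContext": f"tailtest: queued {len(queued)} file(s) ({files})",
        }
    }


# What the hook would write to stdout for a single queued file.
example_stdout = json.dumps(build_hook_output(["checkout.py"]))
```

Because the context arrives as hook output rather than as a prompt instruction, it appears whether or not the model would have thought to test the file on its own.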

Sample session

User: "Add a discount-code validator to checkout.py"

# Claude edits checkout.py, validators.py, types.py

tailtest: queued 3 file(s) (checkout.py, validators.py, types.py)

# Claude writes one test file covering all three, runs pytest

tailtest: 2 scenarios failed

checkout: 10%-off + FREE stacked → total: -$2.40 (expected: $0.00)

checkout: expired code accepted when stacked with valid code

Claude: I see the issue in the discount stacking logic. Two fixes coming up.

# Claude fixes the bugs, reruns, all pass

tailtest: 12 scenarios, all passed.

Why Claude Code specifically

Of the four AI coding hosts tailtest supports (Claude Code, Cursor, Codex CLI, Cline), Claude Code has the tightest hook integration. Anthropic shipped PostToolUse, Stop, and PreToolUse hooks in 2025 and stabilized them quickly. A hook runs on every matching tool call, so compliance is 100 percent, versus a reported 70-90 percent for prompt-based instructions. That makes Claude Code the canonical environment for hook-based AI software testing.

Anthropic's own April 2026 postmortem on Claude Code documented a window where the agent was "faking test compliance": writing tests that pass by working around the broken code instead of catching the bug. Hooks fix that class of issue by enforcing the test cycle outside the LLM's reasoning chain. tailtest is built on this principle.

What you get

Per-edit firing

After every file Claude edits. Not at turn end. Not in CI. Not when you remember.

Production-shaped scenarios

R1-R15 rule layer generates scenarios that map to real user behavior, not synthetic edge cases.

Adversarial mode

8 breakage categories (R15). Found 16 real bugs in 47 OSS Python repos so far.

10 languages

Python, JS, TS, Go, Ruby, Java, Kotlin, C#, PHP, Rust. Runners auto-detected.

R12 failure classification

Each failure tagged real_bug / test_bug / environment so Claude knows what to fix.

Cross-session memory

Recurring failure patterns surface at the start of new sessions. Per-file complexity scores persist.
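To make the R12 tags listed above concrete, here is a deliberately simple sketch of a three-way classifier. Only the tag names (real_bug, test_bug, environment) come from tailtest; the heuristics, the `ENVIRONMENT_ERRORS` list, and the `classify` signature are assumptions invented for this example and are not the R12 rule itself.

```python
from enum import Enum


class FailureKind(Enum):
    REAL_BUG = "real_bug"
    TEST_BUG = "test_bug"
    ENVIRONMENT = "environment"


# Hypothetical: exception types that usually point at the machine, not the code.
ENVIRONMENT_ERRORS = {
    "ModuleNotFoundError",
    "ConnectionError",
    "PermissionError",
    "TimeoutError",
}


def classify(error_type: str, raised_in_test_file: bool) -> FailureKind:
    """Tag a failure so the agent knows whether to fix code, test, or setup."""
    if error_type in ENVIRONMENT_ERRORS:
        return FailureKind.ENVIRONMENT
    # A non-assertion error raised inside the test file suggests a broken test.
    if raised_in_test_file and error_type != "AssertionError":
        return FailureKind.TEST_BUG
    # Failed assertions and errors raised in source code count as real bugs.
    return FailureKind.REAL_BUG
```

The value of the tag is in what it rules out: a failure marked environment or test_bug tells Claude not to "fix" production code that was never wrong.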

Real bugs found, real bugs filed

We have used the tailtest plugin in Claude Code to run adversarial test passes against 47 open-source Python repositories. 16 real bugs found and filed with maintainers. Two already fixed and merged within 24 hours of filing.


Common questions

Does this work with Sonnet, Opus, and Haiku models?

Yes. tailtest is model-agnostic. It works with whatever model your Claude Code subscription is configured to use.

Will tailtest pollute my project with config files?

A `.tailtest/` directory gets created in your project for session state (`session.json`) and optional configuration. Add `.tailtest/` to your `.gitignore` if you don't want session state committed.

What if I'm already using TestSprite or another testing tool?

They're complementary. TestSprite and similar SaaS tools run post-build (after deploy). tailtest runs in-build (during the edit). Some teams use both: tailtest for the per-edit feedback loop, TestSprite for the end-to-end UI sweep.

Is this safe to run on production code?

tailtest adds test files and runs them with your existing test runner. It doesn't introduce new code paths in your application that could affect production. Tests run with the same isolation guarantees you have today.

Get tailtest into your Claude Code project

Two commands. Restart Claude Code. Build.
