tailtest for Codex CLI

tailtest watches every file Codex edits and prompts Codex to write and run tests for it automatically. As of v4.9.0 it surfaces queued files mid-turn, after each edit, rather than waiting for the end of the turn. The current release is v4.9.1, which added a marketplace icon for the Codex composer and made no behavioural change.

How the cycle works (v4.9.0+):

The SessionStart hook scans your project for test runners and seeds AGENTS.md with the workflow Codex needs to follow.
The PostToolUse hook fires after every apply_patch call (and after shell-style commands that write files). It parses the patch envelope to find the modified file, or falls back to an mtime sweep when the payload doesn't surface a path. Qualifying source files are queued in .tailtest/session.json and surfaced to the agent as additionalContext mid-turn.
The Stop hook fires at end of turn as a safety net: it sweeps for anything PostToolUse missed (background writes, unparseable patches) and prompts the agent to write tests before continuing.
After tests pass, the normal agent turn resumes.
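
As a rough illustration of the queueing step in this cycle, the sketch below appends a path to a pending_files list in .tailtest/session.json and builds a context message for the agent. It is not tailtest's actual code: the function names, the message wording, and the shape of the returned dictionary are assumptions made for illustration. Only the session file location, the pending_files name, and the additionalContext field name come from this page.

```python
import json
from pathlib import Path


def queue_file(session_path: Path, changed: str) -> list:
    """Add `changed` to pending_files in the session file, skipping duplicates."""
    state = json.loads(session_path.read_text()) if session_path.exists() else {}
    pending = state.setdefault("pending_files", [])
    if changed not in pending:
        pending.append(changed)
    session_path.parent.mkdir(parents=True, exist_ok=True)
    session_path.write_text(json.dumps(state, indent=2))
    return pending


def build_context(pending: list) -> dict:
    """Build a hook response surfacing the queue (response layout is assumed)."""
    message = "tailtest: files awaiting tests: " + ", ".join(pending)
    return {"additionalContext": message}
```

Keeping the queue in a file rather than in memory means each hook invocation, which runs as a separate process, can pick up where the previous one left off.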

Requirements

Python 3.9+
Codex CLI 0.129.0 or newer (hooks are stable and on by default; older versions need a feature flag, see below)
macOS or Linux

Install

# Step 1 (one-time): clone the plugin
git clone https://github.com/avansaber/tailtest-codex ~/.codex/plugins/tailtest

# Step 2 (per project): run the init helper inside each project where you want tailtest
cd <your-project>
bash ~/.codex/plugins/tailtest/scripts/init.sh

# Done. Start a codex session in the project.

The init.sh helper writes .codex/hooks.json in your current directory, pointing at the plugin's hook scripts (such as session_start.py and stop.py). It is idempotent: if a .codex/hooks.json with different content already exists, it leaves that file alone and writes a .codex/hooks.json.tailtest sidecar for you to merge manually. Run it once per project; no global registration is required.
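
The write-or-sidecar behaviour described above can be sketched in a few lines. This is a Python illustration of the logic only, not a port of init.sh: the function name install_hooks and its return values are hypothetical, and the real script may compare or generate content differently.

```python
from pathlib import Path


def install_hooks(project: Path, desired: str) -> str:
    """Write .codex/hooks.json, or a sidecar if a different file already exists."""
    target = project / ".codex" / "hooks.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_text(desired)
        return "written"
    if target.read_text() == desired:
        # Re-running with identical content changes nothing (idempotent).
        return "unchanged"
    # Never clobber a hooks.json the user has customised; leave a sidecar instead.
    target.with_name("hooks.json.tailtest").write_text(desired)
    return "sidecar"
```

The sidecar approach trades a manual merge step for the guarantee that existing hooks from other tools are never lost.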

If you prefer manual setup (skipping the helper script):

mkdir -p <your-project>/.codex
cp ~/.codex/plugins/tailtest/hooks/hooks.json <your-project>/.codex/hooks.json

Marketplace install (alternative one-step path)

Starting with v4.8.0 the plugin also ships as a Codex marketplace, so you can register it with one command instead of git clone:

codex plugin marketplace add avansaber/tailtest-codex

Then enable the plugin from inside Codex via the interactive /plugins menu, or by adding the following to ~/.codex/config.toml:

[plugins."tailtest@avansaber-tailtest"]
enabled = true

You still need to run init.sh in each project for hooks to fire, because Codex's plugin_hooks feature (which would let plugins register hooks automatically) is still in development upstream. Once that feature is stable, the init step goes away. Until then, the marketplace path replaces only the git clone step.

Older Codex CLI versions

Codex versions before 0.129.0 shipped hooks behind a feature flag. On those versions add the following to ~/.codex/config.toml once:

[features]
hooks = true

The codex_hooks key (used in older docs) is still accepted as a deprecated alias, but Codex 0.129.0+ emits a deprecation warning on every session start. Rename it to hooks when convenient.
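
If you script your setup, the version cut-off can be checked programmatically. The helper below is a hypothetical sketch: it assumes the output of codex --version contains a dotted x.y.z version somewhere in the text, which may not hold for every build.

```python
import re

# Hooks are on by default from this version onward (per the text above).
HOOKS_DEFAULT_SINCE = (0, 129, 0)


def parse_version(text: str) -> tuple:
    """Extract the first x.y.z triple from `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_feature_flag(version_output: str) -> bool:
    """True when the reported version predates hooks being on by default."""
    return parse_version(version_output) < HOOKS_DEFAULT_SINCE
```

Comparing integer tuples rather than strings matters here: as strings, "0.99.0" sorts after "0.129.0", which would wrongly treat an older release as newer.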

How it works internally

SessionStart hook -- scans your project for test runners, detects test style, and injects AGENTS.md so Codex knows the test workflow.

PostToolUse hook (v4.9.0+) -- the mid-turn analog of Stop. Fires after every apply_patch (or shell-style tool call). It parses the patch payload to find the modified file path; when the patch envelope can't be parsed, it falls back to an mtime sweep of files changed since the hook last fired. Qualifying files are queued in pending_files and surfaced as additionalContext so the agent sees them before the turn boundary.
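
To make the patch-parsing step concrete, here is a minimal sketch that pulls file paths out of a patch envelope. It assumes the envelope marks touched files with header lines of the form "*** Add File: <path>" and "*** Update File: <path>"; the function name is hypothetical, and tailtest's own parser may handle more cases (moves, deletions, other tool payloads).

```python
import re

# Header lines assumed to name a created or modified file inside the envelope.
_FILE_HEADER = re.compile(r"^\*\*\* (?:Add|Update) File: (.+)$", re.MULTILINE)


def modified_paths(patch_text: str) -> list:
    """Return paths named by Add/Update headers, in order, without duplicates."""
    seen = []
    for match in _FILE_HEADER.finditer(patch_text):
        path = match.group(1).strip()
        if path not in seen:
            seen.append(path)
    return seen
```

An empty result is the signal to fall back to the mtime sweep, since the payload did not surface a usable path.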

Stop hook -- the safety net at turn end. An mtime sweep of files modified since turn_start_mtime catches anything PostToolUse missed.
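
An mtime sweep of this kind can be sketched as a directory walk that keeps files modified after a reference timestamp. The function name and the list of skipped directories below are assumptions for illustration; the `since` argument plays the role of turn_start_mtime.

```python
import os
from pathlib import Path


def mtime_sweep(root: Path, since: float,
                skip_dirs=(".git", ".tailtest", "node_modules")) -> list:
    """Return files under root modified strictly after `since` (epoch seconds)."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                modified = path.stat().st_mtime
            except OSError:
                continue  # file vanished or is unreadable; ignore it
            if modified > since:
                changed.append(path.relative_to(root).as_posix())
    return sorted(changed)
```

Because it relies only on timestamps, a sweep like this also catches writes made by background processes that never passed through a tool call.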

AGENTS.md -- the instruction file that drives the entire cycle: scenario selection, test writing, execution, fix loop, and reporting.

Complexity scoring

tailtest scores every queued file before generating scenarios. Path signals (auth, billing, payment, checkout) and content patterns (HTTP calls, database queries, branch count, public functions) each add to the score. Files scoring 10 or above get thorough-depth testing (10-15 scenarios) regardless of the session-level depth setting, and Codex sees a reasoning note such as "billing: +4 billing +3 HTTP = 12 scenarios". Low-complexity files get 2-3 scenarios. Scoring runs automatically on every file write and needs no configuration.
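
The scoring idea can be illustrated with a small additive model. Everything below is a sketch under stated assumptions: the +4 billing and +3 HTTP weights echo the example note above, while the other weights, the regular expressions, the cap on branch points, and the function names are invented for illustration and are not tailtest's actual rules.

```python
import re

# Path keywords from the text; every weight except billing's +4 is assumed.
PATH_SIGNALS = {"auth": 4, "billing": 4, "payment": 4, "checkout": 4}
# Content patterns: label -> (detector, weight). Detectors are simplistic stand-ins.
CONTENT_SIGNALS = {
    "HTTP": (re.compile(r"\b(?:requests|httpx|urllib)\b"), 3),
    "database": (re.compile(r"\b(?:SELECT|INSERT|UPDATE|DELETE)\b"), 3),
}
BRANCH = re.compile(r"^[ \t]*(?:if|elif|for|while)\b", re.MULTILINE)
THOROUGH_THRESHOLD = 10  # "10 or above" per the text


def score_file(path: str, source: str):
    """Return (total, contributions), one (label, weight) pair per matched signal."""
    parts = [(word, w) for word, w in PATH_SIGNALS.items() if word in path.lower()]
    for label, (pattern, weight) in CONTENT_SIGNALS.items():
        if pattern.search(source):
            parts.append((label, weight))
    branches = len(BRANCH.findall(source))
    if branches:
        parts.append(("branches", min(branches, 5)))  # cap is an assumption
    return sum(w for _, w in parts), parts


def is_thorough(total: int) -> bool:
    """True when the score reaches the thorough-depth threshold."""
    return total >= THOROUGH_THRESHOLD
```

Returning the individual contributions alongside the total is what makes a human-readable reasoning note possible.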

Scenario tracking

At turn end, tailtest logs the outcome for each tested file: passed, fixed (failed but resolved within the turn), unresolved, or deferred. This log feeds cross-session history so that recurring failures are surfaced at the start of future sessions. See the History and Advanced pages for details.
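
One way to picture the four outcomes and the recurring-failure check is the sketch below. The inputs, the history format (one dictionary per session mapping file to outcome), and the reading of "deferred" as "tests were not run this turn" are all assumptions; tailtest's real log format is described on the History page.

```python
from collections import Counter


def classify(ran: bool, first_passed: bool, final_passed: bool) -> str:
    """Map a file's test results for one turn onto an outcome label."""
    if not ran:
        return "deferred"  # assumed meaning: testing was postponed
    if first_passed:
        return "passed"
    return "fixed" if final_passed else "unresolved"


def recurring_failures(history: list, min_count: int = 2) -> list:
    """Return files left unresolved in at least `min_count` sessions, sorted."""
    counts = Counter(
        path
        for session in history
        for path, outcome in session.items()
        if outcome == "unresolved"
    )
    return sorted(path for path, n in counts.items() if n >= min_count)
```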

Configuration

Create .tailtest/config.json in your project root (optional):

{
  "depth": "standard"
}

See Configuration for all options.
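
Because the file is optional, any code reading it has to cope with it being absent or malformed. A minimal sketch of such a loader follows; the function name, the choice of "standard" as the fallback depth, and the decision to ignore invalid JSON silently are assumptions, not documented tailtest behaviour.

```python
import json
from pathlib import Path

# Fallback used when no config file is present (assumed default).
DEFAULTS = {"depth": "standard"}


def load_config(project_root: Path) -> dict:
    """Merge .tailtest/config.json over the defaults, tolerating a missing file."""
    config = dict(DEFAULTS)
    path = project_root / ".tailtest" / "config.json"
    if not path.exists():
        return config
    try:
        loaded = json.loads(path.read_text())
    except json.JSONDecodeError:
        return config  # malformed file: fall back to defaults
    if isinstance(loaded, dict):
        config.update(loaded)
    return config
```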

Commands

Command	What it does
`/tailtest <file>`	Manually queue a specific file
`/summary`	Print session test results
`/tailtest off`	Pause automatic test generation
`/tailtest on`	Resume after pausing

Troubleshooting

No tests after Codex writes a file: Check your Codex version with codex --version. On 0.129.0+ hooks are on by default; on older versions add hooks = true under [features] in ~/.codex/config.toml. If you see the message "deprecated: [features].codex_hooks is deprecated. Use [features].hooks instead" at session start, rename the key as suggested.

Codex seems stuck in a loop: tailtest uses a stop_hook_active guard to keep the Stop hook from re-prompting indefinitely. If you see repeated test prompts without progress, verify that .codex/hooks.json sits under your project root, not in a subdirectory.
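
For readers curious how such a guard prevents loops, the sketch below shows the general pattern. It assumes the Stop hook receives a JSON payload containing a boolean stop_hook_active field, and that returning a "decision"/"reason" pair asks the agent to keep working; both the payload shape and the response keys are assumptions here, not confirmed details of tailtest or Codex.

```python
import json


def stop_decision(payload_text: str, pending: list) -> dict:
    """Decide whether the Stop hook should prompt for tests (shapes are assumed)."""
    payload = json.loads(payload_text or "{}")
    if payload.get("stop_hook_active"):
        # The agent is already continuing because of an earlier Stop prompt;
        # prompting again would create an endless loop.
        return {}
    if not pending:
        return {}  # nothing queued, let the turn end normally
    return {
        "decision": "block",
        "reason": "Write and run tests for: " + ", ".join(pending),
    }
```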

Go/Rust/Java not queuing: These languages require a detected runner. If go.mod, Cargo.toml, or a Maven/Gradle file was not found, files in that language are silently skipped.
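
To check by hand whether a runner would be detected, you can look for the marker files yourself. The sketch below assumes detection means "a marker file exists in the project root" and uses pom.xml, build.gradle, and build.gradle.kts as the Maven/Gradle files; the function name is hypothetical, and tailtest may search more locations or file names.

```python
from pathlib import Path

# Marker files per language; the Maven/Gradle names are the conventional ones.
RUNNER_MARKERS = {
    "go": ["go.mod"],
    "rust": ["Cargo.toml"],
    "java": ["pom.xml", "build.gradle", "build.gradle.kts"],
}
EXTENSIONS = {".go": "go", ".rs": "rust", ".java": "java"}


def passes_runner_gate(root: Path, file_path: str) -> bool:
    """False when a Go/Rust/Java file has no matching marker in `root`."""
    language = EXTENSIONS.get(Path(file_path).suffix)
    if language is None:
        return True  # other languages are not covered by this sketch's gate
    return any((root / marker).exists() for marker in RUNNER_MARKERS[language])
```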

Windows: Codex hooks are not supported on Windows.