Inside the Cline plugin: clinerules plus an MCP server

The Cline plugin is the odd one out in the tailtest lineup, and it is the post I have had the most fun writing because the architecture is genuinely different from the other three. Cline does not have a runtime hook. There is no PostToolUse, no afterFileEdit, no event that fires when the agent writes a file. The integration surface is two things stitched together: a markdown rule file at .clinerules/01-tailtest-baseline.md and an MCP server exposing five tools. The plugin works because those two pieces, combined with Cline’s auto-approve mode, approximate the boundary that the runtime hooks give us elsewhere.

I am Vaishnavi. I work on the Cline plugin in addition to the Claude Code one, and the contrast between the two is the cleanest case for why architecture decisions in tailtest are agent-shaped, not framework-shaped. This post walks through the actual .clinerules baseline, the five MCP tools, and the Manual versus Auto mode split that bridges the gap left by the missing hook.

Why Cline is different

The three other agents tailtest supports all expose a runtime event you can hook. Claude Code has PostToolUse. Cursor has afterFileEdit. Codex CLI has PostToolUse plus Stop. The shape is the same in all three: when the agent writes a file, a hook script runs, and the script does the test cycle.

Cline does not have that. The closest thing is a markdown file that the agent reads as system prompt content. Whatever discipline you want the agent to follow has to be written into prose that the agent will, in principle, obey. This sounds like a regression to prompt-based discipline, and it would be, except for two things: Cline’s auto-approve flags let the agent’s tool router allow certain calls without per-call human approval, and MCP servers give you a way to make the agent’s “reasoning” call into deterministic code.

Tailtest uses both. The .clinerules file describes the discipline in prose so the agent knows what to do. The MCP server provides the deterministic execution path so the agent does not have to invent the discipline from scratch every turn. Auto-approve makes the loop self-driving when the user opts in. Without auto-approve, the loop is manual: the user invokes the test cycle, the agent follows the rule, the MCP tools run.

The result is structurally weaker than a runtime hook, by design. The user has more control over when the cycle fires. The cost is that compliance is no longer unconditional: it depends on the user keeping auto-approve enabled and on the agent following the rule, or on the user invoking the cycle manually.

The .clinerules baseline

The rule file lives at .clinerules/01-tailtest-baseline.md in the user’s project. Cline reads everything under .clinerules/ and concatenates it into the system prompt. The numeric prefix controls ordering. Tailtest ships exactly one file because the rule layer should stay small and the user’s own rules should sit alongside it without conflict.
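
The ordering behaviour described above can be pictured with a small sketch. This is not Cline's implementation; it only assumes, as the paragraph says, that the files under .clinerules/ are combined in filename order. The function name `combine_rules` and the second rule file are hypothetical.

```python
from pathlib import Path
import tempfile


def combine_rules(rules_dir: Path) -> str:
    """Concatenate every markdown file in rules_dir, sorted by filename.

    Assumption: lexicographic filename order stands in for the
    numeric-prefix ordering described in the text.
    """
    parts = []
    for path in sorted(rules_dir.glob("*.md"), key=lambda p: p.name):
        parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


def demo() -> str:
    """Build a throwaway .clinerules/ directory and combine it."""
    with tempfile.TemporaryDirectory() as tmp:
        rules = Path(tmp) / ".clinerules"
        rules.mkdir()
        # Hypothetical user rule, written first to show creation order is irrelevant.
        (rules / "02-team-style.md").write_text("# team style\n", encoding="utf-8")
        (rules / "01-tailtest-baseline.md").write_text("# tailtest\n", encoding="utf-8")
        return combine_rules(rules)
```

Under that assumption, a user rule with a higher prefix lands after the tailtest baseline in the combined prompt, which is why a single low-numbered file can coexist with the user's own rules.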

The opening of the baseline file:

# tailtest

You are running with the tailtest plugin for Cline. Your job: automatically run the test cycle the user would otherwise ask for manually. Generate production-like scenarios for what was just built, execute them, and surface only what fails.

**How file detection works in Cline:** Cline has no hook system equivalent to Claude Code's `PostToolUse` or Cursor's `afterFileEdit`. Two surfaces drive the test cycle:

- **Auto mode (opt-in):** when the user has enabled auto-approve for "Edit files (workspace)", "Execute safe commands", and "Use MCP servers", you should run the test cycle automatically after every code edit you make. Treat each edit as if a hook fired.
- **Manual mode (default):** the user invokes the test cycle explicitly via `/tailtest-test <file>` or natural language. When invoked, run the full cycle below.

That paragraph is load-bearing. Cline does not enforce it; the agent has to read it and apply it. In practice this works well because Cline’s tool router does enforce the auto-approve flags. If the user has checked “Use MCP servers” and “Execute safe commands,” the agent can call the MCP tools and run the test runner without per-call human approval. The prose says “do this after every edit”; the auto-approve flags say “yes, you can.” Together they approximate the runtime hook.

The rest of the baseline file is the per-step protocol: identify the edited file, read session state, filter, generate scenarios via the MCP tool, write the test, run it, classify failures, surface results. About 200 lines of markdown that the agent treats as an instruction set.

The five MCP tools

The MCP server lives at mcp_server/src/tailtest_mcp/server.py in the Cline plugin repo. It exposes five tools:

tailtest_ping: trivial reachability check. Returns version and a heartbeat string. Useful for verifying the MCP connection during setup.
tailtest_scenario_plan: given a source file path, returns a structured plan of test scenarios to write. This is where the rule layer executes. The plan covers happy-path tests, R15 adversarial categories, and R12 classification hooks.
tailtest_classify_failures: given a pytest output, applies R12’s three-label classification (real_bug, test_bug, environment) and returns the labels. This is the same classifier the Claude Code and Codex plugins use, exposed over MCP for Cline.
tailtest_pick_template: given the source file and project state, picks the right test template (pytest function, pytest class, async pytest, table-driven). The choice depends on the source file’s structure.
tailtest_setup: one-shot project initialization. Writes .tailtest/session.json with runner config, primes the baseline ignore patterns, and verifies that pytest can be invoked.
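
To make the idea of a structure-dependent template choice concrete, here is a minimal sketch using Python's standard `ast` module. The four template names come from the list above; the selection rules themselves are guesses made for illustration and are not the logic inside tailtest_pick_template.

```python
import ast


def pick_template(source: str) -> str:
    """Return a template name based on the module's structure.

    The names match the post; the rules below are assumptions.
    """
    tree = ast.parse(source)
    top_level = tree.body
    # Any coroutine anywhere in the module needs an async-aware test.
    if any(isinstance(node, ast.AsyncFunctionDef) for node in ast.walk(tree)):
        return "async pytest"
    # A top-level class suggests grouping tests in a test class.
    if any(isinstance(node, ast.ClassDef) for node in top_level):
        return "pytest class"
    functions = [node for node in top_level if isinstance(node, ast.FunctionDef)]
    # Assumed rule: multi-argument functions suit parametrized, table-style cases.
    if functions and all(len(fn.args.args) >= 2 for fn in functions):
        return "table-driven"
    return "pytest function"
```

Because the choice is computed from the parsed source rather than inferred by the model, the same file always yields the same template.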

Each tool is deterministic. The agent can reason in prose about what to test, but when it needs the actual scenario plan or the actual classification, it calls into the MCP tool. The output of the tool is not an LLM call; it is the same Python code that runs in the runtime-hook plugins.
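
A rough picture of what "deterministic" means for the classifier: plain string matching over pytest's short summary lines, with no model call. The three labels are the ones named above. The marker lists and the function body are assumptions for illustration; the real R12 rules are not shown in this post.

```python
import re

# Assumed marker lists; the real R12 rules are not reproduced here.
ENVIRONMENT_MARKERS = ("ModuleNotFoundError", "ImportError", "ConnectionError", "PermissionError")
TEST_BUG_MARKERS = ("fixture", "NameError", "SyntaxError")

# Matches pytest short-summary lines such as "FAILED path::test - message".
SUMMARY_LINE = re.compile(r"^(FAILED|ERROR) (\S+)(?: - (.*))?$")


def classify_failures(pytest_output: str) -> dict[str, str]:
    """Map each failing test id in the short summary to one label."""
    labels = {}
    for line in pytest_output.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if not match:
            continue
        test_id, message = match.group(2), match.group(3) or ""
        if any(marker in message for marker in ENVIRONMENT_MARKERS):
            labels[test_id] = "environment"
        elif any(marker in message for marker in TEST_BUG_MARKERS):
            labels[test_id] = "test_bug"
        else:
            # Default: a plain assertion failure is treated as a source bug.
            labels[test_id] = "real_bug"
    return labels
```

Feeding the same pytest output in twice gives the same labels twice, which is the property the agent relies on when it hands classification off to the tool.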

That is the architectural trick. The R-rule logic does not change across agents. The way the agent reaches it changes. Claude Code reaches it through a hook script that runs after the edit. Cline reaches it through an MCP tool call inside the agent’s response. The code at the bottom of the stack is the same.
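
The "same code, two entry points" shape can be sketched as follows. Everything here is illustrative: the placeholder plan, the `file_path` event field, and the `path` argument name are assumptions, not the actual hook payload or MCP tool schema.

```python
import json


def scenario_plan(path: str) -> dict:
    """Shared, deterministic core (placeholder body for illustration)."""
    return {"file": path, "scenarios": ["happy_path", "boundary_inputs"]}


def hook_entry(event_json: str) -> dict:
    """Runtime-hook style: the agent runtime hands over an event after the edit."""
    event = json.loads(event_json)
    return scenario_plan(event["file_path"])  # hypothetical field name


def mcp_tool_entry(arguments: dict) -> dict:
    """MCP style: the agent calls the tool with arguments inside its response."""
    return scenario_plan(arguments["path"])  # hypothetical argument name
```

Both wrappers are thin; only the trigger differs. In the hook case the runtime decides when to call, and in the MCP case the agent does.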

Manual mode versus Auto mode

The two modes bridge the integration gap that the missing runtime hook creates.

Manual mode is the default. The user invokes the test cycle by typing /tailtest-test <file> (a Cline slash command we ship) or by asking in natural language (“test the file I just changed”). The agent reads the rule file, identifies the file, calls tailtest_scenario_plan over MCP, writes the test, runs it via the shell tool, and calls tailtest_classify_failures on the output. Manual mode gives the user full control over when the cycle fires. It also means that if the user does not invoke it, the cycle does not fire. Compliance is whatever the user does.

Auto mode is opt-in via Cline’s auto-approve flags. When the user enables auto-approve for “Edit files (workspace),” “Execute safe commands,” and “Use MCP servers,” the agent can run the full cycle after every edit without asking for confirmation. The .clinerules baseline tells the agent to treat each edit as if a hook fired. In practice, with auto-approve on, the loop is self-driving: the agent edits, then calls the MCP tools, then runs the test, then reports failures, then loops if there is something to fix.
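
The opt-in condition amounts to a subset check over the three flag names quoted above. The sketch below is only a way to state that condition precisely; Cline does not expose such a function, and `resolve_mode` is a hypothetical name.

```python
# The three auto-approve options named in the baseline file.
REQUIRED_FLAGS = frozenset({
    "Edit files (workspace)",
    "Execute safe commands",
    "Use MCP servers",
})


def resolve_mode(enabled_flags: set[str]) -> str:
    """Return "auto" only when all three auto-approve flags are enabled."""
    return "auto" if REQUIRED_FLAGS <= set(enabled_flags) else "manual"
```

Missing any one of the three drops the loop back to Manual mode, because the agent would have to stop and ask for approval somewhere in the cycle.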

Auto mode approximates the runtime-hook behavior of the other three agents. The approximation is not perfect. The agent can in principle skip the cycle even with the rule in place; compliance depends on the agent obeying its system prompt. Across 1,400 Auto-mode edits in March and April, the cycle fired in 96 percent of cases. That is below the 100 percent that runtime hooks give us, and well above the 70 percent baseline for pure prompt-based discipline cited in hook-based testing explained.
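
In absolute terms, the percentage works out as below. The counts are derived from the two figures quoted above and treat 96 percent as exact, so they are approximate if that figure was rounded.

```python
edits = 1400          # Auto-mode edits observed in March and April
fired_percent = 96    # share of edits where the cycle ran

fired = edits * fired_percent // 100   # edits followed by a test cycle
missed = edits - fired                 # edits where the cycle was skipped

print(fired, missed)
```

That is roughly 1,344 edits with a cycle and 56 without.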

The 4 percent miss rate is dominated by very long edits where the agent ran out of context budget before invoking the cycle. The misses are systematic and we are tightening the rule prose to address them.

Why the split is the right shape

There was a real design argument inside the team about whether the Cline integration should try harder to mimic the runtime-hook contract. Manual-only would mean no testing during long autonomous runs. Auto mode with 96 percent compliance is meaningfully better than that. Users who want runtime-hook semantics can use Claude Code, Cursor, or Codex CLI. Users who pick Cline for the long autonomous loop get the best approximation we know how to ship.

How the pieces wire together

A walkthrough of one Auto-mode edit:

1. User has auto-approve on for edit, exec, and MCP.
2. Cline writes src/pricing.py via write_to_file. No hook fires.
3. The agent, having read the .clinerules baseline, calls tailtest_scenario_plan with the file path.
4. The MCP server runs the rule layer: detects language and structure, picks scenarios per R15’s eight categories (boundary inputs, format and injection, type confusion, concurrent state, time and locale edges, partial failures, resource exhaustion, off-by-one), returns a structured plan.
5. The agent calls tailtest_pick_template, writes the test file.
6. The agent runs pytest tests/test_pricing.py via shell.
7. The agent calls tailtest_classify_failures on the pytest output. R12’s three-label classifier returns the labels.
8. On real_bug, the agent fixes the source and loops. On test_bug or environment, it fixes the test or the environment without touching the source.
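
The final branching step of the walkthrough can be sketched as a small decision function. The label names are the ones from the post; the action names and the priority order when a run produces mixed labels are assumptions made for illustration.

```python
def next_action(labels: dict[str, str]) -> str:
    """Decide what the agent does after classification.

    labels maps a failing test id to real_bug, test_bug, or environment.
    """
    if not labels:
        return "report_pass"
    found = set(labels.values())
    # Assumed priority: repair the environment and the tests first, so the
    # source is only touched once the remaining failures are real bugs.
    if "environment" in found:
        return "fix_environment"
    if "test_bug" in found:
        return "fix_test"
    return "fix_source"
```

The point of the ordering is the constraint stated above: a test_bug or environment label must never lead to an edit of the source file.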

The MCP tools handle the deterministic parts. The agent handles the reasoning. The split pins the rule layer where it belongs.

Where Cline sits in the four-agent picture

Tailtest currently ships four plugins: Claude Code, Cursor, Codex CLI, and Cline. The first three use runtime hooks (PostToolUse, afterFileEdit, and PostToolUse, respectively). Cline uses .clinerules plus MCP. All four share the R1-R15 rule layer, the R12 classifier, and the 8-category R15 adversarial pass. They have caught 17 real bugs across 55 open-source Python repositories so far; the full list is at the case studies page. The 1,234 tests across the four plugins enforce that the shared code stays shared.

The architectural argument for hook-based testing is in hook-based testing explained. The argument for why per-edit verification matters at all is in why testing AI-generated code is different. The per-edit testing capability page is at agent edits. The Claude Code deep-dive is the closest analog to this post, at inside the PostToolUse hook; reading the two side by side is the cleanest way to see how runtime hooks and MCP-based integration differ.

How to install

Tailtest’s Cline plugin is MIT licensed, ships no telemetry, and requires no SaaS account. The installer is uvx tailtest install --agent cline. It writes .clinerules/01-tailtest-baseline.md to your project, registers the MCP server in your Cline MCP config, and writes a minimal .tailtest/config.yaml. The Cline solution page walks through the full integration. The plugin docs are at /docs/cline/.

If you want to compare against the runtime-hook plugins before installing, the Cursor deep-dive and the Codex CLI deep-dive cover the alternatives. They solve the same problem with stronger compliance guarantees because the underlying agent gives them a runtime event. Cline is the agent for users who want the agent to drive longer autonomous runs, and Auto mode is how we make testing keep up.

FAQ

Does Cline have a runtime hook like Claude Code’s PostToolUse?

No. Cline does not expose a runtime event when the agent writes a file. Tailtest’s Cline plugin uses .clinerules markdown plus an MCP server to approximate the contract, with Auto mode driven by Cline’s auto-approve flags.

What are the five MCP tools the Cline plugin exposes?

tailtest_ping, tailtest_scenario_plan, tailtest_classify_failures, tailtest_pick_template, and tailtest_setup. Each is deterministic Python code, shared with the runtime-hook plugins where applicable.

What is the difference between Manual mode and Auto mode?

Manual mode is the default: the user invokes the test cycle explicitly. Auto mode requires the user to enable Cline’s auto-approve flags for edit, exec, and MCP; once on, the agent runs the cycle after every edit per the .clinerules instruction.

What is the observed compliance in Auto mode?

96 percent across 1,400 Auto-mode edits in March and April. The 4 percent miss rate is dominated by very long edits and by files the agent did not consider testable.

Why not enforce the test cycle at the tool router level?

Cline’s tool router enforces auto-approve flags, not specific call sequences. The router will let the agent call MCP tools or shell commands but does not require any particular order. The .clinerules prose is what tells the agent the order to use.