End-to-end testing for AI-built software.
Per-edit testing catches what each individual file change broke. End-to-end testing catches what the combined changes broke when they meet your user's first click. tailtest's E2E layer is on the platform roadmap for Q4 2026.
Honest framing
E2E testing is not shipping today. This page describes what we are building and why, so you can evaluate the platform direction. If you need E2E coverage now, see the comparison page for tools like TestSprite, Mabl, and Autonoma that ship E2E today.
Why E2E matters more for AI-built apps
When a human writes 100 lines a day, the integration risk between changes is bounded by what they can hold in their head. When an AI coding agent writes 5,000 lines a day across 30 files, nobody holds the combined change set in their head, so nothing bounds the integration risk. Each individual change passes its unit tests in isolation. Combined, the user-facing flow can break in subtle ways.
A signup form that worked yesterday now silently swallows the password field because the agent refactored validation into a different module and forgot to wire the new path. A checkout flow that passed all unit tests fails at the third step because the agent changed an event schema in one service and the consumer service still expects the old shape.
These are the bugs E2E catches. They are also the bugs that survive every other test layer in the AI-coding era because no individual unit test is wrong.
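The checkout case above can be reduced to a few lines. This is a minimal sketch with hypothetical names (`build_order_event`, `charge_from_event`, and the field names are invented for illustration and are not tailtest APIs). It shows how a producer and a consumer can each pass their own unit test while the combined flow fails.

```python
# Hypothetical checkout example; every name here is illustrative only.

def build_order_event(order_id: str, total_cents: int) -> dict:
    """Producer after the agent's refactor: the amount field was renamed."""
    return {"order_id": order_id, "amount_cents": total_cents}


def charge_from_event(event: dict) -> int:
    """Consumer, left unchanged: it still reads the old field name."""
    return event["total_cents"]


def unit_test_producer() -> bool:
    # Passes: the producer emits exactly what its own (updated) test expects.
    expected = {"order_id": "o-1", "amount_cents": 1999}
    return build_order_event("o-1", 1999) == expected


def unit_test_consumer() -> bool:
    # Passes: the hand-written fixture still uses the old schema.
    fixture = {"order_id": "o-1", "total_cents": 1999}
    return charge_from_event(fixture) == 1999


def end_to_end_checkout() -> bool:
    # Fails: the real producer output is fed to the real consumer.
    try:
        return charge_from_event(build_order_event("o-1", 1999)) == 1999
    except KeyError:
        return False
```

Both unit checks return `True` and the end-to-end check returns `False`, even though neither unit test is wrong on its own terms.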
Why current E2E tools do not fit
Existing E2E tools (Playwright with manually written specs, Cypress, Selenium) were designed for a world where humans wrote both the application code and the test scripts. They assume the application surface changes slowly and intentionally, so test specs need only occasional maintenance.
In AI-coded apps the UI surface can shift weekly. By the time a human gets around to updating the E2E spec, the spec is stale, fails for the wrong reason, and gets disabled. The classic "flaky test" problem becomes "the test was never broken; the code under it was reshaped so many times no one knows what it was supposed to assert anymore."
SaaS alternatives (Mabl, Testim, Functionize, TestSprite) address this with self-healing test maintenance. They are better fits than Playwright for vibe-coded apps. But they are SaaS, mostly closed source, and operate post-deployment. tailtest's design intent is to keep E2E in the build loop where the agent can fix what it broke before the diff lands.
tailtest's planned approach
Four design properties we are targeting for the E2E layer:
Agent-authored, not human-authored
The AI coding agent writes the E2E spec the same way it writes unit tests. The user describes the flow once in natural language; the agent maintains the spec as the app evolves. No human-written specs to drift.
Build-loop integrated
E2E runs in the same hook lifecycle as agent-edit testing. When a change touches a flow with E2E coverage, the affected E2E specs re-execute before the turn completes. No separate CI step required.
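One way to picture "the affected E2E specs re-execute" is a lookup from changed files to the specs that cover them. The sketch below is an assumption about how such a selection could work, not the planned implementation: the `FLOW_COVERAGE` map, the spec paths, and `affected_specs` are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical coverage map: E2E spec -> glob patterns for the source files
# that take part in that user flow. A real map might be agent-maintained.
FLOW_COVERAGE = {
    "e2e/signup.spec": ["src/auth/*", "src/forms/signup.py"],
    "e2e/checkout.spec": ["src/cart/*", "src/payments/*"],
}


def affected_specs(changed_files, coverage=FLOW_COVERAGE):
    """Return the specs whose covered files overlap the changed files."""
    return sorted(
        spec
        for spec, patterns in coverage.items()
        if any(
            fnmatchcase(path, pattern)
            for path in changed_files
            for pattern in patterns
        )
    )


# An edit to the auth module selects only the signup flow.
print(affected_specs(["src/auth/validation.py"]))
```

A change outside every mapped flow selects nothing, which is what keeps the check cheap enough to run inside the edit turn instead of as a separate CI step.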
Open source
Same MIT license as the rest of tailtest. No SaaS account, no per-test pricing, no vendor lock-in. Run the E2E suite on your own infra (or ours if we eventually offer hosting; not in scope today).
Determinism over magic
Self-healing where it makes sense, but every passing test should be deterministically reproducible. Tests that pass "sometimes" are quietly broken; we will surface them, not silence them.
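A simple way to surface tests that pass "sometimes" is to run each one several times and report mixed outcomes as their own category. This is a minimal sketch under that assumption; the `classify` helper and its labels are hypothetical and do not describe a shipped tailtest feature.

```python
from typing import Callable


def classify(test: Callable[[], bool], runs: int = 5) -> str:
    """Run a test repeatedly and label it "pass", "fail", or "flaky"."""
    if runs < 2:
        raise ValueError("need at least two runs to detect flakiness")
    results = [bool(test()) for _ in range(runs)]
    if all(results):
        return "pass"   # same outcome on every run
    if not any(results):
        return "fail"   # consistently failing, so reproducible
    return "flaky"      # mixed outcomes: surfaced, not silenced
```

Reporting "flaky" as a distinct result, instead of retrying until green, keeps a nondeterministic test visible in the output.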
What we are NOT building
To keep scope honest:
- Not a hosted SaaS test platform. Run-it-yourself remains the model.
- Not visual regression specifically (separate /platform/visual/ roadmap item; same hook layer).
- Not load testing or performance testing (separate roadmap items).
- Not a replacement for unit tests; agent-edit testing remains the foundation.
Timeline + how to follow along
We are targeting Q4 2026 for a working alpha against a single host (Claude Code first). We iterate in the open: design proposals ship as GitHub discussions on avansaber/tailtest before any code. If E2E is the layer you need, watch the repo or subscribe to the project newsletter (TBD; planned for Q3 2026).
If you have specific E2E pain you want us to solve first, open a discussion on the repo. Real user pain shapes priorities more than internal roadmaps.
What is shipping today?
Agent-edit testing is live and stable across four hosts.