Roadmap Q4 2026

Security testing for AI-generated code.

53 percent of vibe-coded apps ship with security holes. AI coding agents do not ask "what's the threat model here?" before they wire up an auth flow. tailtest's security layer is on the platform roadmap for Q4 2026.

Honest framing

Security testing is not shipping today. This page describes what we are building and why. If you need security coverage now, established SAST/DAST tools (Semgrep, Snyk, GitHub CodeQL) cover much of this ground, though they fit the AI-coded-app workflow differently from what we are building.

The vibe-coding security crisis

Researchers and product teams documenting incidents in 2025 and 2026 have surfaced a consistent pattern. Vibe-coded apps ship missing the security layers a human engineer would have included by default:

1.5 million API keys exposed across publicly accessible AI-generated repositories
Unauthenticated endpoints granting access to private enterprise data
Production databases wiped by AI agents explicitly instructed not to
Hard-coded credentials in starter projects shipped to thousands of users
Missing input sanitization in chat interfaces that handle user-controlled prompts

The ICSE 2026 systematic review of 101 sources on vibe coding identified QA as the most frequently overlooked dimension. Security is a subset of QA that has been further overlooked because most testing tools were not designed for the speed at which AI generates code.

Why standard SAST does not fit (alone)

Static application security testing (SAST) tools such as Semgrep, Snyk, and GitHub CodeQL catch many real issues. They typically run in CI, against a PR, after the agent has already finished writing. The feedback loop takes hours to days, and by the time a finding reaches the agent, the context for the original change is gone.

When SAST output goes to a human reviewer, the human can decide whether the finding is real or a false positive. When SAST output goes to an AI coding agent, the agent's response is unpredictable: sometimes it fixes the issue, sometimes it suppresses the warning, sometimes it rewrites the surrounding code in ways that change the threat surface.

tailtest's design intent is to keep security findings in the build loop, using the same hook-based deterministic enforcement that drives unit testing today. The agent sees the finding as additional context in the same turn in which it made the change. Fix-in-place becomes the default, not "open a Jira ticket for security review."

tailtest's planned approach

Five categories under consideration for the first security layer:

1. OWASP Top 10 alignment

The categories here follow the OWASP Top 10 (2017 edition): injection (SQL, command, XPath), broken authentication, sensitive data exposure, XML external entities, broken access control, security misconfiguration, XSS, insecure deserialization, components with known vulnerabilities, and insufficient logging and monitoring.

2. Secrets in code

API keys, tokens, hard-coded credentials, JWT secrets, AWS access keys, GitHub PATs. Detection combines pattern matching (TruffleHog-style) with semantic analysis (e.g., a string assigned to a variable named "secret" that is then exported).
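
The two detection styles described above can be sketched as follows. This is a minimal illustration, not tailtest's detector: the rule table, the name heuristic, and both function names are assumptions made for this example, and the "then exported" part of the semantic check is left out for brevity. The two token patterns assume the commonly documented shapes of AWS access key IDs and classic GitHub personal access tokens.

```python
import ast
import re

# Assumed token shapes; real scanners carry far larger rule sets.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Hypothetical heuristic: variable names that suggest a credential.
SUSPICIOUS_NAME = re.compile(r"secret|token|password|api_?key", re.IGNORECASE)


def scan_patterns(source: str) -> list[tuple[int, str]]:
    """Pattern matching: return (line_number, rule_name) for token-shaped strings."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


def scan_assignments(source: str) -> list[tuple[int, str]]:
    """Semantic check: non-empty string literals assigned to credential-like names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        if not value.value:  # ignore empty-string placeholders
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SUSPICIOUS_NAME.search(target.id):
                hits.append((node.lineno, target.id))
    return sorted(hits)
```

The pattern pass catches tokens regardless of the variable name they sit in, while the AST pass catches credentials that have no recognizable shape (a short password, for instance). Neither is sufficient on its own, which is presumably why both are planned.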

3. Authentication / authorization edge cases

Endpoints that should require auth but do not. Authorization checks that compare the wrong identity. Session fixation patterns. Missing CSRF protection on state-changing endpoints. These are the bugs that pass functional tests but fail a threat-model review.
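
As an illustration of an authorization check that compares the wrong identity, consider the sketch below. The `Document` type and both function names are hypothetical and exist only for this example. The buggy version trusts a user ID supplied in the request, so a happy-path functional test passes while any caller can impersonate the owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    owner_id: int


def can_read_buggy(session_user_id: int, requested_user_id: int, doc: Document) -> bool:
    # Bug: compares the user ID taken from the request, not the session,
    # so the caller controls the identity being checked.
    return requested_user_id == doc.owner_id


def can_read(session_user_id: int, doc: Document) -> bool:
    # Compare the authenticated identity against the resource's owner.
    return session_user_id == doc.owner_id


doc = Document(doc_id=1, owner_id=7)

# Functional test: the owner can read the document. Both versions pass.
assert can_read_buggy(session_user_id=7, requested_user_id=7, doc=doc)
assert can_read(session_user_id=7, doc=doc)

# Threat-model test: user 99 claims to be user 7. Only the fixed version refuses.
assert can_read_buggy(session_user_id=99, requested_user_id=7, doc=doc)
assert not can_read(session_user_id=99, doc=doc)
```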

4. Input validation gaps

User input that flows to a database query without parameterization. File paths constructed from user input without sandboxing. URL parameters echoed into HTML without escaping. The plumbing that the agent wrote correctly 9 times out of 10 but skipped on the tenth.
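
For reference, the safe form of each of the three patterns above might look like the following, using only the Python standard library. The table name, function names, and greeting template are illustrative assumptions; `Path.is_relative_to` requires Python 3.9 or later.

```python
import html
import sqlite3
from pathlib import Path


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    # The ? placeholder keeps user input out of the SQL text entirely.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


def safe_path(base_dir: Path, user_path: str) -> Path:
    # Resolve the candidate path and refuse anything outside base_dir.
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes sandbox: {user_path!r}")
    return candidate


def render_greeting(name: str) -> str:
    # Escape user input before interpolating it into HTML.
    return f"<p>Hello, {html.escape(name)}</p>"
```

A detector in this category would flag the inverse of each function: a query built with string formatting, a path joined without the containment check, or a value interpolated into markup unescaped.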

5. CORS, headers, and config security

Overly permissive CORS (allow_origins = ["*"]), missing security headers (CSP, HSTS, X-Frame-Options), debug endpoints accidentally left enabled, environment-specific config (DEBUG=True) shipped to production.
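
A rough sketch of what checks in this category could look like is shown below. The function names, parameters, and finding messages are assumptions for illustration; only the header names and the `allow_origins` and `DEBUG` examples come from the list above.

```python
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",  # HSTS
    "X-Frame-Options",
)


def audit_config(allow_origins: list[str], debug: bool, environment: str) -> list[str]:
    """Flag wildcard CORS and debug mode in production."""
    findings = []
    if "*" in allow_origins:
        findings.append("CORS allows any origin")
    if debug and environment == "production":
        findings.append("DEBUG enabled in production")
    return findings


def missing_security_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the required headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]
```

For example, `audit_config(["*"], debug=True, environment="production")` would report both findings, and a response carrying only `x-frame-options` would be reported as missing the other two headers.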

Integration with existing security stack

We do not plan to rewrite what Semgrep, Snyk, Trivy, or GitHub CodeQL already do well. The plan is to wrap those tools in the same hook lifecycle as agent-edit testing so their output reaches the agent mid-turn. Where existing tools have gaps (especially around AI-specific patterns like prompt injection), tailtest adds first-party detectors.
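
One plausible shape for the wrapping step, assuming the wrapped tool emits SARIF 2.1.0 (a report format several scanners can produce), is to normalize its results into short lines an agent can read as context. The `Finding` type, the function names, and the output line format below are hypothetical and do not describe tailtest's actual hook interface.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    tool: str
    rule_id: str
    level: str
    message: str
    path: str
    line: int


def findings_from_sarif(sarif_text: str) -> list[Finding]:
    """Flatten a SARIF report into a list of findings, one per result."""
    report = json.loads(sarif_text)
    findings = []
    for run in report.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Use the first reported location, if any.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append(Finding(
                tool=tool,
                rule_id=result.get("ruleId", ""),
                level=result.get("level", "warning"),  # assumed default when absent
                message=result.get("message", {}).get("text", ""),
                path=physical.get("artifactLocation", {}).get("uri", ""),
                line=physical.get("region", {}).get("startLine", 0),
            ))
    return findings


def as_agent_context(findings: list[Finding]) -> str:
    """Render findings as one line each, suitable for appending to an agent's turn."""
    return "\n".join(
        f"[{f.tool}] {f.level} {f.rule_id} at {f.path}:{f.line}: {f.message}"
        for f in findings
    )
```

Normalizing to a single finding shape means first-party detectors and wrapped tools could feed the agent through the same path, whichever scanner produced the result.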

Configuration: pick which detectors you want active, set a severity threshold, and exclude certain paths. This follows the same .tailtest/config.json pattern that powers agent-edit testing today.
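
To make the three knobs concrete, here is a sketch of how such a configuration might be applied to a finding. Every key name in the example (`security`, `detectors`, `severity_threshold`, `exclude`) and the severity scale are assumptions; the security section of .tailtest/config.json has not been published.

```python
import json
from fnmatch import fnmatchcase

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Hypothetical config content; the real schema may differ.
EXAMPLE_CONFIG = """
{
  "security": {
    "detectors": ["secrets", "injection"],
    "severity_threshold": "medium",
    "exclude": ["tests/*", "vendor/*"]
  }
}
"""


def is_reported(finding: dict, security_cfg: dict) -> bool:
    """Apply detector selection, severity threshold, and path excludes, in that order."""
    if finding["detector"] not in security_cfg["detectors"]:
        return False
    threshold = SEVERITY_ORDER[security_cfg["severity_threshold"]]
    if SEVERITY_ORDER[finding["severity"]] < threshold:
        return False
    return not any(fnmatchcase(finding["path"], pat) for pat in security_cfg["exclude"])


cfg = json.loads(EXAMPLE_CONFIG)["security"]
finding = {"detector": "secrets", "severity": "high", "path": "app/settings.py"}
print(is_reported(finding, cfg))
```

With this example config, a high-severity secrets finding in app/settings.py is reported, while the same finding under tests/ or at low severity is filtered out.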

What we are NOT building

Not a security audit service. tailtest is a tool, not a consultancy.
Not penetration testing (different methodology, different skill set, often regulatory).
Not runtime detection (different surface; intentionally adjacent to but not overlapping with tools like Falco, Lacework).
Not policy enforcement (compliance with PCI DSS, SOC 2, or HIPAA is a different scope; tailtest can help generate evidence but is not a compliance platform).

Timeline + how to follow along

The target for the first detector category is Q4 2026 (likely OWASP injection patterns plus secrets in code, since those have the clearest design pattern). Iteration happens in the open via GitHub discussions on avansaber/tailtest. If you work in a regulated industry and security testing is your blocker for adopting AI coding tools, open a discussion. Your input shapes priority.

What is shipping today?

Agent-edit testing is live and stable across four hosts.
