Security testing for AI-generated code.
53 percent of vibe-coded apps ship with security holes. AI coding agents do not ask "what's the threat model here?" before they wire up an auth flow. tailtest's security layer is on the platform roadmap for Q4 2026.
Honest framing
Security testing is not shipping today. This page describes what we are building and why. If you need security coverage now, established SAST tools (Semgrep, Snyk, GitHub CodeQL) cover much of this, though they fit the AI-coded-app workflow differently from what we are building.
The vibe-coding security crisis
Researchers and product teams documenting incidents in 2025 and 2026 have surfaced a consistent pattern. Vibe-coded apps ship missing the security layers a human engineer would have included by default:
- 1.5 million API keys exposed across publicly accessible AI-generated repositories
- Unauthenticated endpoints granting access to private enterprise data
- Production databases wiped by AI agents that had been explicitly instructed not to touch them
- Hard-coded credentials in starter projects shipped to thousands of users
- Missing input sanitization in chat interfaces that handle user-controlled prompts
The ICSE 2026 systematic review of 101 sources on vibe coding identified QA as the most frequently overlooked dimension. Security is a subset of QA that has been further overlooked because most testing tools were not designed for the speed at which AI generates code.
Why standard SAST does not fit (alone)
Static application security testing (SAST) tools such as Semgrep, Snyk, and GitHub CodeQL catch many real issues. They typically run in CI, against a PR, after the agent has already finished writing. The feedback loop takes hours to days, and by the time a finding reaches the agent, the context for the original change is gone.
When SAST output goes to a human reviewer, the human can decide whether the finding is real or a false positive. When SAST output goes to an AI coding agent, the agent's response is unpredictable: sometimes it fixes the issue, sometimes it suppresses the warning, sometimes it rewrites the surrounding code in ways that change the threat surface.
tailtest's design intent is to keep security findings in the build loop, using the same deterministic, hook-based enforcement that drives unit testing today. The agent sees the finding as additional context in the same turn in which it made the change. Fix-in-place becomes the default, not "open a Jira ticket for security review."
tailtest's planned approach
Five categories under consideration for the first security layer:
1. OWASP Top 10 alignment
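Injection (SQL, command, XPath), broken authentication, sensitive data exposure, XML external entities, broken access control, security misconfiguration, XSS, insecure deserialization, components with known vulnerabilities, insufficient logging and monitoring. These are the categories of the 2017 edition of the OWASP Top 10; the 2021 edition regroups several of them.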
2. Secrets in code
API keys, tokens, hard-coded credentials, JWT secrets, AWS access keys, GitHub PATs. Detection combines pattern matching (TruffleHog-style) with semantic analysis (e.g., a string assigned to a variable named "secret" that is then exported).
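To make the two detection styles concrete, here is a minimal Python sketch, not tailtest's implementation. The function names, the two token patterns, and the list of suspicious variable names are all illustrative assumptions, and the "then exported" half of the semantic check is left out for brevity.

```python
import ast
import re

# Illustrative patterns only (assumption): real scanners ship many more rules.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Hypothetical list of variable-name fragments worth a second look.
SUSPICIOUS_NAME = re.compile(r"secret|token|password|api_?key", re.IGNORECASE)


def scan_patterns(source: str) -> list[tuple[int, str]]:
    """Pattern matching: return (line_number, rule_name) for known token formats."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


def scan_assignments(source: str) -> list[tuple[int, str]]:
    """Semantic check: non-empty string literals assigned to suspicious names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue  # values read from the environment, function calls, etc. are fine
        if not value.value:
            continue  # empty placeholder string
        for target in node.targets:
            if isinstance(target, ast.Name) and SUSPICIOUS_NAME.search(target.id):
                hits.append((node.lineno, target.id))
    return hits
```

The pattern scan catches credentials with a recognizable shape regardless of where they appear. The assignment scan catches secrets with no recognizable shape, at the cost of more false positives.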
3. Authentication / authorization edge cases
Endpoints that should require auth but do not. Authorization checks that compare the wrong identity. Session fixation patterns. Missing CSRF protection on state-changing endpoints. These are the bugs that pass functional tests but fail a threat-model review.
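A small Python sketch of the "wrong identity" case shows why functional tests miss it. The `Document` type and both function names are hypothetical, made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    id: int
    owner_id: int


def can_read_buggy(session_user_id: int, requested_user_id: int, doc: Document) -> bool:
    # Bug: compares the owner to an ID taken from the request, which the
    # caller controls, instead of the authenticated session identity.
    return doc.owner_id == requested_user_id


def can_read(session_user_id: int, doc: Document) -> bool:
    # Compares the owner to the identity established by authentication.
    return doc.owner_id == session_user_id
```

A happy-path test where user 7 requests their own document passes against both versions. Only a test written from the attacker's side, where user 99 sends `requested_user_id=7`, tells them apart.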
4. Input validation gaps
User input that flows to a database query without parameterization. File paths constructed from user input without sandboxing. URL parameters echoed into HTML without escaping. The plumbing that the agent wrote correctly 9 times out of 10 but skipped on the tenth.
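For reference, the safe version of each of the three patterns might look like the following Python sketch. It uses only the standard library; the function names and the `users` table are assumptions made for the example.

```python
import html
import sqlite3
from pathlib import Path


def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
    # The "?" placeholder binds the value as data, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


def resolve_upload(base_dir: Path, user_path: str) -> Path:
    # Resolve the final path, then confirm it is still inside the base directory.
    # Path.is_relative_to requires Python 3.9 or later.
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {user_path!r}")
    return candidate


def render_greeting(name: str) -> str:
    # Escape user-controlled text before placing it in HTML.
    return f"<p>Hello, {html.escape(name)}</p>"
```

Each function is the counterpart of one gap above: string-built SQL, unsandboxed file paths, and unescaped output.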
5. CORS, headers, and config security
Overly permissive CORS (allow_origins = ["*"]), missing security headers (CSP, HSTS, X-Frame-Options), debug endpoints accidentally left enabled, environment-specific config (DEBUG=True) shipped to production.
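A check for the header and CORS cases can be small. The Python sketch below audits a plain dictionary of response headers. The function name and the finding messages are assumptions, and a real detector would also inspect framework configuration rather than only live responses.

```python
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Frame-Options",
)


def audit_response_headers(headers: dict[str, str]) -> list[str]:
    """Return human-readable findings for one HTTP response's headers."""
    # Header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    findings = [
        f"missing header: {name}"
        for name in REQUIRED_HEADERS
        if name.lower() not in normalized
    ]
    if normalized.get("access-control-allow-origin", "").strip() == "*":
        findings.append("CORS allows any origin")
    return findings
```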
Integration with existing security stack
We do not plan to rewrite what Semgrep, Snyk, Trivy, or GitHub CodeQL already do well. The plan is to wrap those tools in the same hook lifecycle as agent-edit testing so their output reaches the agent mid-turn. Where existing tools have gaps (especially around AI-specific patterns like prompt injection), tailtest adds first-party detectors.
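As a rough illustration of the wrapping idea, the Python sketch below turns a scanner's JSON report into short lines an agent can read mid-turn. The field names (`results`, `check_id`, `path`, `start.line`, `extra.severity`, `extra.message`) follow Semgrep's `--json` output as we understand it and should be verified against the Semgrep version in use. The function name and the `limit` parameter are hypothetical.

```python
import json


def summarize_semgrep(raw_json: str, limit: int = 5) -> list[str]:
    """Condense a Semgrep-style JSON report into one line per finding."""
    report = json.loads(raw_json)
    lines = []
    # Cap the number of findings so the agent's context is not flooded.
    for result in report.get("results", [])[:limit]:
        extra = result.get("extra", {})
        path = result.get("path", "?")
        line = result.get("start", {}).get("line", "?")
        severity = extra.get("severity", "UNKNOWN")
        rule = result.get("check_id", "?")
        message = extra.get("message", "").strip()
        lines.append(f"{path}:{line} [{severity}] {rule}: {message}")
    return lines
```

The cap on findings is a design choice for this sketch: a short, specific list is more likely to be acted on in the same turn than a full report.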
Configuration: pick which detectors you want active, set severity threshold, exclude certain paths. Same .tailtest/config.json pattern that powers agent-edit testing today.
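The security section of the config has no published schema yet, so the following Python sketch only illustrates how the three knobs could interact. The keys `detectors`, `severity_threshold`, and `exclude_paths`, the severity names, and the `Finding` type are all assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    detector: str
    severity: str
    path: str


def apply_config(findings: list[Finding], config: dict) -> list[Finding]:
    """Keep only findings that the (hypothetical) config says to report."""
    threshold = SEVERITY_ORDER[config.get("severity_threshold", "low")]
    enabled = set(config.get("detectors", []))  # empty set means "all detectors"
    excluded = config.get("exclude_paths", [])
    kept = []
    for finding in findings:
        if enabled and finding.detector not in enabled:
            continue
        if SEVERITY_ORDER[finding.severity] < threshold:
            continue
        if any(fnmatch(finding.path, pattern) for pattern in excluded):
            continue
        kept.append(finding)
    return kept
```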
What we are NOT building
- Not a security audit service. tailtest is a tool, not a consultancy.
- Not penetration testing (different methodology, different skill set, often regulatory).
- Not runtime detection (different surface; intentionally adjacent to but not overlapping with tools like Falco, Lacework).
- Not policy enforcement (compliance with PCI DSS, SOC 2, or HIPAA is a different scope; tailtest can help generate evidence but is not a compliance platform).
Timeline + how to follow along
Q4 2026 target for first detector category (likely OWASP injection patterns + secrets in code, since those have the clearest design pattern). Iteration in the open via GitHub discussions on avansaber/tailtest. If you work in a regulated industry and security testing is your blocker for adopting AI coding tools, open a discussion -- your input shapes priority.
What is shipping today?
Agent-edit testing is live and stable across four hosts.