Real projects · real findings
Bugs found. Automatically.
These issues were found by tailtest running alongside a Claude Code session. No test was written manually — tailtest generated the scenarios, ran them, and surfaced what failed.
Personal finance · open source
Recurring transactions permanently lose their date after February
Monthly transactions scheduled for the 29th, 30th, or 31st are clamped to Feb 28 when February arrives. On the next advance, the function reads the clamped day instead of the original one, so every later occurrence stays on the 28th and the intended date is permanently lost.
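A minimal sketch of the usual fix, assuming the schedule can keep the day the user originally chose (the `anchor_day` parameter and `advance_monthly` name are hypothetical, not securo's API). Each advance clamps from the anchor rather than from the previous, possibly clamped, date.

```python
import calendar
from datetime import date


def advance_monthly(current: date, anchor_day: int) -> date:
    """Return the next monthly occurrence after `current`.

    `anchor_day` is the day of month originally chosen (e.g. 31). Clamping is
    recomputed from the anchor every time, so a short month such as February
    never overwrites the intended day.
    """
    year = current.year + current.month // 12  # December rolls into next year
    month = current.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(anchor_day, last_day))
```

Starting from Jan 31 with `anchor_day=31`, this yields Feb 28, then Mar 31, then Apr 30, instead of sticking to the 28th.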
GTD productivity · open source
Monthly calendar events set by weekday appear on the wrong date
Events set to repeat on a weekday rule such as "the first Monday of the month" use a fixed day number instead. A meeting set for every first Tuesday shows up on the 6th of each month, whatever weekday the 6th falls on.
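For comparison, a sketch of resolving a weekday rule per month, assuming the rule is stored as (weekday, n) rather than as a day number. `nth_weekday` is a hypothetical helper, not Mindwtr's code.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th `weekday` (0=Monday ... 6=Sunday) of the given month.

    Raises ValueError if the month has fewer than n such weekdays.
    """
    first_weekday = calendar.weekday(year, month, 1)
    # Days from the 1st to the first occurrence of the requested weekday.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > calendar.monthrange(year, month)[1]:
        raise ValueError(f"{year}-{month:02d} has no occurrence #{n}")
    return date(year, month, day)
```

The "first Tuesday" then moves with the calendar: Jan 7, May 6 and Jun 3 in 2025.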
Docs drift detector · open source
A file that trips both staleness thresholds is penalised twice
When a file exceeds both the day threshold (≥90 days) and the commit threshold (≥200 commits), checkStaleness returns two separate STALE_FILE errors for the same condition. computeScore deducts 10 points per error, so one stale file silently subtracts 20 points instead of 10.
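One way to avoid the double penalty, sketched in Python with hypothetical names mirroring checkStaleness and computeScore (the project itself is not Python, and the 100-point starting score is an assumption): collect every threshold that fired into a single STALE_FILE issue.

```python
from dataclasses import dataclass

DAY_THRESHOLD = 90       # days since last update
COMMIT_THRESHOLD = 200   # commits since last update
PENALTY_PER_ERROR = 10


@dataclass
class Issue:
    code: str
    path: str
    reason: str


def check_staleness(path: str, days: int, commits: int) -> list[Issue]:
    """Emit at most one STALE_FILE issue per file, listing every threshold hit."""
    reasons = []
    if days >= DAY_THRESHOLD:
        reasons.append(f"{days} days")
    if commits >= COMMIT_THRESHOLD:
        reasons.append(f"{commits} commits")
    if not reasons:
        return []
    return [Issue("STALE_FILE", path, ", ".join(reasons))]


def compute_score(issues: list[Issue], start: int = 100) -> int:
    # One deduction per issue; a file can now produce at most one issue.
    return start - PENALTY_PER_ERROR * len(issues)
```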
AI-native code editor · open source
Monthly backup retention uses a fixed 30-day month, deleting backups prematurely
pruneOldBackups calculates the monthly cutoff as monthlyMonths × 30 days, so with monthlyMonths = 2 the cutoff is 60 days. When the span includes a 31-day month, a backup from exactly 2 calendar months ago can be 61–62 days old, past the 60-day cutoff, so it is deleted even though it should be kept.
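A sketch of a calendar-aware cutoff, assuming "N months" should mean the same day N calendar months earlier (clamped to the month's length). `months_ago` and `should_keep_monthly` are hypothetical names, not the editor's actual functions.

```python
import calendar
from datetime import date


def months_ago(today: date, months: int) -> date:
    """Same day `months` calendar months earlier, clamped to that month's length."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def should_keep_monthly(backup: date, today: date, monthly_months: int) -> bool:
    # Compare against a calendar-month cutoff instead of monthly_months * 30 days.
    return backup >= months_ago(today, monthly_months)
```

On 2025-08-01, a backup from 2025-06-01 is 61 days old, so the fixed 60-day rule deletes it, while the calendar cutoff (2025-06-01) keeps it.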
Code complexity analysis · Mozilla
Two filter bugs corrupt comment stripping and function-body extraction
MinimalFilter: a single-line """...""" docstring leaves in_docstring=true, so a # comment between two single-line docstrings is incorrectly stripped. AggressiveFilter: lines starting with fn foo() { skip brace counting, causing let bindings after the first body line to leak out. 2 issues filed.
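The MinimalFilter bug comes down to docstring state tracking. Below is a simplified Python sketch of that idea, not the project's code: it assumes any line containing `"""` belongs to a docstring and ignores `'''` strings. The key point is that a line with an even number of triple quotes opens and closes on the same line, so it must not leave the filter inside a docstring.

```python
def strip_docstrings(lines: list[str]) -> list[str]:
    """Drop triple-quoted docstring lines, keeping everything else (including # comments)."""
    out = []
    in_docstring = False
    for line in lines:
        quotes = line.count('"""')
        if in_docstring:
            # Inside a multi-line docstring: an odd number of quotes closes it.
            if quotes % 2 == 1:
                in_docstring = False
            continue
        if quotes:
            # Odd count opens a multi-line docstring; an even count (such as a
            # one-line docstring) opens and closes on this same line.
            in_docstring = quotes % 2 == 1
            continue
        out.append(line)
    return out
```

With this rule, a `# comment` sitting between two one-line docstrings survives.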
AI survey framework · open source
uniquify() produces duplicate IDs when suffixed names already exist in the list
When a ScenarioList already contains the value "item_1" and uniquify("id") is called while duplicate "item" entries are also present, the function renames a duplicate to "item_1", colliding with the pre-existing "item_1". The result still has duplicate IDs, breaking the uniqueness contract.
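A sketch of collision-free suffixing on a plain list of strings (edsl's uniquify works on a ScenarioList field; the `_n` suffix format is inferred from the "item_1" example). Every candidate is checked against all values already present, original or generated.

```python
def uniquify(values: list[str]) -> list[str]:
    """Make every value unique by appending _1, _2, ... to repeats.

    Candidates are checked against every value in the input as well as every
    name generated so far, so a pre-existing "item_1" is never duplicated.
    """
    taken = set(values)
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
            continue
        n = 1
        while f"{value}_{n}" in taken:
            n += 1
        new_value = f"{value}_{n}"
        taken.add(new_value)
        seen.add(new_value)
        result.append(new_value)
    return result
```

For `["item", "item_1", "item"]` this returns `["item", "item_1", "item_2"]`.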
Interactive CLI library
Module-level REDIRECTION_TOKENS list grows without bound on every alias or macro command
Four functions in cmd2.py (_alias_create, _alias_list, _macro_create, _macro_list) assign constants.REDIRECTION_TOKENS to a local variable without copying it and call .extend() on that variable, permanently mutating the module-level list. Each invocation appends more terminators; after 5 alias creations the list has grown from 3 to 10 entries. Two Cmd() instances in the same process share the corrupted state, so instance A's alias operations affect instance B's redirection parsing.
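The aliasing pattern and its fix in miniature. The module-level list below is a stand-in for `constants.REDIRECTION_TOKENS` (its three values are assumed), and both functions are hypothetical.

```python
# Stand-in for a constants module: one list object shared by every caller.
REDIRECTION_TOKENS = ["|", ">", ">>"]


def tokens_buggy(terminators: list[str]) -> list[str]:
    tokens = REDIRECTION_TOKENS   # same object, not a copy
    tokens.extend(terminators)    # mutates the module-level list for everyone
    return tokens


def tokens_fixed(terminators: list[str]) -> list[str]:
    # Build a new list; the module-level constant is never modified.
    return [*REDIRECTION_TOKENS, *terminators]
```

Storing the constant as a tuple would go further: any accidental `.extend()` would fail loudly instead of silently corrupting shared state.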
CLI for Jinja2 rendering
KeyError leaks from get_format() and has_format() when the format string is unknown
get_format(fmt) only catches ModuleNotFoundError; when fmt is not present in the formats dict (e.g. 'nonexistent'), formats[fmt] raises KeyError which propagates uncaught instead of being wrapped as InvalidDataFormat. has_format() inherits the same root cause: it only catches InvalidDataFormat, so the KeyError leaks through and has_format("unknown") raises instead of returning False. Found via V13 native adversarial mode (depth: adversarial) on 2026-04-25.
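A sketch of the wrapping, assuming a simple name-to-module registry (the `FORMATS` contents are hypothetical; jinja2-cli's real dict differs). Both failure modes are converted to InvalidDataFormat, which is what lets has_format() return False.

```python
import importlib


class InvalidDataFormat(Exception):
    pass


# Hypothetical registry: format name -> module that implements it.
FORMATS = {"json": "json", "missing": "_no_such_module_for_demo"}


def get_format(fmt: str):
    try:
        module_name = FORMATS[fmt]
    except KeyError:
        # Unknown format name: wrap instead of leaking KeyError.
        raise InvalidDataFormat(fmt) from None
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Known format whose backing module is not installed.
        raise InvalidDataFormat(fmt) from None


def has_format(fmt: str) -> bool:
    try:
        get_format(fmt)
    except InvalidDataFormat:
        return False
    return True
```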
Type-driven dependency injection
Module discovery crashes on PEP 420 namespace packages and traverses hidden dirs like .git
_find_objects_in_module accesses module.__file__ unconditionally, but PEP 420 namespace packages have no __file__, so discovery crashes with AttributeError. Three more robustness issues in the same file: the endswith('__init__.py') check also matches files such as not__init__.py; hidden directories like .git and .mypy_cache are traversed, and importlib.import_module is called with invalid dot-prefixed names; and one broken submodule kills the entire discovery scan because the import is not wrapped in try/except.
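Each of the four issues has a small defensive counterpart. The helpers below are hypothetical sketches, not wireup's API; logging and skipping a failed import is one design choice (re-raising with context would be another).

```python
import importlib
import logging
from pathlib import Path
from types import ModuleType
from typing import Optional

log = logging.getLogger(__name__)


def module_dir(module: ModuleType) -> Optional[Path]:
    """Directory of a module, or None when it has no __file__ (PEP 420 namespace package)."""
    file = getattr(module, "__file__", None)
    return Path(file).parent if file else None


def is_package_init(path: Path) -> bool:
    # Exact name match: 'not__init__.py'.endswith('__init__.py') is True.
    return path.name == "__init__.py"


def is_hidden(path: Path) -> bool:
    # Skip .git, .mypy_cache and similar; their names are not importable.
    return path.name.startswith(".")


def import_submodules(package: str, names: list[str]) -> list[ModuleType]:
    """Import each submodule, logging failures instead of aborting the whole scan."""
    modules = []
    for name in names:
        try:
            modules.append(importlib.import_module(f"{package}.{name}"))
        except Exception:
            log.warning("skipping %s.%s: import failed", package, name, exc_info=True)
    return modules
```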
CLIs from type hints
Forward-ref resolution crashes when run from __main__ context
resolve_forward_ref calls __builtins__.copy(). In imported Python modules, __builtins__ is a dict; in __main__ and many embedding contexts it is the builtins module itself, which has no copy method. As a result, calling jsonargparse from a __main__ script crashes. Four more edge-case crashes in the same file: bare tuple and set types (which have no __origin__), and Dict[int, str] with non-numeric or empty-string keys (a raw ValueError leaking from int()).
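A sketch of copying builtins safely in either context. `builtins_as_dict` and this simplified `resolve_forward_ref` (a plain `eval` of the annotation string) are illustrations only; jsonargparse's real resolution logic is more involved, and `eval` is not meant for untrusted input.

```python
import builtins
from typing import Any


def builtins_as_dict(b: Any) -> dict:
    """Copy the builtin namespace whether `b` is a dict or the builtins module."""
    return dict(b) if isinstance(b, dict) else dict(vars(b))


def resolve_forward_ref(ref: str, module_globals: dict) -> Any:
    """Evaluate a string annotation such as 'int' against a module's namespace."""
    namespace = builtins_as_dict(module_globals.get("__builtins__", builtins))
    namespace.update(module_globals)
    return eval(ref, namespace)
```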
OpenAPI for Flask
Single-line docstring descriptions silently vanish from generated OpenAPI specs
get_operation has an inverted condition: lines[0] if len(lines) == 0 else '<br/>'.join(lines[1:]). split() always returns at least one element, so len(lines) == 0 is never true and every docstring takes the join branch. For a single-line docstring, lines[1:] is empty, so the join produces an empty string and the description is silently lost. Two more bugs in parse_method: routes registered with HEAD or OPTIONS are silently dropped because the if/elif ladder only handles GET/POST/PUT/PATCH/DELETE.
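The one-character nature of the docstring bug is easiest to see side by side. This sketch assumes the intended rule is "a one-line docstring is its own description; otherwise line 0 is the summary and the rest is the description". Both functions are hypothetical standalone versions of the expression, not flask-openapi's code.

```python
from typing import Optional


def docstring_description_buggy(doc: Optional[str]) -> str:
    lines = (doc or "").strip().split("\n")
    # len(lines) is never 0 here, so this always joins lines[1:].
    return lines[0] if len(lines) == 0 else "<br/>".join(lines[1:])


def docstring_description(doc: Optional[str]) -> str:
    lines = (doc or "").strip().split("\n")
    # Corrected condition: compare against 1, not 0.
    return lines[0] if len(lines) == 1 else "<br/>".join(lines[1:])
```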
Self-hosted DOCSIS evidence system
api_restore proceeds with no config_manager and writes to a hardcoded /data path
When get_config_manager() returns None, api_restore does not return early. Instead it falls through to a fallback that uses '/data' as the data directory and extracts the user's uploaded archive into that hardcoded path on the host filesystem. The sibling api_backup_download endpoint correctly returns 500 in the same condition.
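The fix is a guard clause that mirrors api_backup_download. This framework-free sketch uses a hypothetical `ConfigManager` stand-in and returns a (body, status) tuple; archive extraction is deliberately omitted.

```python
from typing import Optional, Tuple


class ConfigManager:
    """Hypothetical stand-in exposing the configured data directory."""

    def __init__(self, data_dir: str):
        self.data_dir = data_dir


def api_restore(config_manager: Optional[ConfigManager], archive: bytes) -> Tuple[dict, int]:
    # Bail out early instead of falling back to a hardcoded '/data' path.
    if config_manager is None:
        return {"error": "config manager unavailable"}, 500
    target = config_manager.data_dir
    # ... extract `archive` into `target` here (omitted in this sketch) ...
    return {"restored_to": target, "bytes": len(archive)}, 200
```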
Terminal radio player
Newline in alias name corrupts the alias file with phantom radio entries
add_entry writes name and url directly to the alias file with no sanitization. A name containing a newline (TUI input glitch, clipboard paste, scripted alias creation) splits across two lines and creates a phantom alias on the next read. The search() method also has a state leak: self.found is set to True on a hit but never reset to False on a miss, so any miss following a previous hit reports a stale True.
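A sketch of both fixes on an in-memory store. The `AliasStore` class and the `name==url` line format are assumptions for illustration, not radio-active's actual file handling. Rejecting line breaks is one choice; replacing them with spaces would be another.

```python
from typing import List, Optional


class AliasStore:
    """Hypothetical alias store: one 'name==url' entry per line."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.found = False

    def add_entry(self, name: str, url: str) -> None:
        # Reject line breaks so one alias can never span two lines.
        for field in (name, url):
            if "\n" in field or "\r" in field:
                raise ValueError(f"line break not allowed in alias field: {field!r}")
        self.lines.append(f"{name.strip()}=={url.strip()}")

    def search(self, name: str) -> Optional[str]:
        self.found = False  # reset on every call so a miss never reports a stale hit
        for line in self.lines:
            entry_name, _, url = line.partition("==")
            if entry_name == name:
                self.found = True
                return url
        return None
```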
Full sweep
Every repo we've tested
14 bugs confirmed across 34 swept repos (2 already fixed and merged by maintainers within 24 hours of filing). The clean results are a signal too: well-tested code passes.
| Repo | Result |
|---|---|
| mattrobenolt/jinja2-cli | Bug filed #145 |
| python-cmd2/cmd2 | Bug filed #1649 |
| maldoinc/wireup | Bug filed #135 |
| omni-us/jsonargparse | Bug filed #904 |
| luolingchun/flask-openapi | Bug filed #262 |
| itsDNNS/docsight | Bug filed #357 |
| deep5050/radio-active | Bug filed #150 |
| securo-finance/securo | Bug filed #67 |
| dongdongbh/Mindwtr | Bug filed #380 |
| theDakshJaitly/mex | Bug filed #31 |
| paperclipai/paperclip | Bug filed #3713 |
| rtk-ai/rtk | Bug filed #1322 |
| expectedparrot/edsl | Bug filed #2446 |
| getcompanion-ai/feynman | No findings |
| basicmachines-co/basic-memory | No findings |
| sgcarstrends/backend | No findings |
| berrydev-ai/blockdoc-python | No findings |
| stefanjudis/cchooks | No findings |
| lamoom-ai/lamoom | No findings |
| badlogic/claude-code-tools-python | No findings |
| claudekit-dev/claudekit | No findings |
| rulesync | No findings |
| vibe-log-cli | No findings |
| tdd-guard | No findings |
| cc-flow | No findings |
| claude-hub | No findings |
| tsk | No findings |
| claude-squad | No findings |
| aws-mcp | No findings |
| ccpm | Skipped |
| coffee-analytics | Skipped |
Skipped = no testable source code (skills-only or SQL-only repos)
Get started
Run it on your own project.
Install in 60 seconds. No test files to write, no configuration, no commands.