February 10, 2026 · 7 min read

We Let an AI Agent Ship 200 PRs — Here’s What We Learned

The Experiment

In January 2026, we ran an experiment: let our AI coding agent autonomously generate and ship 200 pull requests across 15 repositories. No human review before opening the PR. Just feedback in, PR out.

The results were instructive. Some PRs were flawless. Others were... educational.

The Setup

Each PR was triggered by a user feedback submission. The pipeline: feedback in, agent-generated fix, build verification, PR out. No human touched the code before the PR opened.

We tracked every PR across five dimensions: correctness, code style, scope creep, build pass rate, and merge rate.
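In code, that loop looks roughly like this. This is a minimal sketch; the function names (`generate_patch`, `run_build`, `open_pr`) are placeholders, not SOLO's actual API:

```python
# Hypothetical sketch of the feedback-to-PR pipeline described above.
# The three helpers are stubs standing in for the real agent, CI, and
# version-control steps.

def generate_patch(feedback: str) -> str:
    # Placeholder: the real agent turns the feedback into a code diff.
    return f"diff for: {feedback}"

def run_build(patch: str) -> bool:
    # Placeholder: run the repo's CI checks against the patched tree.
    return True

def open_pr(patch: str) -> str:
    # Placeholder: push a branch and open the pull request.
    return "PR opened"

def handle_feedback(feedback: str) -> str:
    patch = generate_patch(feedback)
    if not run_build(patch):
        return "build failed; PR not opened"
    return open_pr(patch)
```

The key design choice is that `open_pr` is only reachable through the build gate; a failing build short-circuits the whole pipeline.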

The Good: 73% Build Pass Rate

Out of 200 PRs, 146 passed all CI checks on the first attempt. That's better than our internal team average (which hovers around 68% for first-push builds — we checked).

The agent excelled at:

  • One-file fixes — Typo corrections, copy changes, simple bug fixes. 95% pass rate.
  • CSS/styling changes — Layout adjustments, responsive fixes, color corrections. 89% pass rate.
  • Adding error handling — Try/catch blocks, null checks, fallback states. 82% pass rate.

The Bad: Scope Creep Is Real

The agent's biggest weakness was scope discipline. When asked to "fix the signup button," it would sometimes:

  • Refactor the entire auth flow
  • Add a new component library
  • Restructure the file tree

We call this "helpful overreach." The agent is trying to improve things, but it's changing code the user didn't ask about.

Solution: We added a scope constraint to the system prompt. The agent now receives explicit boundaries: "Only modify files directly related to the user's feedback. Do not refactor adjacent code."

Scope creep dropped from 34% to 8% after this change.
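A sketch of what that constraint can look like in practice, assuming both a prompt-assembly step and a post-hoc check on the generated diff (all names here are hypothetical, not SOLO's implementation):

```python
# Illustrative only: enforce the scope boundary twice — once in the system
# prompt, and again as a mechanical check on the files the agent touched.

SCOPE_RULE = (
    "Only modify files directly related to the user's feedback. "
    "Do not refactor adjacent code."
)

def build_system_prompt(base_prompt: str) -> str:
    """Append the scope constraint to the agent's system prompt."""
    return f"{base_prompt}\n\n{SCOPE_RULE}"

def out_of_scope_files(changed_files, allowed_files):
    """Return changed files that fall outside the allowed set."""
    return sorted(set(changed_files) - set(allowed_files))
```

The belt-and-suspenders structure matters: the prompt reduces overreach, and the diff check catches whatever slips through.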

The Ugly: When It Hallucinates APIs

In 11 cases (5.5%), the agent generated code that called functions that didn't exist. It would invent a utility function, import it, and write perfectly reasonable code around it — except the function was imaginary.

This always failed at the build step, which is exactly why build verification is non-negotiable.
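A minimal sketch of that gate, assuming the repo builds with a single command (the command itself is an assumption, not SOLO's actual setup):

```python
# Build verification as a hard gate: run the repo's build and report success.
# A PR is only opened when this returns True.

import subprocess

def build_passes(repo_dir: str, command=("npm", "run", "build")) -> bool:
    """Run the build command in the repo and report whether it succeeded."""
    result = subprocess.run(command, cwd=repo_dir, capture_output=True)
    return result.returncode == 0
```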

Solution: We now inject the repo's actual file tree and export map into the agent's context. Hallucinated imports dropped to under 1%.
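A rough sketch of how that grounding context might be assembled, assuming a JS/TS repo and a regex-based export scan (a real implementation would use a proper parser; the names here are illustrative):

```python
# Walk the repo to collect its file tree and a map of top-level exports,
# so the agent's imports can be checked against things that actually exist.
# The regex is a simplification for illustration only.

import os
import re

EXPORT_RE = re.compile(r"^export (?:function|const|class) (\w+)", re.MULTILINE)

def build_context(repo_root: str) -> dict:
    tree, exports = [], {}
    for dirpath, _, filenames in os.walk(repo_root):
        for name in filenames:
            path = os.path.relpath(os.path.join(dirpath, name), repo_root)
            tree.append(path)
            if name.endswith((".ts", ".tsx", ".js")):
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    exports[path] = EXPORT_RE.findall(f.read())
    return {"file_tree": sorted(tree), "export_map": exports}
```

Feeding `file_tree` and `export_map` into the prompt gives the model a concrete vocabulary of real symbols instead of leaving it to guess.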

What Surprised Us

Speed. The average time from feedback submission to PR open was 87 seconds. A human developer doing the same work would take 30-45 minutes minimum (context switching, reading the feedback, finding the file, writing the fix, running tests, opening the PR).

Consistency. The agent doesn't have bad days. It doesn't skip tests because it's Friday afternoon. It doesn't forget to update the changelog. Every PR follows the same template.

User satisfaction. Users who received agent-generated fixes rated them 4.2/5 on average. The most common complaint? "It fixed the bug but didn't match our coding style." (We've since improved style matching.)

Our Guardrails

After 200 PRs, here's what we consider essential for any AI coding agent:

  • Build verification before any PR opens — the backstop that catches hallucinated APIs.
  • Explicit scope constraints in the system prompt — only touch files related to the feedback.
  • Real repo context — inject the actual file tree and export map so imports are grounded.

The Merge Rate

Of the 146 PRs that passed CI, 112 were merged without modification (76.7%). Another 23 were merged with minor edits. Only 11 were closed without merging.

That's an overall merge rate of 67.5% — or 92.5% of CI-passing PRs.

Not bad for a robot.


SOLO's coding agent ships PRs from user feedback in under 2 minutes. Build verification included. Try it free.


Ready to ship faster?

Turn user feedback into pull requests — automatically.

Get Started →