Devin
dropped
Impressive demo, but I don't trust an agent I can't interrupt mid-thought.
signal: The $500M valuation headlines
The Claim
Devin bills itself as the first AI software engineer. Not a copilot, not an autocomplete, a full autonomous agent that takes a task description and goes off to plan, code, test, and deliver. It gets its own sandboxed environment, a browser, a terminal, and an editor, and it works through problems step by step.
What I Tried
I gave it a real task from my backlog: take an existing API endpoint that returned paginated results, add cursor-based pagination to it, update the corresponding TypeScript types, and adjust the frontend hook that consumed it. Not trivial, but not a moonshot either. A solid afternoon of work for a human.
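For readers who haven't built it, here is a minimal sketch of what cursor-based pagination involves. It is illustrative only, not the code from the backlog task: `Item`, `CursorPage`, and `getPage` are hypothetical names, the data is an in-memory array, and the cursor is simply the last item's id as a string (real APIs usually encode it so clients treat it as opaque).

```typescript
// Hypothetical shapes for illustration; none of these names come from the reviewed codebase.
interface Item {
  id: number;
  name: string;
}

interface CursorPage<T> {
  items: T[];
  // Cursor to pass on the next request, or null when there are no more results.
  nextCursor: string | null;
}

// Return up to `limit` items that come after the given cursor.
// Assumes `all` is sorted by ascending id, which gives a stable ordering.
function getPage(all: Item[], cursor: string | null, limit: number): CursorPage<Item> {
  const afterId = cursor === null ? -Infinity : Number(cursor);
  const remaining = all.filter((item) => item.id > afterId);
  const items = remaining.slice(0, limit);
  const hasMore = remaining.length > limit;
  const last = items[items.length - 1];
  return {
    items,
    nextCursor: hasMore && last !== undefined ? String(last.id) : null,
  };
}

// Walk every page by feeding each response's cursor into the next request.
function collectAll(all: Item[], limit: number): Item[] {
  const collected: Item[] = [];
  let cursor: string | null = null;
  do {
    const page: CursorPage<Item> = getPage(all, cursor, limit);
    collected.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== null);
  return collected;
}
```

The point of the cursor is that each request says "give me what comes after this item" rather than "skip N rows", so the response type has to carry `nextCursor` and every consumer of that type has to learn about it. That is why the task touches the route, the types, and the hook together.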
I watched it plan. The plan was reasonable. It identified the right files, understood the data flow from API route to frontend hook, and outlined the changes needed. Then it started executing, and that's where things went sideways.
It modified the API route correctly, then got confused about the TypeScript types. It created a new type instead of extending the existing one, which broke the frontend import. It noticed the build error, tried to fix it, created a circular dependency, tried to fix that, and eventually produced something that compiled but didn't match the pattern the rest of the codebase used.
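To make the type problem concrete, here is a rough sketch of the approach I would have expected, under the assumption that the codebase already has a shared response type. `PaginatedResponse`, `CursorPaginatedResponse`, `countItems`, and `hasNextPage` are hypothetical names, not the real ones.

```typescript
// Hypothetical existing type that the frontend already imports.
interface PaginatedResponse<T> {
  items: T[];
  total: number;
}

// Extending the existing type keeps a single source of truth:
// anything written against PaginatedResponse keeps compiling.
interface CursorPaginatedResponse<T> extends PaginatedResponse<T> {
  nextCursor: string | null;
}

// Existing consumer, written against the base type and left unchanged.
function countItems<T>(res: PaginatedResponse<T>): number {
  return res.items.length;
}

// New consumer that needs the cursor field.
function hasNextPage<T>(res: CursorPaginatedResponse<T>): boolean {
  return res.nextCursor !== null;
}

const example: CursorPaginatedResponse<string> = {
  items: ["a", "b"],
  total: 5,
  nextCursor: "b",
};

// The extended response is accepted wherever the base type is expected.
const count = countItems(example);
const more = hasNextPage(example);
```

A standalone type that duplicates `items` and `total` may compile too, but it forces every import site to choose between two near-identical shapes, which is the kind of drift from the existing pattern I ended up reviewing.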
What Surprised Me
The planning phase is genuinely impressive. Devin's ability to read a codebase, identify relevant files, and outline an approach is better than I expected. If this were a junior developer's written plan, I'd approve it.
The execution gap is the problem. There's no way to step in when you see it going wrong. You're watching a replay of decisions you'd have corrected in real time. By the time it finishes, you're reviewing a diff that has three layers of self-correction baked in, and untangling what's intentional from what's a patch on a patch is exhausting.
Who It's For
Teams with dedicated review capacity who can afford to let an agent run, then spend time auditing the output. If you have a tech lead whose job is reviewing PRs anyway, Devin might slot into that flow. Solo developers who need to trust every line they ship should look elsewhere.
Verdict
Dropped. I need to be able to interrupt, redirect, and collaborate with my tools in real time. Devin's autonomous model is fundamentally at odds with how I work. I don't want to submit a task and hope for the best. I want to be in the loop at every decision point. The demo is compelling, but the workflow is wrong for anyone who values control over convenience.