Slow down to speed up: AI and software engineering

The Pragmatic Engineer 52min 8 min #91

Summary

  • Meta’s engineering culture is collapsing under AI-driven mismanagement, culminating in a catastrophic Instagram security breach where attackers could take over any account—including high-profile ones like Barack Obama’s—by simply faking their location via VPN and asking Meta AI to send a verification code to an email they controlled. No second step was needed. This happened because the code that enabled the vulnerability was AI-written, AI-reviewed, and never checked by humans.

    • The root cause wasn’t just AI. It was a toxic combination of three things: token maxing (engineers inflating AI token usage to boost performance metrics, with internal leaderboards awarding titles like “token legend”), layoffs (8,000 people cut in May, with advance notice causing engineers to burn tokens frantically to avoid being flagged as low-usage), and forced reassignments (40% of Instagram’s trust and safety team, a group of ~100 engineers built over 7–8 years, mostly in London, were moved without consent to manual AI data labeling in the org of Alexandr Wang, formerly of Scale AI).
    • After the layoffs and reassignments, most teams are less than half their former size, some with no on-call coverage at all—something that had never happened at Meta before. Morale is at an all-time low, worse than the 2022–2023 layoffs. Engineers who remain feel discarded, with many interviewing elsewhere not because they fear being fired but because they refuse to become manual data labelers. In the US, Meta is even recording employees’ screens to train AI.
    • The chief information security officer (CISO) resigned mid-investigation, before the SEV (severity incident) review concluded. A CISO stepping down during an active breach investigation underscores how deep the dysfunction runs.
  • The broader tech industry has undergone a dramatic shift in the past six months, with AI agents going from experimental to genuinely useful. DHH (David Heinemeier Hansson), creator of Ruby on Rails, went from writing all code by hand in summer 2025 to having most of his code written by AI by February 2026. Simon Willison noted that models released in November 2025—specifically Opus 4.6 and GPT 5.4—elevated agents to being genuinely useful.

    • Teams using AI agents at Linear are shipping 5x more code than non-AI teams. Developers using Cursor have roughly doubled their annual lines of code (from ~4,000 to ~8,000), and PR sizes are up 3x, meaning roughly 6x more code volume (2x × 3x). The percentage of Cursor users accepting AI changes without manual review has surged since Opus 4.7 and GPT 5.5 shipped in January.
  • Leading AI companies have gone all-in on agents, with workflows that would have sounded like science fiction a year ago.

    • At Anthropic, Boris Cherny runs five parallel agents on his laptop, ships 20–30 PRs per day while leading Claude Code, and has replaced PRDs with prototypes. 100% of Claude Code is generated by Claude Code; across Anthropic it’s 70–90%. They built the Cowork feature in just 10 days, and it became a massive commercial success; Microsoft tried and failed to replicate it within a month despite Satya Nadella’s deadline.
    • At OpenAI, engineers use an internal “fix it button”: take a screenshot, describe the bug, and Codex generates a PR that even non-engineers can merge (with safety nets). Most devs run multiple agents simultaneously—a common joke is that engineers keep their laptops slightly open in meetings so agents keep running. Codex improves itself overnight by running its own tests and suggesting improvements by morning. Voice notes from debugging sessions are fed to Codex in real time, which returns results by the meeting’s end.
    • At Cursor, the company has pivoted entirely to agents as of January. They built their own coding model (Composer), operate tens of thousands of NVIDIA GPUs leased from Azure and AWS, and are now training models in addition to inference—effectively becoming a mini AI lab. Everyone at Cursor is technical; even developer relations staff migrated Cursor’s entire site to a new CMS using Cursor.
    • At Google, everything is custom: Cider (internal IDE, now a VS Code fork), Jetski (the internal version of Antigravity), Critique (code review), Borg (the Kubernetes equivalent), Monarch (the Datadog equivalent), Piper (monorepo), and Code Search (which inspired Sourcegraph). AI is integrated throughout, but adoption lags because Gemini isn’t as capable as Opus or GPT 5.5, and engineers who can use Claude Code do, but only within the Gemini ecosystem.
    • At Meta, the internal tool is Metamate, and “trajectories” log every prompt alongside GitHub commits. When this was made public in December, engineers were exposed—staff engineers asking “can you write me a for loop” were visible to all. Some started writing prompts in Polish to obscure them. Meta’s singular focus is building a state-of-the-art AI model, with Mark Zuckerberg reportedly determined to beat Opus 4.8.
  • Large tech companies are building extensive in-house AI tooling, far beyond what any vendor provides.

    • Uber (3,000 engineers) has a ~20-person developer experience team that has built: an MCP gateway, a no-code agent builder, an agent studio with drag-and-drop, an agent registry used by 20,000 non-engineer employees, an AI CLI (their version of Claude Code), Uber Minion (background agents integrated with their monorepo and experimentation system that also optimizes prompts), a code inbox to triage AI-generated PRs, smart assignments with SLAs, risk profiles for code changes, and U Review (their internal code quality tool).
    • Other companies have similar builds: Stripe (Minions, Tool Shed, Blueprints), Ramp (Inspect, Glass, Dojo, Sensei), Shopify (Sidekick, LLM proxy), Airbnb (Catalyst), and more. Cisco rolled out Codex to 18,000 engineers in January. JP Morgan Chase built a multi-agent framework for labeling customer interaction data with eval-based aggregation.
    • Startups are raising significant funding (one raised $70M Series B) by having agents scan entire codebases—in one case finding four critical authentication issues the company didn’t know existed.
  • Several cross-cutting industry trends are emerging, some concerning.

    • Team-level thinking beats individual productivity: Laura Tacho (now leading developer experience at AWS, former CTO at DX) observes that companies seeing real results start with business outcomes (faster deploys, same-quality features) rather than individual speed-ups. Most companies are stuck in the “individuals doing simple automation” quadrant; the goal is “team-level agentic systems,” which requires deep integration work like Uber has done—you can’t buy this off the shelf.
    • Token maxing and tooling addiction: Engineers at Meta, Amazon, Microsoft, and elsewhere inflate token usage to avoid looking unproductive. Internal leaderboards create perverse incentives. Tool pricing creates addictive loops: users upgrade from $10 to $100 to $200 plans, then move to API pricing, feeling pressure to use their allowance. Some engineers report not sleeping well, waking up thinking about agents.
    • Middle management is being cut: Managers are being laid off, reassigned to IC roles, or told to be hands-on. While popular to criticize middle management, good directors and senior managers improve engineering culture—they notice outages, create task teams, and make structural changes. Removing them risks long-term cultural decline.
    • CEOs and CTOs are coding again: Guillermo Rauch (founder/CEO of Vercel) reports public company CEOs DMing him excitedly about using Vercel or Claude Code. This creates risk: there is less middle management to protect engineers, while leadership vibe-codes and assumes things are complete when they’re not.
    • AI costs are exploding: Sam Altman flagged this morning that AI budgets are becoming a “huge issue.” Anthropic turned off API discounts for enterprise customers. GitHub Copilot switched to usage-based pricing on June 1, and users who used to spend $200/month burned through it in three days. Uber’s CTO said they burned through their entire annual AI budget in March; they’ve now capped spending at $1,500/month per engineer, after which they use free models. Some companies cap at $200.
  • Software quality is declining across the industry, with AI amplifying the problem.

    • Anthropic’s own website had a bug for a month where typing in the input field triggered a React lifecycle refresh that erased everything typed. A paid user tweeted about it; the product manager responded as if discovering it for the first time. They don’t dogfood their own product.
    • OpenAI’s Agent Builder was built in six weeks by one engineer using Codex, but quality is terrible—P0 bugs remain unfixed months later, and the forum is full of unresolved complaints. It’s effectively abandonware.
    • Amazon had an AI agent delete and recreate an environment, causing a massive outage. Another AI-generated code bug took down part of Amazon’s flagship website. Amazon now requires senior engineer review of all AI-generated changes because juniors were rubber-stamping.
    • OpenCode (open-source AI coding tool, ~1 million daily active users, 10x growth in 4–5 months): founder Dax Raad says they’re shipping hacks instead of rethinking systems, and their judgment is off. No competitor is beating them by using AI better; they’re winning by slowing down and maintaining quality. He explicitly says they’re telling themselves to use less AI and do more thinking.
    • GitHub had all PRs disappear for 8–12 hours two weeks ago. Their uptime is so poor that third-party trackers estimate some part of GitHub is down 10% of the time. Load has increased 3x over 2 years, which GitHub says they couldn’t have prepared for—a claim the speaker finds unconvincing.
    • Mario Zechner (creator of the libGDX framework that powers OpenCode) says software has become a “brittle mess everywhere”—98% uptime feels normal, UIs have weird bugs everywhere, and while this predates agents, it’s accelerating.
  • Engineers who still care about quality are being buried, a phenomenon the speaker calls “slop” (the slang for low-quality AI-generated code).

    • Most PRs are now AI-generated. Most developers rubber-stamp them with “LGTM” without real review. The few who do thorough reviews are catching bugs, pushing back on duplicated code, and flagging issues—but they’re overwhelmed, burnt out, and unrewarded. At performance review time, they’re not seen as the ones pushing features. Some quit. OpenCode is actively hiring these burnt-out engineers.
    • Kent Beck summarized it: “We’re accumulating code faster than we accumulate trust.” Code requires trust and understanding, and there’s no time for that now.
    • AI amplifies experience: Seniors gain the most because judgment is rewarded. Hillel Wayne notes that only TLA+ specification experts can successfully use AI to generate working TLA+ specs; everyone else fails. Juniors can prompt a native iOS app, but it won’t be maintainable if they’ve never built one.
    • Old patterns are coming back: Domain-driven design and verbose guardrails are being used at OpenCode because agents are the new junior engineers and need the same guardrails. The speaker is serious about dusting off design pattern books.
  • Advice for software engineers and engineering leaders on navigating this moment.

    • Slow down to speed up: Cap daily agent usage to what you can review or verify. Peter Steinberger (creator of OpenClaw) ships code he doesn’t read but builds his own verification systems: he thinks in architecture, has AI draw diagrams, and checks modules. Don’t ship more than you can verify.
    • Remove tech debt aggressively: It’s now cheap to remove. Be the “chief tech debt remover” on your team. If you’re not removing it, you’re not using AI efficiently.
    • Experiment with different AI usage patterns: There’s no one-size-fits-all. Mitchell Hashimoto (creator of Ghostty, co-founder of HashiCorp) always has one background agent doing something: if he’s coding, the agent is planning; if the agent is coding, he’s reviewing. He uses only one agent, not multi-agent setups. Find what works for you.
    • Don’t outsource learning: It’s too easy to let AI fix bugs while your mental model stays broken. Every time you use an agent, learn something. Understand what it built and why.
    • Future-proof your career: The job market is mixed. Top tech companies are hiring more, and US/UK software engineering roles are up 20%, while Germany and France are down 13% and 10%. AI engineering roles now make up ~10% of all software engineering hires, and that share is growing. Build things on top of AI/LLMs: learn RAG, evals, AI engineering. Build a side project (e.g., a podcast recommendation system) or internal tools at work. Read Chip Huyen’s AI Engineering.
    • Become a domain expert: Talk to farmers if you’re at an agriculture company, mechanical engineers if you’re at an automotive company. Domain expertise outside software makes you indispensable.
    • Engineering leaders must stay hands-on: You need to code or you’ll be out. AI makes it easier—use it to explain things, contribute code, and integrate AI into systems. But expect to do less people management; the business demands it. Engineers will get less career support and fewer pay raises for a while.
    • The pace of change is unprecedented—faster than anything since the 1960s. If you’re overwhelmed, that’s normal. Pat yourself on the back for keeping up. Periodically stop and ask: How can I make this more sustainable? How can I produce more quality? Then rinse and repeat.
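
The advice to cap daily agent usage at what you can review or verify could be sketched roughly as below. This is only an illustration, not anything described in the talk: the AgentChange shape, the plan_review_day helper, and the use of changed lines as a proxy for review effort are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentChange:
    """One agent-produced change awaiting human review (hypothetical shape)."""
    title: str
    lines_changed: int


def plan_review_day(queue, review_capacity_lines):
    """Split queued changes into what fits today's review capacity and what waits.

    Changes are taken in queue order. The first change that does not fit
    stops the intake, so nothing is pulled in ahead of older work.
    Assumption: lines changed is a rough stand-in for review effort.
    """
    accepted, deferred = [], []
    remaining = review_capacity_lines
    for i, change in enumerate(queue):
        if change.lines_changed > remaining:
            # Everything from here on waits for another day.
            deferred = list(queue[i:])
            break
        accepted.append(change)
        remaining -= change.lines_changed
    return accepted, deferred


queue = [
    AgentChange("Fix login redirect", 200),
    AgentChange("Refactor billing module", 300),
    AgentChange("Update README", 100),
]
accepted, deferred = plan_review_day(queue, review_capacity_lines=450)
print([c.title for c in accepted])
print([c.title for c in deferred])
```

The point of the design is that the deferred list is a signal to launch fewer agent tasks, not to review faster.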