AI has flipped code review on its head. Code is no longer scarce—it’s abundant, fast, and cheap to generate. But abundance cuts both ways.

Every new line of AI-generated code is both an asset and a liability. More code ≠ more value. Velocity is up. Risk is up too. The question is no longer “Can we review this?” but “Can we review this at scale without drowning?”

“AI didn’t remove code review—it multiplied its surface area.”

Gergely Orosz

Before: The Pre-AI Code Review World

  • Human-written code, human-paced reviews

  • Line-by-line scrutiny: syntax, style, correctness

  • Smaller PRs, slower merges

  • Reviewers acted as gatekeepers

  • Quality relied heavily on reviewer bandwidth and experience

Now: Code Review in the Age of AI

  • Exploding code volume: AI generates days' worth of code in hours

  • Reviews shift from syntax to intent (spec-driven), architecture, and outcomes

  • Hybrid workflows:

    • AI: linting, duplication, security scans, test generation

    • Humans: context, tradeoffs, business logic

  • PRs merge faster—but are harder to fully understand

New Challenges

  • AI slop: verbose, duplicated, low-signal code that is less readable and less consistent

  • Subtle bugs & hallucinations that “look right”

  • Security issues and policy violations

  • Inconsistent org standards & missing workflows

  • Reviewer fatigue—debugging AI output instead of reviewing ideas

AI code creates 1.7x more problems

The 3P Framework to Evolve Code Reviews

Prepare (context)

In the age of AI-generated code, code review quality is determined before the first line of code is written. If you don’t define context, the AI will. And that’s where entropy begins.

“Prepare” means encoding your team’s engineering constitution directly into the repository so both humans and AI generate and review code against the same rules.

Refer to my previous article on context engineering (spec-driven development) to learn more.

  • Specifications & Constitutions

    • ADRs or design docs that define what should be built and why

    • Explicit non-goals, invariants, performance and security constraints

  • Behavioral Instructions for AI Tools

    • GitHub Copilot instructions

    • Cursor rules

    • CLAUDE.md

    • Repo-level prompts that encode architecture, layering, naming, and testing expectations

  • Reviewable Standards

    • Architecture principles, patterns, and anti-patterns

    • Security, compliance, and ticket-linking requirements

    • Checklists reviewers can point to when code deviates

The key shift: reviews are no longer opinion-based—they’re contract-based. Reviewers aren’t debating style; they’re verifying compliance with the repo’s constitution.
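
To make that contract machine-checkable, one low-effort starting point is a CI step that fails when the constitution files themselves are missing. Below is a minimal sketch in Python; the file paths (.github/copilot-instructions.md, CLAUDE.md, docs/adr) are assumptions to be swapped for wherever your specs actually live.

```python
#!/usr/bin/env python3
"""Fail fast when the repo's "constitution" files are missing.

A minimal sketch: the paths below are assumptions -- point them at wherever
your specs, ADRs, and AI instruction files actually live.
"""
import sys
from pathlib import Path

REQUIRED_CONTEXT = [
    ".github/copilot-instructions.md",  # behavioral instructions for Copilot
    "CLAUDE.md",                        # repo-level prompt for Claude-based tools
    "docs/adr",                         # architecture decision records
]


def main() -> int:
    missing = [p for p in REQUIRED_CONTEXT if not Path(p).exists()]
    if missing:
        print("Constitution check failed. Missing context files:")
        for path in missing:
            print(f"  - {path}")
        return 1
    print("Constitution check passed: all context files present.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```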

This only works if teams use these tools consistently. Partial adoption creates blind spots. Full adoption creates leverage: higher-quality AI output, faster reviews, and less cognitive load on senior engineers.

In AI-assisted development, context is the new code quality gate.

Practice (adopt)

Context files set the foundation. Now reviewers need to use them. The practice phase is where teams adopt spec-driven coding and manual review workflows that leverage AI assistance without fully automating away human judgment.

Spec-driven coding creates reviewable artifacts before implementation.

Following the Research → Plan → Implement (RPI) pattern, developers document what exists (research), design the change with clear phases (plan), then execute with verification (implement).

In fully AI-native teams, the plan becomes the primary artifact reviewers evaluate: review focuses on specs and intent rather than on the code itself.

When a PR includes a spec that outlines architectural decisions, acceptance criteria, and implementation steps, reviewers validate against intent—not just syntax. GitHub Copilot's workflow formalizes this: curate project-wide context in ARCHITECTURE.md and .github/copilot-instructions.md, generate an implementation plan, then produce code. The plan is what gets reviewed first.

Reviewers use AI as a research assistant, not a decision-maker.

During manual review, paste the diff into Claude or GPT-4 with explicit questions: "Does this follow our authentication pattern from the instructions file?" "Are there edge cases we're missing?" "How does this interact with the shared-utils library?" The AI returns targeted analysis grounded in the context you provide, but the reviewer makes the call. This is Addy Osmani's "trust but verify" approach—AI accelerates investigation, humans own accountability.
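
That workflow is straightforward to script so the diff and the repo's conventions always travel together. The sketch below uses the OpenAI Python client as one possible backend; the model name, the instructions-file path, and the questions are placeholders, and the output is advisory input for the reviewer, not a verdict.

```python
"""Ask an LLM targeted review questions about a diff, grounded in repo context.

A sketch, not an endorsement of a specific vendor: swap in whichever client
and model your team has standardized on. Paths and questions are examples.
"""
import subprocess
from pathlib import Path

from openai import OpenAI  # pip install openai

QUESTIONS = [
    "Does this change follow the authentication pattern described in the instructions?",
    "Are there edge cases or failure modes the diff does not handle?",
    "How does this interact with code it touches only indirectly?",
]


def review_diff(base: str = "origin/main") -> str:
    diff = subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout
    # Assumed location of the repo's behavioral instructions; adjust as needed.
    conventions = Path(".github/copilot-instructions.md").read_text()

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable review model works here
        messages=[
            {
                "role": "system",
                "content": "You are a code review research assistant. Answer only "
                "from the provided conventions and diff, and flag any uncertainty.",
            },
            {
                "role": "user",
                "content": f"Repo conventions:\n{conventions}\n\n"
                f"Diff under review:\n{diff}\n\n"
                "Questions:\n" + "\n".join(f"- {q}" for q in QUESTIONS),
            },
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Advisory output only; the human reviewer still makes the final call.
    print(review_diff())
```

Large diffs may need chunking or summarizing before they fit a model's context window, which is one more argument for the smaller, story-shaped PRs discussed later.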

For solo developers, practice means building test coverage as the safety net. For teams, it means establishing a shared language around "what does good look like" using the context files as the reference standard. Both approaches maintain human judgment while using AI to handle the mechanical investigation work.

Productize (automate)

Productize is where AI-native code review stops being a set of cool experiments and becomes a repeatable safety net wired into your delivery pipeline. Think of it as turning your “Prepare” rules and “Practice” habits into enforceable, observable checks that run every time code moves toward production.

What to automate in CI

Modern AI-heavy teams should treat CI as the first serious reviewer of every PR, especially when agents are generating large features rather than tiny snippets. Useful automations include:

  • Policy and standards gates (lint, formatting, architecture rules, “constitution” checks against SpecIt / CLAUDE.md / Copilot instructions) that fail fast when the repo contract is violated.

  • Security and compliance scans that look for hard-coded secrets, dependency risk, hallucinated packages, and ticket/Jira alignment before humans ever open the PR (a minimal policy-gate sketch follows this list).

  • Agentic review passes (Qodo, CodeRabbit, Copilot, Cursor Bugbot) that generate structured comments, suggested patches, and risk scores directly on the diff.
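
As a concrete example of the first two gates, the sketch below scans a diff's added lines for a few naive secret patterns and checks that the latest commit references a ticket. The patterns and the Jira-style ticket format are assumptions; a real pipeline would layer dedicated scanners and the agentic reviewers above on top of something this simple.

```python
"""Minimal CI policy gate: block obvious secrets and unticketed changes.

A sketch with assumed conventions (Jira-style ticket IDs, naive secret
patterns). It complements dedicated scanners; it does not replace them.
"""
import re
import subprocess
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S{12,}"),
]
TICKET_PATTERN = re.compile(r"[A-Z]{2,}-\d+")  # assumption: Jira-style IDs


def added_lines(base: str = "origin/main") -> list[str]:
    diff = subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def main() -> int:
    failures = []
    for line in added_lines():
        if any(p.search(line) for p in SECRET_PATTERNS):
            failures.append(f"possible hard-coded secret: {line.strip()[:80]}")

    head_msg = subprocess.run(
        ["git", "log", "-1", "--pretty=%B"], capture_output=True, text=True
    ).stdout
    if not TICKET_PATTERN.search(head_msg):
        failures.append("latest commit does not reference a ticket (e.g. PROJ-123)")

    for failure in failures:
        print(f"POLICY GATE: {failure}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```

Run as an early CI job or a required status check, a gate like this keeps the cheapest failures away from human reviewers entirely.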

Shift-left into the developer workflow

The same checks should run earlier, inside the editor or pre-push hooks, so developers get feedback while the AI is still “in the loop.”

  • IDE integrations (Qodo, CodeRabbit, Copilot, Cursor) can run localized reviews, point out contract violations against your spec files, and propose one-click fixes before the branch ever hits GitHub.

  • Lightweight pre-PR pipelines (lint + tests + security + constitution check) keep noisy or non-compliant AI changes from polluting the main CI queue; a pre-push hook sketch follows this list.

  • For high-risk repos, a gated agentic review pass (for example, Qodo's "/review" command or CodeRabbit's "@coderabbitai review") can be required before reviewers are assigned, so humans start with a triaged PR.
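
One way to shift those same gates left is a pre-push hook that runs whatever CI will run later. A minimal sketch follows; the commands, and the scripts/constitution_check.py and scripts/policy_gate.py names, are hypothetical stand-ins for your own toolchain.

```python
#!/usr/bin/env python3
"""Git pre-push hook: run the same checks locally that CI will run later.

Install by copying to .git/hooks/pre-push and making it executable.
Every command below is an assumption -- mirror your actual CI pipeline.
"""
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],                        # lint (swap for your linter)
    ["pytest", "-q"],                              # tests
    ["python", "scripts/constitution_check.py"],   # hypothetical script from "Prepare"
    ["python", "scripts/policy_gate.py"],          # hypothetical CI gate, run locally
]


def main() -> int:
    for cmd in CHECKS:
        print(f"pre-push: running {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print(f"pre-push: {' '.join(cmd)} failed; push blocked.")
            return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```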

Pros and cons of key automation tools

| Tool | Where it shines in CI | Pros for Productize | Cons / gotchas |
| --- | --- | --- | --- |
| Qodo | Deep, agentic multi-repo review in CI; ticket-aware checks and risk scoring. | Strong governance, cross-repo context, suggested patches; great for large orgs and compliance-heavy teams. | Needs upfront indexing and works best when teams are ready to standardize on its workflows. |
| GitHub Copilot | Fast file-level security and correctness hints on PRs and in the IDE. | Minimal setup, lives where GitHub-native teams already work; great “first pass” on AI code smells. | Limited to diff-level context; no strong policy enforcement or enterprise controls. |
| Cursor (Bugbot) | Inline CI-style comments on obvious breakages in the diff. | Tight editor–PR loop; very low ceremony for catching regressions early. | No system-level or compliance view; not suited as the only gate in complex estates. |
| CodeRabbit | Agentic code validation that blends static analysis, security tools, and reasoning models per PR. | Good balance of automation and human-in-the-loop; explains why an issue matters and suggests concrete fixes. | Still primarily diff-scoped; enterprises may need extra layers for governance/auditability. |
| Graphite | Stack-aware merge gating in CI for chopped-up AI changes. | Makes incremental, stacked PR workflows practical, reducing reviewer overload. | Focuses on workflow structure, not deep code analysis; needs other tools for quality and security. |

A healthy Productize stage wires these tools so that “constitution” violations are caught automatically, risky AI output is filtered before humans spend time on it, and reviewers can focus their scarce attention on intent, architecture, and customer impact.

While powerful, these tools can trigger false positives and alert fatigue if mismanaged. Lacking deep business context, they often focus on the how rather than the why.

The most effective strategy remains a hybrid approach: let AI handle the initial sweep, while humans provide the final, context-aware validation.

Putting It All Together: Closing Thoughts and Best Practices

Treat AI-native code review as a team sport with a constitution, not a pile of clever prompts. The goal of these best practices is to keep velocity high without quietly normalizing flaky, half-understood AI changes.

Keep the constitution alive

  • Revisit your repo specs (SpecIt files, CLAUDE.md, copilot-instructions.md, Cursor rules) on a cadence: when the architecture shifts, the constitution must shift with it.

  • Make it someone’s explicit job each cycle to prune dead rules, add new examples, and broadcast changes in release notes or Slack so people and agents are operating on the same mental model.

Normalize showing your work

  • Ask contributors to mention which agents and prompts produced key chunks of code, and which spec files they relied on, in the PR description.

  • During review, treat “no explanation, huge diff” as a smell; prefer smaller, story-shaped PRs where the reasoning, tests, and acceptance criteria are visible.

Make AI tools boring and consistent

  • Standardize on a small, blessed set of tools for review and CI so signal is predictable: one main agentic reviewer, one security scanner, one quality gate, all wired into the same stages.

  • Encourage developers to run the same checks locally that CI will run later; the fewer surprises after push, the more reviewers can focus on intent and design instead of churn.

Guardrails over heroics

  • Default to safe patterns: feature flags for risky AI-generated changes (as sketched below), progressive rollout, and explicit rollback plans baked into PR templates.

  • In incident reviews, look specifically for where AI-assisted code or missing guardrails contributed, then update both the constitution and the CI gates so the same failure mode is harder next time.
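
A feature-flag guard can be as simple as the sketch below: the AI-generated path runs only when the flag is on and falls back to the known-good path on failure, so rollback is a config change rather than a revert. The flag source (an environment variable) and the pricing functions are placeholders for whatever flag service and business logic the team already has.

```python
"""Feature-flag guard around a risky, AI-generated code path.

A sketch: the flag is read from an environment variable for simplicity; a real
team would use its existing flag service and add progressive rollout on top.
"""
import logging
import os

log = logging.getLogger(__name__)


def flag_enabled(name: str) -> bool:
    # Placeholder flag source; swap in your feature-flag service of choice.
    return os.getenv(f"FLAG_{name.upper()}", "off") == "on"


def legacy_pricing(order: dict) -> float:
    """Known-good fallback path."""
    return order["quantity"] * order["unit_price"]


def ai_generated_pricing_v2(order: dict) -> float:
    """Stand-in for the new, AI-assisted implementation behind the flag."""
    return order["quantity"] * order["unit_price"] * (1 - order.get("discount", 0.0))


def compute_pricing(order: dict) -> float:
    if flag_enabled("ai_pricing_v2"):
        try:
            return ai_generated_pricing_v2(order)
        except Exception:
            log.exception("ai_pricing_v2 failed; falling back to the legacy path")
    return legacy_pricing(order)


if __name__ == "__main__":
    # With FLAG_AI_PRICING_V2 unset, this exercises the legacy path.
    print(compute_pricing({"quantity": 3, "unit_price": 10.0, "discount": 0.1}))
```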

Culture: curiosity, not blame

  • Encourage reviewers to ask “what did the model assume here?” instead of “who messed this up?”, and use weird AI behavior as learning material in team brownbags or docs.

  • Celebrate PRs where authors used the tools well: tight specs, focused prompts, clean diffs, and CI passing on the first try; that is the behavior the system should reward.

If the team keeps the context fresh, runs the same checks everywhere, and treats AI as a sharp but fallible collaborator, code review in the age of agents becomes less about catching random landmines and more about steering the system toward the software the team actually meant to build.

Sanjai Ganesh Jeenagiri
