A vibe coding interview lets candidates use Cursor, Claude Code, or Copilot during the assessment. You score on verification (do they catch model errors?), prompt quality (how efficient is their AI use?), ownership (can they defend every line?), and orchestration (do they decompose tasks correctly?). You do not score on raw coding speed or algorithm memorization.
What a Vibe Coding Interview Actually Measures
The term "vibe coding" describes a failure mode — accepting whatever the AI generates without reading it, understanding it, or testing it. A strong vibe coding interview tests the opposite: whether the candidate has the discipline to verify, critique, and own AI-generated output.
In 2026, 91% of US engineers use agentic AI tools daily. The interview format that evaluates whether someone can write binary search from memory is evaluating a skill they'll almost never use. The skill they use every day is: read a repo, scope a task, prompt precisely, verify the output, catch errors the model missed, and ship code they can defend line by line.
That's what a vibe coding interview measures.
How It Differs from Traditional Technical Screens
✗ Traditional interview
- AI banned or ignored
- Tests algorithm memorization
- Evaluates typing speed and recall
- Take-home: no way to know who wrote it
- Scores on output quality (correct/incorrect)
- Selects for candidates who practiced LeetCode
✓ Vibe coding interview
- AI tools explicitly allowed and required
- Tests judgment, verification, ownership
- Evaluates prompting quality and efficiency
- Live screen-share: you see how they work
- Scores on process quality (how they got there)
- Selects for candidates who work like your team
The Interview Format (60–90 Minutes)
Setup
- Real repo, not a stub. Give access to a mid-sized real codebase (or a sanitized copy). The task should resemble work your team actually ships.
- Screen share required. This is mandatory because you're evaluating process, not output.
- Any AI tool allowed. Cursor, Claude Code, Copilot, ChatGPT — their choice. Specifying a tool introduces bias and doesn't reflect real work.
- No time pressure on typing speed. You're evaluating judgment, not how fast they type.
Format A — Bug hunt (60 min, best signal per minute)
Plant one or two bugs in the repo — a race condition, a swallowed error in a try/catch, an off-by-one in pagination. The bug should be the type your team actually ships and reverts. Give the candidate 60 minutes. The session has three phases:
- First 10 min: Read the codebase. Strong candidates open CLAUDE.md, AGENTS.md, or the README before typing a single prompt. Weak candidates start prompting immediately.
- Next 35 min: Find and fix the bug using their AI tools. You observe in real time.
- Final 15 min: Defense. Remove the AI. Ask them to explain every line in the diff. Ask why the model chose that approach. Ask what would break this fix.
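Here is a minimal Python sketch of a planted off-by-one pagination bug of the kind described above. It also includes the check a strong candidate might write before accepting any fix. The function names (`paginate_buggy`, `paginate_fixed`, `passes_pagination_check`) are hypothetical, and the sketch assumes 1-indexed pages. Adapt it to your own repo's language and conventions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate_buggy(items: Sequence[T], page: int, page_size: int) -> List[T]:
    # Planted bug: pages are 1-indexed, but the offset is computed as if they
    # were 0-indexed, so page 1 silently skips the first `page_size` items.
    start = page * page_size
    return list(items[start:start + page_size])


def paginate_fixed(items: Sequence[T], page: int, page_size: int) -> List[T]:
    # Correct offset for 1-indexed pages, with basic input validation.
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])


def passes_pagination_check(paginate) -> bool:
    # Page 1 must start at the first item, and walking every page must
    # return each item exactly once, in order.
    items = list(range(10))
    pages = [paginate(items, p, 3) for p in range(1, 5)]
    flattened = [x for page in pages for x in page]
    return pages[0] == [0, 1, 2] and flattened == items


if __name__ == "__main__":
    print("buggy:", passes_pagination_check(paginate_buggy))  # buggy: False
    print("fixed:", passes_pagination_check(paginate_fixed))  # fixed: True
```

The check fails against the planted version and passes against the fix. That is the order you want to see: a failing test first, then the change.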
Format B — PR review (30 min, good secondary signal)
Hand the candidate a 200–300 line PR that an AI generated. Three changes are subtly wrong: a fabricated import, a missing null check, and a logic inversion. Can they find all three in 20 minutes? No AI is needed for this one: it's pure verification reflex.
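Suppose the planted PR is in Python (an assumption; use your repo's language). You can confirm that the fabricated import really fails to resolve, and that a candidate who flags it is right, by asking the import system directly. This sketch uses only the standard library's `importlib.util.find_spec` and `importlib.import_module`. The helper name `import_resolves` is hypothetical. Note that `import_module` executes the module, so only run it against code you trust.

```python
import importlib
import importlib.util
from typing import Optional


def import_resolves(module: str, name: Optional[str] = None) -> bool:
    """Return True if `import module` (or `from module import name`) would succeed."""
    try:
        if importlib.util.find_spec(module) is None:
            return False  # no such top-level module or package
        if name is None:
            return True
        mod = importlib.import_module(module)
        if hasattr(mod, name):
            return True
        # `from pkg import sub` also works when `sub` is a not-yet-imported submodule.
        return importlib.util.find_spec(f"{module}.{name}") is not None
    except ModuleNotFoundError:
        # Raised when a parent in a dotted path is missing or is not a package.
        return False


if __name__ == "__main__":
    print(import_resolves("json", "loads"))      # True: real function
    print(import_resolves("json", "load_fast"))  # False: plausible-sounding but fabricated
```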
Format C — Spec-first build (45 min, strongest prompt quality signal)
Give a small feature spec. Before writing any code, ask the candidate to write their prompt contract — what scope, what constraints, what they'll verify. Score the prompt contract as much as the result. A strong candidate writes a precise brief with edge cases and exit criteria spelled out. A weak candidate writes "build me X."
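One way to make the prompt contract scoreable is to give it fixed sections. The sketch below is a hypothetical structure, not a standard format. The section names (`scope`, `constraints`, `edge_cases`, `exit_criteria`) and the example feature are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import List

SECTIONS = ("scope", "constraints", "edge_cases", "exit_criteria")


@dataclass
class PromptContract:
    # Hypothetical structure for a Format C prompt contract.
    goal: str
    scope: List[str] = field(default_factory=list)          # what the change may touch
    constraints: List[str] = field(default_factory=list)    # what must not change
    edge_cases: List[str] = field(default_factory=list)     # inputs the result must handle
    exit_criteria: List[str] = field(default_factory=list)  # how the result will be verified

    def missing_sections(self) -> List[str]:
        # Sections left empty; a "build me X" brief leaves all of them empty.
        return [name for name in SECTIONS if not getattr(self, name)]

    def render(self) -> str:
        # Turn the contract into the prompt text the candidate would send.
        lines = [f"Goal: {self.goal}"]
        for name in SECTIONS:
            items = getattr(self, name)
            if items:
                lines.append(name.replace("_", " ").capitalize() + ":")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


strong = PromptContract(
    goal="Add cursor-based pagination to the orders list endpoint",
    scope=["orders API handler", "its test file"],
    constraints=["Do not change existing response fields"],
    edge_cases=["empty result set", "cursor pointing at a deleted order"],
    exit_criteria=["new tests pass", "existing test suite still passes"],
)
weak = PromptContract(goal="build me pagination")

if __name__ == "__main__":
    print(strong.render())
    print("weak contract missing:", weak.missing_sections())
```

Scoring then becomes partly mechanical: empty sections are an immediate deduction. The quality of what fills each section still needs your judgment.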
What to Score and How
Score four dimensions, each 1–5. Weighted total determines hire recommendation.
- Verification (40%): Do they catch model errors? Write tests? Question confidently wrong output? A 5/5 writes a failing test before accepting the AI's fix. A 1/5 ships whatever the model produces without reading it.
- Prompt quality (25%): How many turns to a usable result? Do prompts include codebase context, explicit scope, constraints? A 5/5 gets a working result in 1–2 turns with plan mode. A 1/5 pastes the whole problem as one vague prompt and iterates 10+ times.
- Code ownership (20%): Can they explain every line in the diff without the AI? A 5/5 can rewrite any section manually if asked. A 1/5 says "the AI wrote that part."
- Orchestration (15%): Do they decompose tasks correctly? Know when to use AI vs. handle manually? A 5/5 breaks complex work into atomic subtasks and reviews output at checkpoints. A 1/5 issues one giant prompt and hopes.
Threshold: 4.0+ = strong hire | 3.5–3.9 = conditional | below 3.0 = no-hire. Verification score below 3 = hard no-hire regardless of total.
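The weighted total is simple arithmetic, but the hard verification gate and the band edges are easy to misapply by hand. Here's a minimal sketch assuming integer scores from 1 to 5. The dimension keys are hypothetical names for the four dimensions above. The thresholds as written leave totals from 3.0 to 3.49 without a band, so the sketch returns an explicit "unbanded" result for that range rather than guessing how your team wants to treat it.

```python
from typing import Dict

# Weights in percent, from the rubric above. Keeping them as integers means the
# only float step is one division by 100, so band edges like 3.5 compare exactly.
WEIGHTS = {"verification": 40, "prompt_quality": 25, "ownership": 20, "orchestration": 15}


def weighted_total(scores: Dict[str, int]) -> float:
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for exactly {sorted(WEIGHTS)}")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / 100


def recommendation(scores: Dict[str, int]) -> str:
    total = weighted_total(scores)  # also validates the input
    # Hard gate: verification below 3 is a no-hire regardless of total.
    if scores["verification"] < 3:
        return "no-hire (verification gate)"
    if total >= 4.0:
        return "strong hire"
    if total >= 3.5:
        return "conditional"
    if total < 3.0:
        return "no-hire"
    # 3.0 to 3.49 has no band in the rubric as written; flag it instead of guessing.
    return "unbanded"


if __name__ == "__main__":
    candidate = {"verification": 4, "prompt_quality": 4, "ownership": 3, "orchestration": 3}
    print(weighted_total(candidate), recommendation(candidate))  # 3.65 conditional
```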
Red Flags and Green Flags in Real Time
Green flags (what you want to see)
- Opens CLAUDE.md, AGENTS.md, or README before prompting anything
- Uses plan mode in Claude Code before any implementation
- Writes a failing test before accepting the AI's fix
- Explicitly rejects a model suggestion and explains why
- Gets to a working result in 1–2 turns instead of iterating endlessly
- Asks "what would break this?" unprompted
- Deliberately chooses NOT to use AI for a specific section
Red flags (stop and probe when you see these)
- Pastes the entire problem description as the first prompt without scoping
- Accepts a diff without reading it
- "The AI said it was right" — treating model confidence as fact
- No test written or run after the AI produces output
- 10+ turns for a task that should take 2
- Can't explain any line in the diff under questioning
- Didn't notice a fabricated import in a 50-line diff
Reviewing the Session Transcript After
The AI session transcript is the most underused interview artifact. Claude Code keeps session logs. Cursor Composer shows the full history. Ask the candidate to share it or screen-record the session. Then read it the way you'd read a PR:
- Prompt shape: A few long, precise prompts = good. Many short, vague ones = red flag.
- Turns to first usable output: Fewer is better. More than 5 for a simple task signals unclear thinking.
- Rejection behavior: Zero rejections = accepted everything (red flag). Rejections with clear reasoning = strong signal.
- Plan mode usage: Did they open plan mode before implementing? This single signal separates disciplined candidates from reactive ones.
- Context window management: Did they front-load codebase context, or restart conversations mid-task because the model "forgot" what they were doing?
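If you annotate the transcript turn by turn while reading it, most of the signals above reduce to a few counts. This sketch assumes a hypothetical hand-labeled `Turn` record, which is not the native log format of Claude Code or Cursor. It uses the "more than 5 turns" threshold from the list above. Treat a planning turn as a stand-in for plan mode in whatever tool the candidate chose.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    # Hypothetical reviewer annotation of one prompt/response turn.
    prompt: str
    outcome: str                 # "accepted", "rejected", or "iterated"
    rejection_reason: str = ""   # filled in when the candidate rejected the output
    plan_mode: bool = False      # True if this turn was a planning step


def summarize(turns: List[Turn]) -> Dict[str, object]:
    first_usable: Optional[int] = next(
        (i + 1 for i, t in enumerate(turns) if t.outcome == "accepted"), None
    )
    rejections = [t for t in turns if t.outcome == "rejected"]
    return {
        "turns": len(turns),
        "turns_to_first_usable": first_usable,
        "avg_prompt_words": (
            sum(len(t.prompt.split()) for t in turns) / len(turns) if turns else 0.0
        ),
        "rejections": len(rejections),
        "rejections_with_reason": sum(1 for t in rejections if t.rejection_reason.strip()),
        "planned_first": bool(turns) and turns[0].plan_mode,
    }


def red_flags(summary: Dict[str, object]) -> List[str]:
    flags = []
    first = summary["turns_to_first_usable"]
    if first is None or first > 5:
        flags.append("more than 5 turns to first usable output")
    if summary["rejections"] == 0:
        flags.append("zero rejections: accepted everything")
    elif summary["rejections_with_reason"] < summary["rejections"]:
        flags.append("rejections without stated reasoning")
    if not summary["planned_first"]:
        flags.append("no planning step before implementing")
    return flags


if __name__ == "__main__":
    session = [
        Turn("Plan the fix for the pagination bug in the orders handler", "iterated", plan_mode=True),
        Turn("Implement step 1 of the plan", "rejected", "fabricated import"),
        Turn("Retry step 1 using the existing HTTP client", "accepted"),
    ]
    print(red_flags(summarize(session)))  # []
```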
FAQ
Is "vibe coding interview" the same as an "AI agent interview"?
They're used interchangeably, but there's a nuance. "Vibe coding" originally described a failure mode — accepting AI output without verification. "Vibe coding interview" is the assessment format that tests whether a candidate does this or its opposite. "AI agent interview" is the broader term for any technical assessment where AI tools are allowed and scored. Same format, slightly different framing depending on your audience.
How do you prevent candidates from just letting the AI do everything?
You can't prevent it — and you shouldn't try. If a candidate lets the AI do everything and can't explain any of it under questioning, that IS the signal. The defense phase (15 minutes at the end where you ask them to explain every line) is where AI-over-reliance becomes immediately visible. There's no way to fake understanding of code you didn't write when someone asks "why did the model choose this approach?" and then "rewrite this section without AI."
What seniority level does this format work for?
All levels, with calibration. For L3/L4: check baseline verification discipline and prompt fundamentals. For L5+: evaluate orchestration judgment, architectural decisions made while directing agents, and the ability to scope and decompose large problems. Senior candidates should be able to run parallel agent tasks while reviewing the output of another — and explain every decision made.
Can we run this format without building it ourselves?
Yes — Altor conducts vibe coding and AI agent proficiency interviews on behalf of engineering teams. We run the session, review the transcript, score against the rubric, and deliver a hire/no-hire recommendation with written reasoning within 24 hours.