What is a vibe coding interview?

A vibe coding interview is a technical assessment where candidates use AI tools (Cursor, Claude Code, GitHub Copilot) during the interview itself. The interviewer evaluates engineering judgment — how the candidate prompts, verifies, and owns AI-generated code — rather than whether they can write code from memory. The term comes from 'vibe coding,' which describes accepting AI output without verification. A good vibe coding interview specifically tests whether the candidate does NOT vibe code — whether they verify, critique, and own what the AI generates.

How is a vibe coding interview different from a traditional technical interview?

Traditional technical interviews ban AI and test whether candidates can write correct code from memory. Vibe coding interviews allow AI tools and test whether candidates can steer AI toward a correct result, verify the output, catch model errors, and own every line they're about to merge. The evaluation shifts from 'can they write code?' to 'can they produce trustworthy code using AI?' which is the actual job in 2026.

What do you score in a vibe coding interview?

Score four dimensions: (1) Verification — does the candidate catch model errors, write tests, question wrong output? (2) Prompt quality — how many turns to a usable result, how precise are the prompts? (3) Code ownership — can they explain every line in the diff without the AI? (4) Orchestration — do they decompose tasks correctly and use AI at the right moments? Verification carries the most weight (40%) because it's the safety reflex that prevents shipping hallucinated code.

Should you allow AI in technical interviews?

Yes, for most engineering roles in 2026. 91% of US engineers use agentic AI tools daily. Banning them tests a fictional version of the job. Meta, Google, Canva, Shopify, and Rippling already allow and evaluate AI tool usage in at least one interview round. The evaluation shifts from 'can they write code?' to 'can they steer AI toward a trustworthy result?'

How to Run a Vibe Coding Interview (Hiring Manager's Guide, 2026)

What This Is

A vibe coding interview lets candidates use Cursor, Claude Code, or Copilot during the assessment. You score on verification (do they catch model errors?), prompt quality (how efficient is their AI use?), ownership (can they defend every line?), and orchestration (do they decompose tasks correctly?). You do not score on raw coding speed or algorithm memorization.

What a Vibe Coding Interview Actually Measures

The term "vibe coding" describes a failure mode — accepting whatever the AI generates without reading it, understanding it, or testing it. A strong vibe coding interview tests the opposite: whether the candidate has the discipline to verify, critique, and own AI-generated output.

In 2026, 91% of US engineers use agentic AI tools daily. The interview format that evaluates whether someone can write binary search from memory is evaluating a skill they'll almost never use. The skill they use every day is: read a repo, scope a task, prompt precisely, verify the output, catch errors the model missed, and ship code they can defend line by line.

That's what a vibe coding interview measures.

How It Differs from Traditional Technical Screens

✗ Traditional interview

AI banned or ignored
Tests algorithm memorization
Evaluates typing speed and recall
Take-home: no way to know who wrote it
Scores on output quality (correct/incorrect)
Selects for candidates who practiced LeetCode

✓ Vibe coding interview

AI tools explicitly allowed and required
Tests judgment, verification, ownership
Evaluates prompting quality and efficiency
Live screen-share: you see how they work
Scores on process quality (how they got there)
Selects for candidates who work like your team

The take-home collapse: Roughly 45% of US employers still send take-home assessments, but trust has broken down — there's no reliable way to know whether the candidate wrote the code or handed the ticket to Claude Code at midnight. Live vibe coding with a screen share solves this: you see the process, not just the artifact.

The Interview Format (60–90 Minutes)

Setup

Real repo, not a stub. Give access to a mid-sized real codebase (or a sanitized copy). The task should be the kind your team actually ships and reverts.
Screen share required. You're evaluating process, not output.
Any AI tool allowed. Cursor, Claude Code, Copilot, ChatGPT — their choice. Specifying a tool introduces bias and doesn't reflect real work.
No pressure on typing speed. You're not evaluating how fast they type.

Format A — Bug hunt (60 min, best signal per minute)

Plant one or two bugs in the repo — a race condition, a swallowed error in a try/catch, an off-by-one in pagination. The bug should be the type your team actually ships and reverts. Give the candidate 60 minutes. The session has three phases:

First 10 min: Read the codebase. Strong candidates open CLAUDE.md, AGENTS.md, or the README before typing a single prompt. Weak candidates start prompting immediately.
Next 35 min: Find and fix the bug using their AI tools. You observe in real time.
Final 15 min: Defense. Remove the AI. Ask them to explain every line in the diff. Ask why the model chose that approach. Ask what would break this fix.
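
To make the bug hunt concrete, here is a minimal sketch of one planted bug (an off-by-one in pagination) together with the kind of check a strong candidate might write before accepting any fix. The function names and the 1-indexed page convention are assumptions made for illustration; they are not taken from any particular codebase.

```python
def paginate_buggy(items: list, page: int, page_size: int) -> list:
    """Hypothetical planted bug: pages are 1-indexed, but the slice start
    is shifted by one, so the very first item is never returned."""
    start = (page - 1) * page_size + 1  # BUG: the "+ 1" should not be here
    return items[start:start + page_size]


def paginate_fixed(items: list, page: int, page_size: int) -> list:
    """Corrected version: page 1 starts at index 0."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


def covers_every_item_once(paginate, items: list, page_size: int) -> bool:
    """Walk all pages and confirm every item comes back exactly once, in order."""
    pages = -(-len(items) // page_size)  # ceiling division
    seen = []
    for page in range(1, pages + 1):
        seen.extend(paginate(items, page, page_size))
    return seen == items
```

Run against ten items with a page size of three, the check fails for `paginate_buggy` and passes for `paginate_fixed`. That failing-then-passing sequence is the behavior the defense phase is meant to surface.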

Format B — PR review (30 min, good secondary signal)

Hand the candidate a 200–300 line PR that an AI generated. Three changes are subtly wrong: a fabricated import, a missing null check, and a logic inversion. Can they find all three in 20 minutes? No AI is needed for this one; it's pure verification reflex.
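
Of the three planted errors, the fabricated import is the one that can be partly checked mechanically. The sketch below assumes the PR is Python and that the reviewer's environment has the project's real dependencies installed; `unresolved_imports` is a hypothetical helper name. It only confirms that a top-level module can be found, not that the imported name exists inside it. The missing null check and the logic inversion still require reading the diff.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported in `source` that cannot be
    found in the current Python environment."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

For example, calling `unresolved_imports("import json\nfrom totally_made_up_pkg_for_demo import loads\n")` should flag only the made-up package, assuming nothing by that name is installed.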

Format C — Spec-first build (45 min, strongest prompt quality signal)

Give a small feature spec. Before writing any code, ask the candidate to write their prompt contract — what scope, what constraints, what they'll verify. Score the prompt contract as much as the result. A strong candidate writes a precise brief with edge cases and exit criteria spelled out. A weak candidate writes "build me X."
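
One way to make the prompt contract scoreable is to give it a fixed shape. The sketch below is an assumption about what that shape could look like, built from the elements named above (scope, constraints, edge cases, exit criteria). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContract:
    """Hypothetical structure for the brief a candidate writes before coding."""
    scope: str
    constraints: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    exit_criteria: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Name the sections left empty; a bare "build me X" leaves all three empty."""
        sections = {
            "constraints": self.constraints,
            "edge_cases": self.edge_cases,
            "exit_criteria": self.exit_criteria,
        }
        return [name for name, value in sections.items() if not value]

    def render(self) -> str:
        """Format the contract as plain text to paste into an AI tool."""
        lines = [f"Scope: {self.scope}"]
        for title, items in (
            ("Constraints", self.constraints),
            ("Edge cases", self.edge_cases),
            ("Exit criteria", self.exit_criteria),
        ):
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

A contract whose `missing_sections()` comes back empty is not automatically a good one, but a non-empty result is a quick, objective way to spot the "build me X" brief.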

Best combination: Format A (60 min) plus a 15-minute transcript review. You get signal on both generation and verification, the two halves of AI-native engineering, in 75 minutes total.

What to Score and How

Score four dimensions, each 1–5. Weighted total determines hire recommendation.

Verification (40%): Do they catch model errors? Write tests? Question confidently wrong output? A 5/5 writes a failing test before accepting the AI's fix. A 1/5 ships whatever the model produces without reading it.
Prompt quality (25%): How many turns to a usable result? Do prompts include codebase context, explicit scope, constraints? A 5/5 gets a working result in 1–2 turns with plan mode. A 1/5 pastes the whole problem as one vague prompt and iterates 10+ times.
Code ownership (20%): Can they explain every line in the diff without the AI? A 5/5 can rewrite any section manually if asked. A 1/5 says "the AI wrote that part."
Orchestration (15%): Do they decompose tasks correctly? Know when to use AI vs. handle manually? A 5/5 breaks complex work into atomic subtasks and reviews output at checkpoints. A 1/5 issues one giant prompt and hopes.

Threshold: 4.0+ = strong hire | 3.5–3.9 = conditional | below 3.0 = no-hire. Verification score below 3 = hard no-hire regardless of total.
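
The weighting and thresholds above can be sketched as a small calculation. The weights and cutoffs come from this guide; the function names are hypothetical. Note one assumption: the guide does not say what to do with totals from 3.0 up to 3.5, so the sketch reports that band as unspecified instead of guessing, and it treats anything from 3.5 up to 4.0 as conditional.

```python
# Weights taken from the four dimensions above (they sum to 1.0).
WEIGHTS = {
    "verification": 0.40,
    "prompt_quality": 0.25,
    "code_ownership": 0.20,
    "orchestration": 0.15,
}


def weighted_total(scores: dict[str, int]) -> float:
    """Return the weighted 1-5 total for one candidate."""
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def recommendation(scores: dict[str, int]) -> str:
    """Map dimension scores to a recommendation using the thresholds above."""
    # Hard rule: a verification score below 3 overrides the total.
    if scores["verification"] < 3:
        return "no-hire"
    total = round(weighted_total(scores), 2)  # avoid float noise at cutoffs
    if total >= 4.0:
        return "strong hire"
    if total >= 3.5:
        return "conditional"
    if total < 3.0:
        return "no-hire"
    # Assumption: the 3.0 to 3.5 band is not defined in the guide.
    return "unspecified band: calibrate with your team"
```

A candidate scoring 5, 4, 4, 3 (in the order listed) totals 4.25 and lands in strong hire. A candidate scoring 2 on verification and 5 everywhere else totals 3.8 but is still a no-hire because of the hard rule.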


Red Flags and Green Flags in Real Time

Green flags (what you want to see)

Opens CLAUDE.md, AGENTS.md, or README before prompting anything
Uses plan mode in Claude Code before any implementation
Writes a failing test before accepting the AI's fix
Explicitly rejects a model suggestion and explains why
Gets to a working result in 2 turns, doesn't iterate endlessly
Asks "what would break this?" unprompted
Deliberately chooses NOT to use AI for a specific section

Red flags (stop and probe when you see these)

Pastes the entire problem description as the first prompt without scoping
Accepts a diff without reading it
"The AI said it was right" — treating model confidence as fact
No test written or run after the AI produces output
10+ turns for a task that should take 2
Can't explain any line in the diff under questioning
Doesn't notice a fabricated import in a 50-line diff

Reviewing the Session Transcript After

The AI session transcript is the most underused interview artifact. Claude Code keeps session logs. Cursor Composer shows the full history. Ask the candidate to share it or screen-record the session. Then read it the way you'd read a PR:

Prompt shape: A few long, precise prompts = good. Many short, vague ones = red flag.
Turns to first usable output: Fewer is better. More than 5 for a simple task signals unclear thinking.
Rejection behavior: Zero rejections = accepted everything (red flag). Rejections with clear reasoning = strong signal.
Plan mode usage: Did they open plan mode before implementing? This single signal separates disciplined candidates from reactive ones.
Context window management: Did they front-load codebase context, or restart conversations mid-task because the model "forgot" what they were doing?
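
Several of the signals above can be tallied once a reviewer has annotated the transcript. The sketch below assumes a hypothetical, hand-annotated format (one record per candidate prompt). It does not parse the actual log format of Claude Code, Cursor, or any other tool. Plan mode usage and context management are left to the reviewer's reading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One candidate prompt plus reviewer annotations (hypothetical schema)."""
    prompt: str
    rejected_previous_output: bool = False  # candidate pushed back on the model
    output_usable: bool = False             # reviewer judged the reply usable


def summarize(turns: list[Turn]) -> dict:
    """Tally prompt shape, turns to first usable output, and rejections."""
    first_usable: Optional[int] = next(
        (i for i, t in enumerate(turns, start=1) if t.output_usable), None
    )
    word_counts = [len(t.prompt.split()) for t in turns]
    return {
        "prompts": len(turns),
        "avg_prompt_words": sum(word_counts) / len(turns) if turns else 0.0,
        "turns_to_first_usable": first_usable,  # None if nothing was usable
        "rejections": sum(t.rejected_previous_output for t in turns),
    }
```

The numbers are a starting point for the conversation, not a verdict: a low average prompt length with many turns matches the "many short, vague ones" pattern, while zero rejections is worth probing in the defense phase.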

FAQ

Is "vibe coding interview" the same as an "AI agent interview"?

They're used interchangeably, but there's a nuance. "Vibe coding" originally described a failure mode — accepting AI output without verification. "Vibe coding interview" is the assessment format that tests whether a candidate does this or its opposite. "AI agent interview" is the broader term for any technical assessment where AI tools are allowed and scored. Same format, slightly different framing depending on your audience.

How do you prevent candidates from just letting the AI do everything?

You can't prevent it — and you shouldn't try. If a candidate lets the AI do everything and can't explain any of it under questioning, that IS the signal. The defense phase (15 minutes at the end where you ask them to explain every line) is where AI-over-reliance becomes immediately visible. There's no way to fake understanding of code you didn't write when someone asks "why did the model choose this approach?" and then "rewrite this section without AI."

What seniority level does this format work for?

All levels, with calibration. For L3/L4: check baseline verification discipline and prompt fundamentals. For L5+: evaluate orchestration judgment, architectural decisions made while directing agents, and the ability to scope and decompose large problems. Senior candidates should be able to run one agent task while reviewing the output of another, and explain every decision they made.

Can we run this format without building it ourselves?

Yes — Altor conducts vibe coding and AI agent proficiency interviews on behalf of engineering teams. We run the session, review the transcript, score against the rubric, and deliver a hire/no-hire recommendation with written reasoning within 24 hours.

Altor Runs Vibe Coding Interviews For You

We conduct live AI agent proficiency interviews on behalf of US engineering teams. You get the scored report — we handle the format, rubric, and session review.

Book a discovery call, or email amanda@altorlab.xyz.