is plan mode worth it?
we tested 3 modes (one-shot, plan+resume, plan+clear) across 22 tasks, most of them on real codebases like vLLM, bun, T3 Code, llama.cpp, unsloth, diffusers, transformers.js, and AI SDK. plan mode costs more and takes longer, and it doesn't improve accuracy on average: it helps on some tasks and hurts on others.
all results
22 tasks, 5 runs per mode in Claude Code using sonnet 4.6 + opus 4.6
| Project | Task | One-shot score | One-shot cost | One-shot time | Plan+resume score | Plan+resume cost | Plan+resume time | Plan+clear score | Plan+clear cost | Plan+clear time |
|---|---|---|---|---|---|---|---|---|---|---|
| uv | add 'uv cache stats' subcommand | 95 | $1.47 | 8.0m | 97 | $2.99 | 12.1m | 97 | $2.87 | 13.1m |
| bun | add Bun.INI namespace API | 95 | $3.01 | 13.9m | 86 | $4.78 | 15.6m | 95 | $5.46 | 20.0m |
| diffusers | add cosine annealing noise scheduler | 62 | $1.50 | 7.8m | 68 | $2.89 | 11.4m | 68 | $2.99 | 14.2m |
| t3code | add custom theme system | 81 | $2.22 | 13.2m | 77 | $4.12 | 16.5m | 76 | $3.44 | 18.7m |
| openclaw | add fictional PulseBoard channel | 75 | $2.79 | 14.9m | 74 | $7.91 | 27.6m | 75 | $7.41 | 31.1m |
| transformers.js | add image-text-to-text pipeline | 99 | $0.67 | 4.5m | 92 | $3.06 | 11.7m | 93 | $3.05 | 13.6m |
| sglang | add JSON path constraint | 95 | $2.26 | 8.8m | 94 | $4.60 | 15.9m | 91 | $4.55 | 19.2m |
| generic | add JWT authentication system | 80 | $0.42 | 2.4m | 79 | $0.87 | 4.8m | 85 | $1.00 | 6.5m |
| llama.cpp | add Mistral Instruct v3 chat template | 56 | $0.84 | 5.8m | 66 | $2.44 | 11.6m | 64 | $3.39 | 18.9m |
| ollama | add model rename API endpoint and CLI command | 100 | $0.68 | 4.0m | 99 | $1.84 | 7.6m | 100 | $1.64 | 8.6m |
| ai | add new AI image provider | 97 | $1.53 | 7.2m | 99 | $2.89 | 10.8m | 96 | $2.85 | 12.1m |
| unsloth | add ONNX LoRA adapter export | 94 | $0.74 | 4.8m | 89 | $2.09 | 8.1m | 94 | $2.17 | 10.2m |
| generic | add pagination to users API | 100 | $0.10 | 57s | 100 | $0.29 | 2.2m | 100 | $0.28 | 2.1m |
| fastapi | add rate limiting middleware | 41 | $0.64 | 4.6m | 59 | $1.95 | 9.9m | 50 | $1.78 | 10.6m |
| prisma | add REST API client generator | 83 | $1.74 | 10.4m | 93 | $3.87 | 16.0m | 95 | $4.04 | 20.4m |
| generic | add search functionality to CLI | 90 | $0.22 | 1.6m | 90 | $0.53 | 2.8m | 90 | $0.49 | 2.8m |
| ui | add stepper component | 95 | $1.54 | 10.3m | 90 | $3.04 | 12.8m | 78 | $3.25 | 18.5m |
| vllm | add Top-A sampling strategy | 91 | $2.90 | 8.4m | 83 | $5.45 | 12.9m | 86 | $5.32 | 14.8m |
| generic | extract service layer from route handlers | 83 | $0.35 | 2.4m | 77 | $1.02 | 5.5m | 77 | $1.10 | 6.4m |
| generic | fix off-by-one error in loop | 100 | $0.11 | 38s | 100 | $0.21 | 1.3m | 100 | $0.23 | 2.1m |
| generic | implement full-featured event emitter | 100 | $0.22 | 1.3m | 100 | $0.42 | 2.4m | 100 | $0.41 | 2.7m |
| generic | optimize slow data processing pipeline | 100 | $0.27 | 1.9m | 100 | $0.79 | 4.4m | 100 | $0.67 | 3.9m |
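as a sanity check on the headline claim, the per-mode aggregates can be recomputed straight from the table. this is a minimal Python sketch, not the scoring pipeline we used: it assumes each cell is already the per-task average over the 5 runs and weights every task equally, which is our simplification since the table doesn't say how runs were combined. `ROWS`, `parse_minutes`, `summarize`, and `head_to_head` are illustrative names, and the row labels for the generic tasks are shortened by hand.

```python
from statistics import mean

MODES = ("one-shot", "plan+resume", "plan+clear")

# One row per task, in table order: (score, cost in USD, time) for each mode.
# Assumption: each cell is already averaged over the 5 runs.
ROWS = [
    ("uv", (95, 1.47, "8.0m"), (97, 2.99, "12.1m"), (97, 2.87, "13.1m")),
    ("bun", (95, 3.01, "13.9m"), (86, 4.78, "15.6m"), (95, 5.46, "20.0m")),
    ("diffusers", (62, 1.50, "7.8m"), (68, 2.89, "11.4m"), (68, 2.99, "14.2m")),
    ("t3code", (81, 2.22, "13.2m"), (77, 4.12, "16.5m"), (76, 3.44, "18.7m")),
    ("openclaw", (75, 2.79, "14.9m"), (74, 7.91, "27.6m"), (75, 7.41, "31.1m")),
    ("transformers.js", (99, 0.67, "4.5m"), (92, 3.06, "11.7m"), (93, 3.05, "13.6m")),
    ("sglang", (95, 2.26, "8.8m"), (94, 4.60, "15.9m"), (91, 4.55, "19.2m")),
    ("generic: jwt", (80, 0.42, "2.4m"), (79, 0.87, "4.8m"), (85, 1.00, "6.5m")),
    ("llama.cpp", (56, 0.84, "5.8m"), (66, 2.44, "11.6m"), (64, 3.39, "18.9m")),
    ("ollama", (100, 0.68, "4.0m"), (99, 1.84, "7.6m"), (100, 1.64, "8.6m")),
    ("ai", (97, 1.53, "7.2m"), (99, 2.89, "10.8m"), (96, 2.85, "12.1m")),
    ("unsloth", (94, 0.74, "4.8m"), (89, 2.09, "8.1m"), (94, 2.17, "10.2m")),
    ("generic: pagination", (100, 0.10, "57s"), (100, 0.29, "2.2m"), (100, 0.28, "2.1m")),
    ("fastapi", (41, 0.64, "4.6m"), (59, 1.95, "9.9m"), (50, 1.78, "10.6m")),
    ("prisma", (83, 1.74, "10.4m"), (93, 3.87, "16.0m"), (95, 4.04, "20.4m")),
    ("generic: search", (90, 0.22, "1.6m"), (90, 0.53, "2.8m"), (90, 0.49, "2.8m")),
    ("ui", (95, 1.54, "10.3m"), (90, 3.04, "12.8m"), (78, 3.25, "18.5m")),
    ("vllm", (91, 2.90, "8.4m"), (83, 5.45, "12.9m"), (86, 5.32, "14.8m")),
    ("generic: service layer", (83, 0.35, "2.4m"), (77, 1.02, "5.5m"), (77, 1.10, "6.4m")),
    ("generic: off-by-one", (100, 0.11, "38s"), (100, 0.21, "1.3m"), (100, 0.23, "2.1m")),
    ("generic: event emitter", (100, 0.22, "1.3m"), (100, 0.42, "2.4m"), (100, 0.41, "2.7m")),
    ("generic: slow pipeline", (100, 0.27, "1.9m"), (100, 0.79, "4.4m"), (100, 0.67, "3.9m")),
]


def parse_minutes(t: str) -> float:
    """Convert the table's time strings ('8.0m' or '57s') to minutes."""
    if t.endswith("m"):
        return float(t[:-1])
    if t.endswith("s"):
        return float(t[:-1]) / 60
    raise ValueError(f"unrecognized time: {t!r}")


def summarize(rows):
    """Per mode: (mean score, total cost in USD, total time in minutes)."""
    out = {}
    for i, mode in enumerate(MODES):
        cells = [row[1 + i] for row in rows]
        out[mode] = (
            mean(score for score, _, _ in cells),
            sum(cost for _, cost, _ in cells),
            sum(parse_minutes(t) for _, _, t in cells),
        )
    return out


def head_to_head(rows, mode):
    """Count tasks where `mode` scores higher / lower / equal compared with one-shot."""
    i = MODES.index(mode)
    better = worse = tie = 0
    for row in rows:
        base, other = row[1][0], row[1 + i][0]
        if other > base:
            better += 1
        elif other < base:
            worse += 1
        else:
            tie += 1
    return better, worse, tie


if __name__ == "__main__":
    for mode, (score, cost, minutes) in summarize(ROWS).items():
        print(f"{mode:12} mean score {score:5.1f}  total ${cost:6.2f}  total {minutes:6.1f} min")
    for mode in MODES[1:]:
        print(mode, "vs one-shot (better, worse, tie):", head_to_head(ROWS, mode))
```

on these numbers the three modes land within about 0.1 points of each other on mean score (roughly 86.9, 86.9, and 86.8), while plan+resume and plan+clear each cost about 2.2x as much in total as one-shot and take about 1.6x and 2.0x as long. head to head, plan+resume beats one-shot on 6 tasks and loses on 11; plan+clear beats it on 6 and loses on 7.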
how each mode works
each mode gets the same task. the difference is how we prompt Claude Code and whether it plans before executing.
one-shot: just the task prompt. Claude reads, edits, and tests in a single session with no planning guidance.
plan+resume: two phases in the same session. Claude first plans with read-only tools (it can read files but not edit them), then resumes with full permissions to execute. context from the planning phase is preserved.
plan+clear: the plan is saved to PLAN.md, then a brand-new session reads it and executes. context is completely cleared between planning and execution.
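to make the three setups concrete, here is a toy Python model of their control flow. it is not the harness we ran and it never calls an agent: `Session`, `turn`, the tool sets, and the prompt wording are all hypothetical. the only point is to show where context is kept (plan+resume reuses one session) and where it is dropped (plan+clear hands off through PLAN.md to a fresh session).

```python
from dataclasses import dataclass, field

# Hypothetical tool sets for illustration; Claude Code's actual tool names differ.
READ_ONLY = frozenset({"read"})
FULL = frozenset({"read", "edit", "run"})


@dataclass
class Session:
    """Toy stand-in for one agent session: it only records what it was given."""
    history: list[str] = field(default_factory=list)
    tools_used: list[frozenset] = field(default_factory=list)

    def turn(self, prompt: str, tools: frozenset) -> None:
        # A real harness would invoke the agent here with only `tools` enabled.
        self.history.append(prompt)
        self.tools_used.append(tools)


def one_shot(task: str) -> Session:
    s = Session()
    s.turn(task, FULL)  # read, edit, and test in one go; no planning guidance
    return s


def plan_resume(task: str) -> Session:
    s = Session()
    s.turn(f"Plan only: {task}", READ_ONLY)  # phase 1: can read files, not edit them
    s.turn("Execute your plan.", FULL)       # phase 2: same session, planning context kept
    return s


def plan_clear(task: str, workspace: dict[str, str]) -> Session:
    planner = Session()
    planner.turn(f"Plan only: {task}", READ_ONLY)
    # In this toy the harness writes the plan file; the real handoff mechanism may differ.
    workspace["PLAN.md"] = f"plan for: {task}"
    executor = Session()  # brand-new session: none of the planner's history carries over
    executor.turn("Execute the plan in PLAN.md:\n" + workspace["PLAN.md"], FULL)
    return executor


if __name__ == "__main__":
    ws: dict[str, str] = {}
    for name, s in [("one-shot", one_shot("add pagination")),
                    ("plan+resume", plan_resume("add pagination")),
                    ("plan+clear", plan_clear("add pagination", ws))]:
        print(f"{name:12} executing session saw {len(s.history)} prompt(s)")
```

the trade-off this makes visible: the plan+resume executor sees the whole planning transcript, while the plan+clear executor sees only what made it into PLAN.md.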