is plan mode worth it?

we tested 3 modes (one-shot, plan+resume, plan+clear) across 22 tasks on real codebases like vLLM, bun, T3 Code, llama.cpp, unsloth, diffusers, transformers.js, and AI SDK. plan mode costs more, takes longer, and doesn't improve accuracy.

accuracy: same (one-shot 87% vs plan mode 87%)
cost per task: +122% pricier (one-shot $1.19 vs plan mode $2.65)
time per task: +79% slower (one-shot 6.3 min vs plan mode 11.2 min)

results at a glance

| metric | one-shot | plan + resume | plan + clear |
|---|---|---|---|
| score | 87% | 87% | 87% |
| cost/task | $1.19 | $2.64 (+121%) | $2.65 (+123%) |
| duration | 6.3m | 10.2m (+62%) | 12.3m (+96%) |
| turns | 32 | 47 (+47%) | 58 (+82%) |
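
the percentages above are relative overhead against one-shot. a minimal sketch of that arithmetic, assuming the usual (value - baseline) / baseline definition; `GLANCE`, `pct_delta`, and `overhead_table` are names made up for this example. recomputing from the rounded figures can land a point away from the published deltas (for example +122% instead of +121% for plan + resume cost), presumably because the published ones were computed from unrounded means. that explanation is an assumption, not something the results state.

```python
# Rounded means copied from the "results at a glance" figures.
GLANCE = {
    "cost/task ($)": {"one-shot": 1.19, "plan + resume": 2.64, "plan + clear": 2.65},
    "duration (min)": {"one-shot": 6.3, "plan + resume": 10.2, "plan + clear": 12.3},
    "turns": {"one-shot": 32, "plan + resume": 47, "plan + clear": 58},
}


def pct_delta(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100


def overhead_table(glance: dict) -> dict:
    """For each metric, the rounded overhead of every non-baseline mode vs one-shot."""
    table = {}
    for metric, by_mode in glance.items():
        base = by_mode["one-shot"]
        table[metric] = {
            mode: round(pct_delta(base, value))
            for mode, value in by_mode.items()
            if mode != "one-shot"
        }
    return table


if __name__ == "__main__":
    for metric, deltas in overhead_table(GLANCE).items():
        for mode, delta in deltas.items():
            print(f"{metric:15s} {mode:14s} +{delta}%")
```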

all results

22 tasks, 5 runs per mode in Claude Code using sonnet 4.6 + opus 4.6

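| project | task | one-shot score | one-shot cost | one-shot time | plan+resume score | plan+resume cost | plan+resume time | plan+clear score | plan+clear cost | plan+clear time |
|---|---|---|---|---|---|---|---|---|---|---|
| uv | add 'uv cache stats' subcommand | 95 | $1.47 | 8.0m | 97 | $2.99 | 12.1m | 97 | $2.87 | 13.1m |
| bun | add Bun.INI namespace API | 95 | $3.01 | 13.9m | 86 | $4.78 | 15.6m | 95 | $5.46 | 20.0m |
| diffusers | add cosine annealing noise scheduler | 62 | $1.50 | 7.8m | 68 | $2.89 | 11.4m | 68 | $2.99 | 14.2m |
| t3code | add custom theme system | 81 | $2.22 | 13.2m | 77 | $4.12 | 16.5m | 76 | $3.44 | 18.7m |
| openclaw | add fictional PulseBoard channel | 75 | $2.79 | 14.9m | 74 | $7.91 | 27.6m | 75 | $7.41 | 31.1m |
| transformers.js | add image-text-to-text pipeline | 99 | $0.67 | 4.5m | 92 | $3.06 | 11.7m | 93 | $3.05 | 13.6m |
| sglang | add JSON path constraint | 95 | $2.26 | 8.8m | 94 | $4.60 | 15.9m | 91 | $4.55 | 19.2m |
| generic | add JWT authentication system | 80 | $0.42 | 2.4m | 79 | $0.87 | 4.8m | 85 | $1.00 | 6.5m |
| llama.cpp | add Mistral Instruct v3 chat template | 56 | $0.84 | 5.8m | 66 | $2.44 | 11.6m | 64 | $3.39 | 18.9m |
| ollama | add model rename API endpoint and CLI command | 100 | $0.68 | 4.0m | 99 | $1.84 | 7.6m | 100 | $1.64 | 8.6m |
| ai | add new AI image provider | 97 | $1.53 | 7.2m | 99 | $2.89 | 10.8m | 96 | $2.85 | 12.1m |
| unsloth | add ONNX LoRA adapter export | 94 | $0.74 | 4.8m | 89 | $2.09 | 8.1m | 94 | $2.17 | 10.2m |
| generic | add pagination to users API | 100 | $0.10 | 57s | 100 | $0.29 | 2.2m | 100 | $0.28 | 2.1m |
| fastapi | add rate limiting middleware | 41 | $0.64 | 4.6m | 59 | $1.95 | 9.9m | 50 | $1.78 | 10.6m |
| prisma | add REST API client generator | 83 | $1.74 | 10.4m | 93 | $3.87 | 16.0m | 95 | $4.04 | 20.4m |
| generic | add search functionality to CLI | 90 | $0.22 | 1.6m | 90 | $0.53 | 2.8m | 90 | $0.49 | 2.8m |
| ui | add stepper component | 95 | $1.54 | 10.3m | 90 | $3.04 | 12.8m | 78 | $3.25 | 18.5m |
| vllm | add Top-A sampling strategy | 91 | $2.90 | 8.4m | 83 | $5.45 | 12.9m | 86 | $5.32 | 14.8m |
| generic | extract service layer from route handlers | 83 | $0.35 | 2.4m | 77 | $1.02 | 5.5m | 77 | $1.10 | 6.4m |
| generic | fix off-by-one error in loop | 100 | $0.11 | 38s | 100 | $0.21 | 1.3m | 100 | $0.23 | 2.1m |
| generic | implement full-featured event emitter | 100 | $0.22 | 1.3m | 100 | $0.42 | 2.4m | 100 | $0.41 | 2.7m |
| generic | optimize slow data processing pipeline | 100 | $0.27 | 1.9m | 100 | $0.79 | 4.4m | 100 | $0.67 | 3.9m |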
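
the headline score can be cross-checked against the per-task rows. a rough sketch, assuming the headline is a plain unweighted mean over the 22 tasks (the results don't say how it was aggregated); `SCORES` and `head_to_head` are names invented for this example, and the numbers are the score columns copied in row order.

```python
from statistics import mean

# Score columns copied from the per-task results, in row order
# (uv first, "optimize slow data processing pipeline" last).
SCORES = {
    "one-shot": [95, 95, 62, 81, 75, 99, 95, 80, 56, 100, 97,
                 94, 100, 41, 83, 90, 95, 91, 83, 100, 100, 100],
    "plan + resume": [97, 86, 68, 77, 74, 92, 94, 79, 66, 99, 99,
                      89, 100, 59, 93, 90, 90, 83, 77, 100, 100, 100],
    "plan + clear": [97, 95, 68, 76, 75, 93, 91, 85, 64, 100, 96,
                     94, 100, 50, 95, 90, 78, 86, 77, 100, 100, 100],
}


def head_to_head(baseline: list[int], other: list[int]) -> tuple[int, int, int]:
    """Count tasks where `other` scores higher, lower, or the same as `baseline`."""
    wins = sum(o > b for b, o in zip(baseline, other))
    losses = sum(o < b for b, o in zip(baseline, other))
    ties = len(baseline) - wins - losses
    return wins, losses, ties


if __name__ == "__main__":
    for mode, scores in SCORES.items():
        print(f"{mode:14s} mean score {mean(scores):.1f}")
    for mode in ("plan + resume", "plan + clear"):
        wins, losses, ties = head_to_head(SCORES["one-shot"], SCORES[mode])
        print(f"{mode:14s} vs one-shot: {wins} higher, {losses} lower, {ties} tied")
```

under that assumption all three means round to 87, matching the summary. the head-to-head count is a different cut of the same rows: plan + resume scores higher than one-shot on 6 tasks, lower on 11, and ties on 5.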

how each mode works

each mode gets the same task. the difference is how we prompt Claude Code and whether it plans before executing.

one-shot

just the task prompt. Claude reads, edits, and tests in a single session with no planning guidance.

1. Prompt
2. Execute
3. Done

plan + resume

two phases in the same session. Claude first plans using read-only tools (can only read files), then resumes with full permissions to execute. context is preserved.

1. Prompt
2. Plan (read-only)
3. Resume
4. Execute
5. Done

plan + clear

plan is saved to PLAN.md, then a brand new session reads it and executes. context is completely cleared between planning and execution.

1. Prompt
2. Plan (read-only)
3. Save PLAN.md
4. Clear context
5. New session
6. Execute
7. Done
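
the control flow of the three modes can be sketched as a toy harness. this is not the benchmark's actual runner and it doesn't call Claude Code: `Session` is a hypothetical stand-in that only records which prompts a session has seen, the prompt wording is invented, and a dict stands in for PLAN.md on disk. the point is only what context each executing session carries.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for one agent session (not a real Claude Code API)."""
    read_only: bool = False
    history: list[str] = field(default_factory=list)

    def send(self, prompt: str) -> str:
        # Record the prompt so we can inspect what context this session holds.
        self.history.append(prompt)
        action = "plan" if self.read_only else "edit"
        return f"[{action}] {prompt}"


def one_shot(task: str) -> list[Session]:
    """Single session, task prompt only, no planning guidance."""
    session = Session()
    session.send(task)
    return [session]


def plan_resume(task: str) -> list[Session]:
    """Plan read-only, then resume the same session with full permissions."""
    session = Session(read_only=True)
    session.send(f"plan only, do not edit: {task}")
    session.read_only = False  # same session, so planning context is preserved
    session.send("execute the plan")
    return [session]


def plan_clear(task: str) -> list[Session]:
    """Plan read-only, save the plan, then execute in a brand new session."""
    planner = Session(read_only=True)
    plan = planner.send(f"plan only, do not edit: {task}")
    files = {"PLAN.md": plan}  # in-memory stand-in for writing PLAN.md
    executor = Session()       # fresh session: none of the planner's history
    executor.send(f"read PLAN.md and execute it:\n{files['PLAN.md']}")
    return [planner, executor]


if __name__ == "__main__":
    for run in (one_shot, plan_resume, plan_clear):
        sessions = run("add pagination to users API")
        print(run.__name__, [len(s.history) for s in sessions])
```

in this sketch the only thing the plan + clear executor inherits is the text of the plan, while the plan + resume session still holds everything it read while planning.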