Most people think more reasoning tokens means more control. In practice, I have sometimes run into the opposite.
Give a model a large thinking budget and the reasoning block can become unhinged. You watch it branch into side quests, alternative framings, unnecessary planning loops, and speculative checks you never asked for. This can happen even when you give strict instructions. The model is not refusing your command. It is expanding the problem until control starts slipping.
That framing matters. The issue is not that more thinking is always bad. The issue is that more thinking sometimes increases variance. When I want maximum control, I often reach for low thinking instead. Shorter reasoning can force the model to stay closer to the rails. It has less room to invent a broader mission for itself.
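One way to make "reach for low thinking" concrete is to treat the thinking budget as a knob you set from how much control the task demands, rather than defaulting to the maximum. A minimal sketch, with entirely made-up names and budget values (no real SDK is referenced here):

```python
# Hypothetical helper: choose a thinking budget from the control level you need.
# LEVELS and pick_thinking_budget are illustrative names, not part of any API.

LEVELS = {
    "max_control": 0,       # no extended thinking: model stays close to the rails
    "balanced": 2_000,      # modest budget for routine multi-step work
    "exploratory": 16_000,  # large budget: more variance, more drift risk
}

def pick_thinking_budget(control: str) -> int:
    """Return a token budget; unknown levels fall back to maximum control."""
    return LEVELS.get(control, LEVELS["max_control"])
```

The fallback is deliberate: when you are unsure what a task needs, the safer default under this framing is less thinking, not more.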
This becomes a real design trade-off when you are building autonomous agents like Hermes, OpenClaw, or Pi-style systems. In those systems, the model is not just answering. It is acting. If the reasoning starts drifting, the action layer can drift with it. One bad chain of thought can turn into a rogue tool call, a pointless detour, or a messy sequence of steps that still looks justified.
I also suspect this gets worse when you do not maintain context discipline. If you keep reusing the same session for unrelated tasks, the model starts carrying stale intent forward. Old goals, old assumptions, and old tool patterns stay active longer than they should. Then you add a high thinking budget on top of that, and now the model has too much room to think with stale context.
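The context-discipline point can be enforced mechanically: start every unrelated task from a clean history instead of appending to one long session. A minimal sketch, assuming a chat-style message list; the `Session` class and helper are illustrative, not any real SDK:

```python
# Sketch of per-task context discipline. All names here are illustrative.

class Session:
    """A fresh conversation: system prompt plus an empty message history."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

def fresh_session_for(task: str, system_prompt: str) -> Session:
    """Start each unrelated task from a clean history so stale goals,
    assumptions, and tool patterns cannot leak forward."""
    session = Session(system_prompt)
    session.add("user", task)
    return session
```

The discipline is the constructor call itself: a new `Session` per task means the only intent the model can carry forward is the intent you just gave it.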
This is also why the problem can stay invisible longer than it should. If you are only looking at final answers, you may miss the internal drift. But once the model has tool access, that hidden drift becomes operational. The mistake is no longer theoretical. It touches files, commands, and state.
You probably notice this less in Claude Code or Codex. By default, they usually ask permission before tool calls, and they do not expose the reasoning block in the same way. So the first time you notice the model heading toward a bad action, you still have one intervention point. You can simply say no.
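That intervention point can be built into any agent loop directly: route every tool call through a human yes/no before it executes. A minimal sketch in the spirit of those permission prompts; `gated_call` and `approve` are stand-in names, not a real framework API:

```python
# Hypothetical permission gate: a tool call runs only if the approver says yes.

from typing import Any, Callable

def gated_call(tool: Callable, args: dict,
               approve: Callable[[str], bool]) -> Any:
    """Describe the pending call, ask for approval, and only then execute.
    Returning None on refusal is the 'you can simply say no' path."""
    description = f"{tool.__name__}({args})"
    if approve(description):
        return tool(**args)
    return None
```

In an interactive agent, `approve` would be a terminal prompt; in tests or batch runs it can be a policy function, which is why it is passed in rather than hard-coded.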
That is why I do not treat higher thinking as an automatic upgrade anymore. Sometimes the right move is less thought, cleaner context, and tighter control.
Maybe Marc Andreessen was right about retardmaxxing.