The bizarre thing about model perf degradation in long contexts (e.g. I see it around 350K tokens / 400 turns in V4 Pro) is that it doesn't look like a technical issue. Instead, it's like the model gets… tired. It starts doing lazy ad hoc fixes, deleting stuff, forgetting to commit, etc.