Since my 200x Codex credits end tomorrow, I've redirected the Mac Mini cluster to optimize HVM5. Depending on when they end, I should have 6h to 30h of dozens of GPT 5.5 agents working on it. I've written the most careful anti-reward-hack prompt ever. Let's see how it goes!
This has been a massive success!
So I left 32x GPT 5.5 agents (enough to fill the 5h limit) on 32 separate machines. Each one received HVM5's unoptimized file and a prompt demanding a 10x speedup. After 4 rounds, all agents reached ~2x speedups, with file sizes growing 1.3x to 2x.
Many cool ideas surfaced. Some are obvious and most agents rediscovered them, like computed gotos. But others are genuinely surprising. For example, it turns out a bounded LIFO freelist is MUCH faster than the usual algorithm. Many agents didn't bother trying it. Neither did I!
Now I'm in the process of mining insights from the 32 runs so I can merge them into a "super HVM5". Not sure what the best way to do this is, and my 200x credits are about to expire, but I still have a few hours, so I have one shot at getting it right. Sounds fun!
(Also thanks for the free credits, I appreciate it a lot...)

@CozendeyMath https://gist.github.com/VictorTaelin/5e64c85e4b1b238a8a2cdedff4b40afe
note this is a completely unoptimized reference implementation
I don't want to bias them in any direction, to maximize the chance they come up with an idea worth keeping

@VictorTaelin @TheRealAdamG You should check out Google's co-scientist paper. It's aimed at solving essentially this problem: given a population of hypotheses, how do you define and rank them, etc. It runs a Swiss tournament and the ideas get Elo ratings. It's pretty cool

@VictorTaelin build atomic task completion. have one model review another model's work. have another model review the first model's reasoning. if both the reasoning and code reviews say "it doesn't look like they reward hacked" then the work proceeds, otherwise it gets thrown away

@VictorTaelin share the prompt? I’m curious

@gbrlvv I'm always working

@VictorTaelin Oh, don’t underestimate its ability to cheat.

@VictorTaelin Mr. Taelin what are you doing working, I mean tweeting on a holiday night. Have you not-

@VictorTaelin hahahaha

@extliqprovider I believe it would do much better but I'm not rich

@sterlingcrispin @TheRealAdamG tyy

@VictorTaelin @CozendeyMath Hmm you really put in the effort

@VictorTaelin Cursor literally did this to optimize GPU kernels

@RayLin_AI @TheRealAdamG oh no :(

@VictorTaelin can opus 4.8 do the same?

@VictorTaelin @TheRealAdamG 200x already ended

@bettercallsalva @VictorTaelin Hybrid makes sense: cheap local hardware for the 80% of tasks that don't need frontier models, one sub for the work that does. Credits ending just accelerates a decision most people should've made anyway.

@VictorTaelin everything into 5.5 Pro, 2-3 years and forget.

@VictorTaelin the move back to your own mac cluster once the credits end keeps repeating. when the subsidized tier dries up, the people who understand their workload route it to hardware they control. renting made sense at zero cost, less so when the bill is real.