We need an American lab that can make open weights models that are this good
GLM-5.2 just scored better than Opus 4.7 and GPT 5.4 on Runescape bench.
These models were best in class only 2-3 months ago.
Open source frontier is catching up!?
The proprietary models that GLM-5.2 surpassed led the rankings just 2-3 months ago.
Many users congratulated the open-source GLM-5.2 release for topping Opus and GPT on the Runescape benchmark, while others dismissed that benchmark as irrelevant to real-world use.

congrats @Zai_org on the release!

@thaonlyjonathan $150 at API price; Fable was cheaper than GPT-5.5 xhigh.
And for reference, GLM-5.2 was $32

@ChickenSamosaa @mattparlmer the reason they do it is simply cuz they're not the best. and it helps gain adoption without compromising their standing

@maxbittker The rule is clear: if Gemini appears on a benchmark with a high score, the benchmark is useless.

@maxbittker ah runescape, the only benchmark that matters.

@maxbittker The open source renaissance is beginning
At a fraction of the cost
Love this benchmark

@mattparlmer Impossible I'm afraid. American tech culture forbids it.
Americans have the mentality that they should be billionaires if they can make something good, and so they'll never make good models free open weights.
The US will never have an open weights ecosystem the way China does.

@mattparlmer 5.2 is surprisingly good at reverse engineering btw

@mattparlmer @DanielleFong Arcee?

@angeris Reverse engineering what?

@theAlexQuach Natural log of peak xp per minute

@smplrandom @DanielleFong They have yet to drop a model anywhere near this high scoring on benchmarks

@maxbittker scores should be out of 120 or 200M imo

@maxbittker RuneScape bench?!?! Why is this the first I’m hearing of this amazing bench lol

@mattparlmer Apple should do it or fund it (bc brand safety). They pay Google $1 billion a year, a pittance compared to running a real AI lab, and it would be a bet that LLMs get commodified. Timing would be right after the lab IPOs, if a dozen or so really good engineers decide they have enough money and take a low salary.

@SCH_Clay @mattparlmer software (for now!)
i don’t like stuff connecting to the internet at all times

@maxbittker RuneScape bench? Tell me more.

@maxbittker Shameless plug:

@maxbittker woah!