LisanBench Maintainer Updates GitHub Repo For Research Paper

Lisan al Gaib @scaling01

the reason for the big update is that the old code was outdated and that I needed a base for my LisanBench paper

Lisan al Gaib @scaling01

btw I updated the LisanBench repo yesterday, so it's actually using 50 starting words instead of just 10

I haven't tested all models with this new code, so some might still need an adjustment in the model catalog. but the important bit is that the scoring and the starting words work and are the same as on the website.

oh and sometimes you will have to set the max completion tokens manually, because some providers don't actually go up to 100k

https://github.com/voice-from-the-outer-world/lisan-bench

7:36 AM · Jun 1, 2026 · 1.3K Views
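
The note about setting max completion tokens manually could look something like the sketch below. It is only an illustration, not the repo's actual configuration: the provider names, the cap values, and the function `resolve_max_completion_tokens` are all hypothetical, and real limits have to come from each provider's documentation.

```python
# Hypothetical sketch: clamp the requested completion budget to what a
# provider actually supports. Names and numbers are illustrative only.

REQUESTED_MAX_COMPLETION_TOKENS = 100_000  # the 100k budget mentioned in the post

# Assumed per-provider caps (made-up values, not taken from any real provider).
PROVIDER_TOKEN_CAPS = {
    "provider_a": 100_000,
    "provider_b": 32_768,
}


def resolve_max_completion_tokens(provider, requested=REQUESTED_MAX_COMPLETION_TOKENS, override=None):
    """Return the completion-token limit to send for a given provider.

    A manual override always wins; otherwise the requested budget is
    clamped to the provider's known cap, if one is listed.
    """
    if override is not None:
        return override
    cap = PROVIDER_TOKEN_CAPS.get(provider)
    if cap is None:
        # Unknown provider: keep the requested value and let the API reject it if needed.
        return requested
    return min(requested, cap)


if __name__ == "__main__":
    for name in ("provider_a", "provider_b", "some_other_provider"):
        print(name, resolve_max_completion_tokens(name))
```

Keeping the caps in one table means a provider that silently truncates long completions only needs a one-line change, and the `override` argument covers the manual case described in the post.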
Views 2.1K · Bookmarks 5 · Likes 14 · Replies 1
Lisan al Gaib @scaling01

I was also doing some testing to find a smaller set of starting words that approximates the benchmark score for less money, because ideally I want to have datapoints for all models and reasoning efforts

I optimized for a set that uses 5x fewer trials, but of course this blows up the confidence intervals (CIs) too much, so that's not going to cut it

(also not sure if a minimal reasoning effort actually exists for GPT-5.4-mini or if it was just defaulting to low; the docs are confusing)
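
Why fewer trials widen the intervals can be shown with a rough calculation. The sketch below assumes a plain normal-approximation interval over independent trials with equal variance, which may not be how LisanBench computes its CIs; the function names and the example numbers are hypothetical.

```python
import math


def ci_half_width(std_dev, n_trials, z=1.96):
    """Half-width of a normal-approximation 95% CI for a mean score.

    Assumes independent trials with a common standard deviation.
    """
    return z * std_dev / math.sqrt(n_trials)


def widening_factor(reduction):
    """Factor by which the CI widens when the trial count is cut by `reduction`."""
    return math.sqrt(reduction)


if __name__ == "__main__":
    # Illustrative numbers only: 100 trials versus 5x fewer (20 trials).
    full = ci_half_width(std_dev=10.0, n_trials=100)
    reduced = ci_half_width(std_dev=10.0, n_trials=20)
    print(f"full set half-width:    {full:.2f}")
    print(f"reduced set half-width: {reduced:.2f}")
    print(f"widening factor:        {widening_factor(5):.2f}")
```

Under these assumptions, cutting trials by 5x makes the interval about 2.24 times wider (the square root of 5), so closely ranked models can no longer be separated even though each run is cheaper.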

