/AI · 6h ago

LisanBench Maintainer Updates GitHub Repo For Research Paper

Original post

Lisan al Gaib@scaling01

the reason for the big update is that the old code was outdated and that I needed a base for my LisanBench paper

Lisan al Gaib@scaling01

btw I updated LisanBench repo yesterday so it's actually using 50 starting words instead of just 10

I haven't tested all models with this new code, so some might still need an adjustment in the model catalog. but the important bit is that the scoring and the starting words work and are the same as on the website.

oh and sometimes you will have to set the max completion tokens manually, because some providers don't actually go up to 100k

https://github.com/voice-from-the-outer-world/lisan-bench

7:36 AM · Jun 1, 2026 · 1.3K Views
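A minimal sketch of the max-completion-tokens workaround mentioned in the post, assuming you keep your own table of per-provider caps. The provider names, the cap values, and the `effective_max_tokens` helper are all hypothetical and are not taken from the LisanBench repo; only the 100k default comes from the post.

```python
# Hypothetical per-provider output-token ceilings. These names and numbers
# are placeholders for illustration, not real provider limits.
PROVIDER_MAX_COMPLETION_TOKENS = {
    "provider_a": 100_000,
    "provider_b": 32_768,
}

# The 100k budget mentioned in the post.
DEFAULT_MAX_COMPLETION_TOKENS = 100_000


def effective_max_tokens(provider, requested=DEFAULT_MAX_COMPLETION_TOKENS):
    """Return the requested budget, clamped to the provider's cap if one is known."""
    cap = PROVIDER_MAX_COMPLETION_TOKENS.get(provider)
    if cap is None:
        # Unknown provider: pass the request through unchanged.
        return requested
    return min(requested, cap)


if __name__ == "__main__":
    for name in ("provider_a", "provider_b", "unlisted_provider"):
        print(name, effective_max_tokens(name))
```

Clamping up front avoids a rejected request when a provider's ceiling is below 100k, at the cost of maintaining the cap table by hand.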

Posts from X

Most Activity

Views 2.1K · Bookmarks 5 · Likes 14 · Replies 1

Lisan al Gaib@scaling01

I was also doing some testing to find a smaller set of words to approximate the benchmark score for less money, because ideally I want to have datapoints for all models and reasoning efforts

I optimized for a set that uses 5x fewer trials, but of course this blows up CIs too much, so that's not going to cut it

(also not sure if minimal actually exists for GPT-5.4-mini or if it was just defaulting to low, the docs are confusing)
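A rough illustration of why 5x fewer trials widens the confidence intervals so much, assuming independent trials with similar variance and a normal-approximation interval. The trial counts and the standard deviation below are made up for illustration, and this is not the author's actual subset-selection method.

```python
import math


def ci_half_width(std_dev, n_trials, z=1.96):
    """Half-width of a normal-approximation 95% CI for a mean over n_trials."""
    return z * std_dev / math.sqrt(n_trials)


# Hypothetical numbers: the same per-trial spread, with 5x fewer trials.
full = ci_half_width(std_dev=1.0, n_trials=50)
reduced = ci_half_width(std_dev=1.0, n_trials=10)

# The half-width scales as 1/sqrt(n), so the ratio is sqrt(5) regardless
# of the standard deviation chosen above.
ratio = reduced / full

if __name__ == "__main__":
    print(f"CI is about {ratio:.2f}x wider with 5x fewer trials")
```

Under these assumptions the interval is roughly 2.2x wider, so models whose scores are close together would no longer be distinguishable.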
