the reason for the big update is that the old code was outdated and that I needed a base for my LisanBench paper
btw I updated the LisanBench repo yesterday, so it now uses 50 starting words instead of just 10
I haven't tested all models with this new code, so some might still need adjustments in the model catalog. but the important bit is that the scoring and the starting words work and are the same as on the website.
oh and sometimes you will have to set the max completion tokens manually, because some providers don't actually support up to 100k tokens
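rough sketch of what I mean by setting the max completion tokens manually. this is not the actual repo code: the function name, the dict and the provider names are all made up for illustration, and the cap values are placeholders, so check each provider's docs for the real limits.

```python
# Hypothetical sketch: none of these names come from the LisanBench repo.
DEFAULT_MAX_COMPLETION_TOKENS = 100_000

# Placeholder caps for made-up providers; replace with real documented limits.
PROVIDER_TOKEN_CAPS = {
    "example-provider-a": 32_768,
    "example-provider-b": 65_536,
}


def resolve_max_completion_tokens(provider, requested=DEFAULT_MAX_COMPLETION_TOKENS):
    """Return the requested limit, lowered to the provider's cap if one is known."""
    cap = PROVIDER_TOKEN_CAPS.get(provider)
    if cap is None:
        # No known cap for this provider: keep the requested value as-is.
        return requested
    return min(requested, cap)
```

so a model served by a provider with a lower cap gets clamped to that cap, and everything else keeps the 100k default.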
https://github.com/voice-from-the-outer-world/lisan-bench