nevermind, that Chinese "fingerprint" eval is utter trash lmao
@teortaxesTex I think this just measures RL intensity. Smaller model RL more because it's easier
Many users criticized the Chinese AI "fingerprint" evaluation as misleading to beginners and said its standards are heavily inflated.
LLMs are easy to impress, but as easy to disillusion

@teortaxesTex This kind of eval really does nothing but mislead beginners

@teortaxesTex It would be interesting to run this but with reasoning on max, schema-verified output instead of capped tokens, and the same random-number prompt asking for something like 36 numbers, and see how random the numbers are compared to how "platitudinal" they are (420, 69, 3.14, etc.)
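
A rough sketch of how the comparison in the reply above could be scored, assuming the model's output has already been parsed into a list of numbers. Everything here is hypothetical: the `PLATITUDES` set holds only the three examples the reply names, and `platitude_rate` and `normalized_entropy` are illustrative helpers, not part of the eval being discussed. The entropy score is a crude proxy for spread, not a proper randomness test.

```python
import math
from collections import Counter

# Hypothetical set of "platitudinal" numbers; the reply names only these three.
PLATITUDES = {420, 69, 3.14}


def platitude_rate(numbers):
    """Fraction of sampled numbers that appear in PLATITUDES."""
    if not numbers:
        return 0.0
    return sum(1 for n in numbers if n in PLATITUDES) / len(numbers)


def normalized_entropy(numbers):
    """Shannon entropy of the empirical distribution, scaled to [0, 1].

    1.0 means every sampled value is distinct; 0.0 means one value repeated.
    """
    total = len(numbers)
    if total < 2:
        return 0.0
    counts = Counter(numbers)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(total)
```

For a 36-number sample, a high `platitude_rate` together with a low `normalized_entropy` would suggest the model is falling back on stock answers rather than spreading its picks.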

@teortaxesTex These evaluation standards really are way too inflated