Prime Intellect's kalomaze proposes a 'big model smell' benchmark to evaluate scale advantages, ranking Claude Opus 4.6 first · Digg