Cambridge researcher Herbie Bradley argues scaling has failed to smooth out uneven LLM capabilities across verifiable and non-verifiable tasks · Digg