@teortaxesTex @FeltSteam they should not have given up until reaching 100T parameters, at least
human brain has 100T synapses and huge sparsity for a reason
@FeltSteam That was the guess but the question is, again, why didn't they bother? it was clearly broken. Token repetitions, pretty meh uplift for the cost. They gave up on it
