The release uses FP4 quantization and DFlash speculative decoding.
@zephyr_z9 @fi56622380 Not the first useful one but certainly nice to see!
This is super big. I think this is the first useful speculative decoding method deployed on a big quasi-frontier model. Massive unlock @fi56622380
super cool