Analysis finds hybrid transformer-RNN models outperform standard transformers on semantic words but trail on token copying · Digg