Allen AI's William Merrill finds hybrid transformer-RNN architectures outperform pure transformers on token-level content words · Digg