Eric Jang tried transformers for his Go bot and found they couldn't beat ResNets, because ResNets favor repeating local patterns while transformers emphasize global context.
Keerthana Gopalakrishnan notes vision transformers already process local patches.
@dwarkesh_sp @ericjang11 This doesn’t make sense? Vision tokens are local patches?
.@ericjang11 tried using transformers for his Go bot, but they couldn't beat ResNets. The reason gets at something general about architectures. ResNets are biased towards the local. Nearby things matter more, and a useful pattern in one place is a useful pattern anywhere. Transformers are biased the other way, towards global context, with every position able to attend to every other. Most Go fighting is local, and a useful local pattern learned in one position can be applied anywhere on the board. A ResNet's inductive bias means it gets these insights about Go for free. But a transformer has to pay for them.
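A rough way to see the "for free" part is the sketch below, written in plain NumPy rather than taken from Eric's bot. It applies one shared 3x3 filter to a 19x19 board and checks two things: the same stone shape produces the same response wherever it sits, and the response is zero far away from the stones. The shape, the random filter weights, and the helper names `conv3x3` and `place` are all made up for illustration.

```python
import numpy as np

def conv3x3(board, kernel):
    """Same-size 3x3 cross-correlation with zero padding (one shared filter)."""
    h, w = board.shape
    padded = np.pad(board, 1)
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Each output cell only sees its 3x3 neighbourhood.
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def place(pattern, row, col, size=19):
    """Put a 3x3 pattern on an otherwise empty board (hypothetical helper)."""
    board = np.zeros((size, size))
    board[row:row + 3, col:col + 3] = pattern
    return board

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # stand-in for learned weights

# An arbitrary local stone shape, used only for illustration.
pattern = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]], dtype=float)

resp_a = conv3x3(place(pattern, 2, 3), kernel)
resp_b = conv3x3(place(pattern, 11, 9), kernel)

# Weight sharing: the 5x5 response around the shape is identical in both spots.
same_response = np.allclose(resp_a[1:6, 2:7], resp_b[10:15, 8:13])

# Locality: a point far from the stones gets no response at all.
far_response = resp_a[15, 15]

print(same_response, far_response)
```

A single attention layer has neither property built in: every position can weigh every other position, and any preference for nearby points or position-independent patterns has to come out of training.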