@_albertgu 😭
Transformers are better at copying, while RNNs are better at modeling "meaning-bearing words—the nouns, verbs, & adjectives that say what a sentence is about"
The study compared Olmo 3 (a transformer) against Olmo Hybrid (a transformer–RNN hybrid).

@giffmana in retrospect i realized this post sounds hilariously biased which was not intentional, i was mostly quoting the original 😂
Hybrid (transformer–RNN) models are fast becoming a serious alternative to the transformer, but a big question remains: how do they process tokens differently & how does this impact performance?
We compared our transformer (Olmo 3) & hybrid (Olmo Hybrid) models to find out. 🧵

@_albertgu Interesting