Tech · 2h ago

IBM and MIT CSAIL researcher Leshem Choshen says word-wise translation mappings on joint bilingual models enable cross-lingual sharing without additional training

The methodology will anchor Choshen's new research laboratory.

Original post

Leshem (Legend) Choshen 🤖🤗 @LChoshen · #984 in Tech

@yoavartzi I am soon moving to the new lab, so there's a lot of thinking (and a big branch should be pretraining), but it is also already active. I think the most concrete pretraining challenge is

With context

Leshem (Legend) Choshen 🤖🤗 @LChoshen

"It is time to separate language from language models" The revelation keeps bugging me, and while making the talk "multilingual?" I just gave. Thought I'd briefly share the contents of the talk:

1:15 PM · Jun 11, 2026 · 24 Views

Posts from X

Leshem (Legend) Choshen 🤖🤗 @LChoshen

@yoavartzi There's also the thought of what tricks are people doing post hoc, what if we pretrain? e.g. speculative decoding is ~ deepseek's next token prediction trick. Thinking and instructions there are paper that show it. What else do we do only at the end?

Leshem (Legend) Choshen 🤖🤗 @LChoshen

2h ago

Yoav Artzi @yoavartzi

@LChoshen This is cool. I wasn't aware of this line of work. Results look strong. Thanks!

Leshem (Legend) Choshen 🤖🤗 @LChoshen

1h ago