Tech · 2h ago

IBM and MIT CSAIL researcher Leshem Choshen says word-wise translation mappings on joint bilingual models enable cross-lingual sharing without additional training

The methodology will anchor Choshen's new research laboratory.

Original post

@yoavartzi I am soon moving to the new lab, so there's a lot of thinking (and a big branch should be pretraining), but it is also already active. I think the most concrete pretraining challenge is

With context

"It is time to separate language from language models" The revelation keeps bugging me, and while making the talk "multilingual?" I just gave. Thought I'd briefly share the contents of the talk:

1:15 PM · Jun 11, 2026 · 24 Views
Posts from X

@yoavartzi There's also the thought of what tricks people are doing post hoc: what if we pretrain with them? E.g. speculative decoding is ~ DeepSeek's next token prediction trick. For thinking and instructions, there are papers that show it. What else do we do only at the end?

2h · Views 11 · Likes 0 · Bookmarks 0
Yoav Artzi @yoavartzi

@LChoshen This is cool. I wasn't aware of this line of work. Results look strong. Thanks!

1h · Views 10 · Likes 0 · Bookmarks 0