Jan Tempus and coauthors release paper 'Tokenisation via Convex Relaxations', which frames tokenization as an optimization problem in a 100-million-dimensional space, solved via convex relaxations.
Method yields consistent gains over BPE across language models.
——0——
QUOTE POST
#565 Alex Nichol @UNIXPICKLE
I really enjoyed reading this paper. I paused after the graph framing but before the ILP formulation to derive it myself. Took >an hour, even knowing that it *could* be framed as an LP. Fun puzzle! I won't spoil it.
In our new paper, we reinterpret tokenisation as a problem in high-dimensional geometry (100M dims to be precise!), which we can solve efficiently to get a globally near-optimal tokeniser! Our method consistently improves language models over BPE. See 🧵for details.
12:51 PM · May 22, 2026 · 34.5K Views
6:07 AM · May 23, 2026 · 5K Views