
FNS Tokenizer Delivers Efficient Chunking With Character-Level Resolution

Original post · Lewis Tunstall #958
Leandro von Werra @lvwerra #1553 in AI

Deep dive into FNS: building a tokenizer that chunks text efficiently but has character-level resolution!

FNS augments the loss with a character-level signal at training time, while at inference time you can decode single characters.

Deep dive here: https://huggingface.co/spaces/HuggingFaceBio/carbon-tokenization

3:16 AM · Jun 9, 2026 · 2.4K Views
Sentiment

Users are praising the FNS Tokenizer as a super clever approach for efficient DNA chunking with character-level resolution.

Positive: 100.0% · Negative: 0.0% · 4 comments with sentiment.
Cluster Engagement
Posts from X

@ChainZenit thanks! @QiuyiLi2's work!

5h · Views 22 · Likes 1

Qiuyi Li @QiuyiLi2

@lvwerra nono, can't do without you 😊

4h · Views 4 · Likes 1

Aeron @aeronxbt

@lvwerra character level resolution is nice but id wanna see memory impact vs subword tokenizers

3h · Views 3

Strata @ChainZenit

@lvwerra that’s actually a super clever way to handle tokenization.

5h · Views 14 · Likes 1

Rugbist @rugbist_

@lvwerra wait so it has character level precision but still chunks efficiently?

is this the tokenizer that finally kills the subword tradeoffs?

5h · Views 21

Qiuyi Li @QiuyiLi2

@lvwerra @danaaubakir 🔥🔥🔥

5h · Views 10 · Likes 1

@rugbist_ it depends a lot on your application. this works well on DNA where you only have 4 characters, so you can easily create all combinations of tokens up to a certain length. natural language is a bit trickier.

5h · Views 20
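
To make the "all combinations up to a certain length" point concrete, here is a minimal sketch of how small such a vocabulary stays for DNA. It is an illustration only, not the FNS implementation: the `ALPHABET` constant, the `build_vocab` helper, and the choice of a maximum length of 6 are assumptions made for this example.

```python
from itertools import product

# Assumption: plain four-letter DNA alphabet, no "N" or other ambiguity codes.
ALPHABET = "ACGT"


def build_vocab(max_len: int) -> dict[str, int]:
    """Map every string of length 1..max_len over ALPHABET to an integer id.

    Hypothetical helper for illustration; ids are assigned in enumeration order.
    """
    vocab: dict[str, int] = {}
    for length in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=length):
            vocab["".join(chars)] = len(vocab)
    return vocab


vocab = build_vocab(6)
print(len(vocab))  # 5460 = 4 + 16 + 64 + 256 + 1024 + 4096
```

With only 4 symbols the full enumeration stays in the low thousands. The same enumeration over a natural-language alphabet grows far faster, which is one way to read the "a bit trickier" remark.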

@aeronxbt The tokenizer does split into chunks of 6 characters, so you get all the advantages of the tokenizer.

3h · Views 5
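
The chunking described in that reply can be sketched as below. This is a hedged illustration under stated assumptions, not the FNS code: the `chunk` and `to_characters` helpers are hypothetical names, and the sketch covers only the fixed-size splitting and the lossless round trip back to single characters, not the character-level training loss.

```python
def chunk(seq: str, size: int = 6) -> list[str]:
    """Split a sequence into consecutive chunks of `size` characters.

    The last chunk may be shorter when len(seq) is not a multiple of `size`.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def to_characters(chunks: list[str]) -> list[str]:
    """Expand chunks back into individual characters (character-level view)."""
    return [ch for piece in chunks for ch in piece]


seq = "ACGTACGTACGTAC"
chunks = chunk(seq)
print(chunks)                       # ['ACGTAC', 'GTACGT', 'AC']
print(len(seq), "->", len(chunks))  # 14 -> 3
assert "".join(to_characters(chunks)) == seq  # round trip is lossless
```

On the memory question raised earlier in the thread, the relevant effect in this sketch is that the model sees roughly one sixth as many positions as a character-level tokenizer would produce. Actual memory use depends on the model and is not measured here.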