
FNS Tokenizer Delivers Efficient Chunking With Character-Level Resolution

Original postLewis Tunstall#1040
Leandro von Werra@lvwerra

Deep dive into FNS: building a tokenizer that chunks text efficiently but keeps character-level resolution!

FNS augments the loss with a character-level signal at training time, while at inference time you can decode single characters.

Deep dive here: https://huggingface.co/spaces/HuggingFaceBio/carbon-tokenization
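
To make the "character-level signal in the loss" idea concrete, here is a minimal sketch. It is not the FNS implementation: the 2-character vocabulary, the marginalization scheme, and the `char_weight` parameter are all assumptions chosen for illustration. The sketch adds, on top of the usual token-level cross-entropy, a term that scores each character position separately, so a prediction that gets some characters right is penalized less than one that gets none right.

```python
import math
from itertools import product

ALPHABET = "ACGT"
K = 2  # chunk length kept small for illustration (assumption)

# Every possible chunk of K characters is one token.
VOCAB = ["".join(p) for p in product(ALPHABET, repeat=K)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_loss(probs, target):
    """Standard cross-entropy on the whole chunk."""
    return -math.log(probs[VOCAB.index(target)])


def char_loss(probs, target):
    """Average cross-entropy per character position.

    The probability of a character at a position is the summed
    probability of all tokens that carry that character there.
    """
    total = 0.0
    for pos, ch in enumerate(target):
        marginal = sum(p for tok, p in zip(VOCAB, probs) if tok[pos] == ch)
        total += -math.log(marginal)
    return total / len(target)


def combined_loss(logits, target, char_weight=0.5):
    """Token-level loss plus a weighted character-level term (hypothetical weighting)."""
    probs = softmax(logits)
    return token_loss(probs, target) + char_weight * char_loss(probs, target)


# A model that confidently predicts "AG" when the target is "AC" is closer
# than one that predicts "GT"; only the character term can tell them apart.
near_miss = [5.0 if tok == "AG" else 0.0 for tok in VOCAB]
full_miss = [5.0 if tok == "GT" else 0.0 for tok in VOCAB]
print(combined_loss(near_miss, "AC"), combined_loss(full_miss, "AC"))
```

In this toy setup the token-level loss is identical for both wrong predictions, and the character term is what rewards the partially correct one.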

3:16 AM · Jun 9, 2026 · 3.9K Views
Sentiment

Users are praising the FNS Tokenizer as a super clever approach for efficient DNA chunking with character-level resolution.

Positive: 100.0%
Negative: 0.0%
4 comments with sentiment.

@ChainZenit thanks! @QiuyiLi2's work!

1d · Views 22 · Likes 1
Qiuyi Li@QiuyiLi2

@lvwerra nono, can't do without you 😊

1d · Views 4 · Likes 1
Aeron@aeronxbt

@lvwerra character-level resolution is nice, but I'd wanna see the memory impact vs subword tokenizers

1d · Views 3
Strata@ChainZenit

@lvwerra that’s actually a super clever way to handle tokenization.

1d · Views 14 · Likes 1
Rugbist@rugbist_

@lvwerra wait, so it has character-level precision but still chunks efficiently?

is this the tokenizer that finally kills the subword tradeoffs?

1d · Views 21
Qiuyi Li@QiuyiLi2

@lvwerra @danaaubakir 🔥🔥🔥

1d · Views 10 · Likes 1

@rugbist_ it depends a lot on your application. This works well on DNA, where you only have 4 characters, so you can easily enumerate all combinations of tokens up to a certain length. Natural language is a bit trickier.
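
The point about DNA can be checked with a few lines of arithmetic. The sketch below enumerates every string over a 4-letter alphabet up to a given length; the function name and the id assignment are hypothetical and not taken from the FNS code.

```python
from itertools import product

DNA_ALPHABET = "ACGT"


def build_kmer_vocab(max_len):
    """Map every DNA string of length 1..max_len to an integer id."""
    vocab = {}
    for length in range(1, max_len + 1):
        for chars in product(DNA_ALPHABET, repeat=length):
            vocab["".join(chars)] = len(vocab)
    return vocab


vocab = build_kmer_vocab(6)
print(len(vocab))  # 4 + 16 + 64 + 256 + 1024 + 4096 = 5460
```

With 4 characters, the full vocabulary up to length 6 stays at 5,460 entries. The same enumeration over 26 lowercase letters would need 26^6 = 308,915,776 entries for the length-6 strings alone, which is why natural language is harder to cover this way.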

1d · Views 20

@aeronxbt The tokenizer does split into chunks of 6 characters, so you get all the advantages of a chunking tokenizer.
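
As a rough illustration of what 6-character chunking buys, the sketch below splits a DNA string into chunks of 6, maps each chunk to an integer, and decodes the integers back into single characters. The base-4 id scheme and all function names are assumptions made for this example, not the FNS API.

```python
DNA_ALPHABET = "ACGT"
CHUNK_SIZE = 6  # chunk length mentioned in the reply above


def chunk_sequence(seq, size=CHUNK_SIZE):
    """Split a sequence into consecutive chunks; the last one may be shorter."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def chunk_to_id(chunk):
    """Encode a chunk as a base-4 integer (hypothetical id scheme)."""
    token_id = 0
    for ch in chunk:
        token_id = token_id * 4 + DNA_ALPHABET.index(ch)
    return token_id


def id_to_chars(token_id, length):
    """Decode a base-4 id back into its individual characters."""
    chars = []
    for _ in range(length):
        token_id, digit = divmod(token_id, 4)
        chars.append(DNA_ALPHABET[digit])
    return "".join(reversed(chars))


seq = "ACGTACGTACGT"
chunks = chunk_sequence(seq)
ids = [chunk_to_id(c) for c in chunks]
decoded = "".join(id_to_chars(i, len(c)) for i, c in zip(ids, chunks))
print(chunks, ids, decoded == seq)
```

Here 12 characters become 2 tokens, so the model sees a sequence 6 times shorter than a character-level one, while each token can still be expanded back into its characters.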

1d · Views 5