Tech · 8h ago

Developer @nathanrs adapts gzip compression to perform language modeling and generate text

The demonstration produced outputs approximating the style of Shakespeare.
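The original post is not reproduced here. The sketch below shows one common way to turn gzip into a character-level language model, and it is not necessarily the method the blog post uses. It assumes a scoring rule in which each candidate next character gets weight 2^-(extra compressed bits) once it is appended to a training corpus plus the current context. The function names, the `temperature` knob, and the toy corpus are all hypothetical.

```python
from __future__ import annotations

import gzip
import random


def compressed_bits(text: str) -> int:
    """Size of the gzip encoding of `text`, in bits."""
    return 8 * len(gzip.compress(text.encode("utf-8")))


def next_char_distribution(corpus: str, context: str, vocab: str,
                           temperature: float = 1.0) -> dict[str, float]:
    """Hypothetical scoring rule: weight each candidate by 2^-(extra bits / temperature)."""
    prefix = corpus + context
    costs = {c: compressed_bits(prefix + c) for c in vocab}
    best = min(costs.values())  # subtracting a constant leaves the normalized result unchanged
    weights = {c: 2.0 ** (-(bits - best) / temperature) for c, bits in costs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def generate(corpus: str, prompt: str, n_chars: int, vocab: str,
             temperature: float = 1.0, seed: int = 0) -> str:
    """Sample `n_chars` characters one at a time, re-compressing for every candidate."""
    rng = random.Random(seed)
    text = prompt
    for _ in range(n_chars):
        dist = next_char_distribution(corpus, text, vocab, temperature)
        chars = list(dist)
        text += rng.choices(chars, weights=[dist[c] for c in chars])[0]
    return text


if __name__ == "__main__":
    corpus = (
        "shall i compare thee to a summer's day? thou art more lovely and more temperate: "
        "rough winds do shake the darling buds of may, and summer's lease hath all too short a date. "
    )
    vocab = "abcdefghijklmnopqrstuvwxyz ',.:?"
    print(generate(corpus, "thou art ", 40, vocab, temperature=8.0))
```

Compressed sizes change in whole bytes (8 bits), so many candidates tie and the rest are heavily down-weighted. Dividing by an assumed `temperature` such as 8.0 turns a one-byte penalty into a factor of 2 rather than 256. Also, DEFLATE only matches back up to 32 KB, so most of a much longer corpus would fall outside gzip's reach.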


Original post unavailable.

Sentiment

Users praise the writeup on using gzip to generate Shakespeare-like text, calling the article interesting, cool, and an actual gem worth reading.

Positive: 100.0% · Negative: 0.0% (8 comments with sentiment)


Posts from X


Views 7.2K · Bookmarks 91 · Likes 115 · Retweets 8 · Replies 6

nathan (in sf)@nathanrs

Blog post: https://nathan.rs/posts/gzip-lm/

12h · 7.2K views · 115 likes · 91 bookmarks

Thomas Ahle@thomasahle

@nathanrs Ziplm: Gzip-Backed Language Model: https://news.ycombinator.com/item?id=36732430

9h · 1.8K views

heyskylark@heyskylark

@nathanrs kind of reminds me of 3Blue1Brown's video on the relationship between entropy and compression

8h
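The link between entropy and compression can be checked empirically. This sketch, whose helper names are hypothetical, compares the order-0 Shannon entropy of a string with gzip's actual output size in bits per character. Order-0 entropy looks only at single-character frequencies. On highly repetitive text, gzip can land well below that figure because it exploits repeated substrings, which an order-0 model ignores.

```python
import gzip
import math
from collections import Counter


def order0_entropy(text: str) -> float:
    """Shannon entropy of the single-character frequencies, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in Counter(text).values())


def gzip_bits_per_char(text: str) -> float:
    """Actual gzip output size (including its header and trailer), in bits per character."""
    if not text:
        return 0.0
    return 8 * len(gzip.compress(text.encode("utf-8"))) / len(text)


sample = "ab" * 5000
print(f"order-0 entropy: {order0_entropy(sample):.3f} bits/char")
print(f"gzip:            {gzip_bits_per_char(sample):.3f} bits/char")
```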

RAJAT PALIWAL@rajatpaliwal319

@nathanrs You would be amazed to know the concept of information theory is based on compression = prediction. Check article: "The Intricate Link Between Compression and Prediction: How Gzip and K-Nearest Neighbors Can Outperform Deep Learning Models."

10h
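Shannon's code-length relation makes the "compression = prediction" point precise. This is standard information theory, not a summary of the linked article. An ideal code spends about -log2 p bits on an outcome of probability p, so any predictive model p yields a compressor. Conversely, a compressor's code lengths ℓ induce a predictive distribution over the next symbol c after a context x:

```latex
\ell(x_{1:n}) \approx -\log_2 p(x_{1:n}) = \sum_{t=1}^{n} -\log_2 p(x_t \mid x_{<t})
\qquad
p(c \mid x) \approx \frac{2^{-(\ell(xc) - \ell(x))}}{\sum_{c'} 2^{-(\ell(xc') - \ell(x))}}
```

The second expression is the kind of rule a compression-based text generator can use to score candidates. With gzip, ℓ is measured in whole bytes, so the induced distribution is coarse.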

Alex Weers@a_weers

@nathanrs It can also be used as text classifier: https://aclanthology.org/2023.findings-acl.426.pdf

10h
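As we understand the linked paper, it classifies text by pairing gzip with the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length, and then taking a k-nearest-neighbour vote. The sketch below illustrates that idea under our own assumptions: a space as the concatenation separator, a toy two-example training set, and hypothetical function names. It is not the paper's code.

```python
from __future__ import annotations

import gzip
from collections import Counter


def clen(text: str) -> int:
    """Compressed length C(text) in bytes."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # assumed separator between the two texts
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """Majority label among the k training texts closest to `query` under NCD."""
    ranked = sorted(train, key=lambda item: ncd(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ("buy cheap pills now, limited offer, click here to buy cheap pills " * 3, "spam"),
    ("the quarterly meeting is moved to thursday, please update your calendar " * 3, "ham"),
]
print(classify("click here to buy cheap pills, limited offer", train))
```

Every prediction compresses the query against every training example, so the cost grows linearly with the training set. That cost is the trade-off for needing no trained parameters.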

Rudi Cilibrasi@cilibrar

@nathanrs These papers deal with this and similar techniques. https://scholar.google.com/citations?user=upG1440AAAAJ&hl=en

8h

Chuck Petras@Chuck_Petras

@nathanrs A Few Million Monkeys Randomly Recreate Shakespeare

https://archive.ph/GxS7M

5h

Esa@esa_was_taken

@nathanrs what about https://en.wikipedia.org/wiki/Arithmetic_coding

8h · 1.2K views
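Arithmetic coding is the standard bridge between a predictive model and a compressor. Each symbol narrows an interval by its predicted probability, so the final interval width is the product of the probabilities, and the output needs about -log2 of that product in bits. The sketch uses exact `Fraction` arithmetic and a fixed, hypothetical three-symbol model for clarity. Practical coders differ in three ways: they use finite-precision integers with renormalization, an adaptive model, and usually an end-of-message symbol instead of a stored length.

```python
import math
from fractions import Fraction

# Hypothetical fixed model; probabilities must sum to exactly 1.
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}


def cumulative(probs):
    """Start of each symbol's sub-interval in [0, 1)."""
    starts, total = {}, Fraction(0)
    for sym in sorted(probs):
        starts[sym] = total
        total += probs[sym]
    if total != 1:
        raise ValueError("probabilities must sum to 1")
    return starts


def encode(message, probs):
    """Return (m, k): the shortest binary fraction m / 2**k inside the final interval."""
    starts = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * starts[sym]   # move to the symbol's sub-interval
        width *= probs[sym]          # shrink by its probability
    k = 0
    while True:
        m = math.ceil(low * 2**k)    # smallest k-bit point >= low
        if Fraction(m, 2**k) < low + width:
            return m, k
        k += 1


def decode(m, k, length, probs):
    """Recover `length` symbols by replaying the interval narrowing."""
    starts = cumulative(probs)
    value = Fraction(m, 2**k)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        target = (value - low) / width
        sym = next(s for s in sorted(probs)
                   if starts[s] <= target < starts[s] + probs[s])
        out.append(sym)
        low += width * starts[sym]
        width *= probs[sym]
    return "".join(out)


msg = "abacab"
m, k = encode(msg, PROBS)
ideal = sum(-math.log2(PROBS[s]) for s in msg)
print(f"{k} bits used, ideal {ideal:.1f} bits, decoded {decode(m, k, len(msg), PROBS)!r}")
```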

Nosredna@Nosredna

@nathanrs You might be interested in this article from Byte magazine in 1984.

I implemented it back then and did a bunch of optimizations that are similar to what you're doing.

Title: A Travesty Generator for Micros

https://vintageapple.org/byte/pdf/198411_Byte_Magazine_Vol_09-12_New_Chips.pdf

6h
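Travesty-style generators are commonly described as order-n character Markov chains. They pick the next character by sampling from the characters that followed the last n characters somewhere in the source text. The sketch below assumes the Byte article follows that general approach; it is not the article's code, and the names are hypothetical. A minimal version looks like this:

```python
import random
from collections import defaultdict


def build_table(text, order):
    """Map each `order`-character context to every character that follows it in `text`."""
    table = defaultdict(list)
    for i in range(len(text) - order):
        table[text[i:i + order]].append(text[i + order])
    return table


def travesty(text, order=3, length=200, seed=0):
    """Generate up to `length` characters with an order-`order` character Markov chain."""
    if order < 1:
        raise ValueError("order must be at least 1")
    rng = random.Random(seed)
    table = build_table(text, order)
    out = text[:order]                       # seed with the opening context
    while len(out) < length:
        followers = table.get(out[-order:])  # duplicates make sampling frequency-weighted
        if not followers:                    # context only occurs at the very end of the text
            break
        out += rng.choice(followers)
    return out


source = "to be, or not to be, that is the question: whether 'tis nobler in the mind to suffer"
print(travesty(source, order=3, length=80, seed=2))
```

Every (n+1)-character window of the output occurs somewhere in the source, so a larger n gives more faithful but less novel text. A gzip-based generator differs in scoring candidates by compressed length rather than with an explicit frequency table.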

Alex@AlexanderMoini

@nathanrs back in the day there was a paper comparing using gzip in place of embedding models lol

8h

Cookiethief@Cookiethief19

@nathanrs We'd have ASI by now if you lot paid for that damn WinRAR license

7h

Noah Vandal@noah_vandal

@nathanrs awesome writeup, thanks! the gh repo may be private, i am not able to access it

10h · 3.4K views

AstroFella@UrbanAstroFella

@nathanrs I was marveling at the complexity of origami solving algorithms unpacking into highly dimensional coherent representations the other day. This makes perfect sense.

6h