
Blog post: https://nathan.rs/posts/gzip-lm/
The demonstration produced outputs approximating the style of Shakespeare.
Users praise the writeup on using gzip to generate Shakespeare-like text, calling the article interesting, cool, and an actual gem worth reading.

@nathanrs Ziplm: Gzip-Backed Language Model: https://news.ycombinator.com/item?id=36732430

@nathanrs kind of reminds me of 3Blue1Brown's video on the relationships between entropy and compression
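A small Python sketch of the entropy/compression link (not from the thread or the video): it compares the order-0 Shannon entropy of a string, which treats characters as independent, with the size gzip actually produces. The test strings and function names are illustrative assumptions.

```python
import gzip
import math
import random
import string
from collections import Counter


def order0_entropy_bits(text: str) -> float:
    """Shannon entropy in bits per character, treating characters as independent draws."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gzip_bits_per_char(text: str) -> float:
    """gzip output size in bits divided by character count (includes header/trailer overhead)."""
    return 8 * len(gzip.compress(text.encode("utf-8"))) / len(text)


repetitive = "ab" * 500
rng = random.Random(0)
noisy = "".join(rng.choice(string.ascii_lowercase) for _ in range(1000))

for name, text in [("repetitive", repetitive), ("random", noisy)]:
    print(f"{name}: entropy {order0_entropy_bits(text):.2f} bits/char, "
          f"gzip {gzip_bits_per_char(text):.2f} bits/char")
```

On the repetitive string gzip lands well under the 1 bit/char order-0 figure, because its back-references exploit context that the order-0 model ignores (better prediction, shorter code). On the random string there is no such structure to exploit, so gzip cannot get meaningfully below about log2(26) ≈ 4.7 bits per character.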

@nathanrs You'd be amazed to learn that information theory is built on the idea that compression = prediction. Check out the article "The Intricate Link Between Compression and Prediction: How Gzip and K-Nearest Neighbors Can Outperform Deep Learning Models."

@nathanrs It can also be used as a text classifier: https://aclanthology.org/2023.findings-acl.426.pdf
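The linked paper classifies text by pairing gzip with k-nearest neighbors over a normalized compression distance (NCD). Below is a minimal sketch of that idea; the toy training sentences and the names `clen`, `ncd`, and `classify` are hypothetical, and the paper's own code and evaluation differ in details.

```python
import gzip
from collections import Counter


def clen(s: str) -> int:
    """Compressed length in bytes, used as a rough proxy for information content."""
    return len(gzip.compress(s.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when one text is cheap to encode after the other."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def classify(query: str, train: list[tuple[str, str]], k: int = 1) -> str:
    """k-nearest-neighbor majority vote using NCD as the distance."""
    nearest = sorted((ncd(query, text), label) for text, label in train)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train = [
    ("the striker scored a late goal and the football team won the league match", "sports"),
    ("the goalkeeper saved a penalty as the football club won the cup final", "sports"),
    ("the central bank raised interest rates as inflation and bond yields climbed", "finance"),
    ("stock markets fell and bond yields rose after the central bank statement", "finance"),
]
query = "the football team won the match after the striker scored a goal"
print(classify(query, train))
```

No training step is involved: the compressor supplies the similarity measure, and k-NN supplies the decision rule.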

@nathanrs These papers deal with this and similar techniques. https://scholar.google.com/citations?user=upG1440AAAAJ&hl=en

@nathanrs A Few Million Monkeys Randomly Recreate Shakespeare
https://archive.ph/GxS7M

@nathanrs what about https://en.wikipedia.org/wiki/Arithmetic_coding
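Arithmetic coding makes the compression/prediction link explicit: a model's probabilities map directly to code length. Here is a minimal sketch using exact fractions and a static model built from the message itself (an assumption: both encoder and decoder are given the model). Real coders use finite-precision integer arithmetic with renormalization instead of `Fraction`.

```python
import math
from collections import Counter
from fractions import Fraction


def build_model(text: str) -> dict[str, tuple[Fraction, Fraction]]:
    """Static model: each symbol owns a slice [lo, hi) of [0, 1) proportional to its frequency."""
    counts = Counter(text)
    total = sum(counts.values())
    model, running = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)
        model[sym] = (running, running + p)
        running += p
    return model


def encode(text: str, model) -> tuple[Fraction, Fraction]:
    """Narrow [0, 1) once per symbol; return the final interval as (low, width)."""
    low, width = Fraction(0), Fraction(1)
    for sym in text:
        lo, hi = model[sym]
        low += width * lo
        width *= hi - lo
    return low, width


def decode(value: Fraction, n: int, model) -> str:
    """Recover n symbols from any value inside the final interval."""
    out, low, width = [], Fraction(0), Fraction(1)
    for _ in range(n):
        scaled = (value - low) / width  # where value falls inside the current interval
        for sym, (lo, hi) in model.items():
            if lo <= scaled < hi:
                out.append(sym)
                low += width * lo
                width *= hi - lo
                break
    return "".join(out)


msg = "abracadabra"
model = build_model(msg)
low, width = encode(msg, model)
bits = math.log2(width.denominator) - math.log2(width.numerator)  # -log2(width)
print(decode(low, len(msg), model), f"{bits:.2f} bits vs {8 * len(msg)} raw")
```

The final interval width is the product of the predicted probabilities, so the roughly -log2(width) bits needed to pin down a point inside it equals the message's information content under the model. A better predictor yields a wider interval and a shorter code.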

@nathanrs You might be interested in this article from Byte magazine in 1984.
I implemented it back then and did a bunch of optimizations similar to what you're doing.
Title: A Travesty Generator for Micros
https://vintageapple.org/byte/pdf/198411_Byte_Magazine_Vol_09-12_New_Chips.pdf
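Travesty-style generators are usually described as order-n character Markov chains: each next character is drawn from the characters that followed the current n-character context somewhere in the source. A minimal Python sketch of that idea follows; it is not a port of the BYTE listing, and the circular-text trick and function names are assumptions made here to avoid dead ends.

```python
import random
from collections import defaultdict


def build_table(source: str, order: int) -> dict[str, list[str]]:
    """Map every `order`-character context to the characters that follow it in source."""
    table = defaultdict(list)
    for i in range(len(source) - order):
        table[source[i:i + order]].append(source[i + order])
    return table


def travesty(text: str, order: int = 4, length: int = 200, seed=None) -> str:
    """Order-n character Markov generation in the spirit of Travesty."""
    if len(text) <= order:
        raise ValueError("text must be longer than order")
    rng = random.Random(seed)
    # Treat the text as circular so every context has at least one follower.
    table = build_table(text + text[:order], order)
    out = text[:order]
    while len(out) < length:
        # Duplicates in the follower list make frequent continuations more likely.
        out += rng.choice(table[out[-order:]])
    return out


src = "to be or not to be that is the question "
print(travesty(src, order=3, length=100, seed=1))
```

Higher orders reproduce longer verbatim runs of the source; lower orders drift toward gibberish. Every (order + 1)-character window of the output occurs somewhere in the (circular) source.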

@nathanrs back in the day there was a paper on using gzip in place of embedding models lol

@nathanrs We'd have ASI by now if you lot paid for that damn WinRAR license

@nathanrs awesome writeup, thanks! the gh repo may be private, i am not able to access it

@nathanrs I was marveling the other day at the complexity of origami-solving algorithms unpacking into high-dimensional coherent representations. This makes perfect sense.

@nathanrs Is this the same thing -

@noah_vandal Whoops you’re right 😅 should be fixed now

@nathanrs Real productive night

@nathanrs Wow, really interesting use of beam search. I was just playing with a SNES Mario AI using this same method; had no clue it could go that far
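For readers unfamiliar with beam search in this setting, here is a hypothetical sketch that scores continuations by compressed size. It is not claimed to be how the blog post implements it. The assumptions: the score is the zlib size of the corpus plus the candidate text (smaller means more predictable), the alphabet is every character seen in the corpus or prompt, and ties are broken by Python's stable sort.

```python
import zlib


def clen(data: bytes) -> int:
    """zlib output size in bytes at maximum compression."""
    return len(zlib.compress(data, 9))


def beam_generate(corpus: str, prompt: str, steps: int = 20, beam_width: int = 4) -> str:
    """Extend prompt one character at a time, keeping the beam_width continuations
    for which corpus + continuation compresses smallest."""
    alphabet = sorted(set(corpus + prompt))
    beams = [prompt]
    for _ in range(steps):
        candidates = [b + ch for b in beams for ch in alphabet]
        # Lower compressed size means the continuation is more predictable given the corpus.
        candidates.sort(key=lambda s: clen((corpus + s).encode("utf-8")))
        beams = candidates[:beam_width]
    return beams[0]


corpus = "shall i compare thee to a summer's day? thou art more lovely and more temperate. " * 3
print(beam_generate(corpus, "thou art ", steps=15))
```

Because compressed sizes change in whole bytes, many candidates tie; keeping several beams helps the search avoid committing early to a continuation that only looks good because of a tie.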
https://github.com/patnir411/mario_ai

@nathanrs Any data compressor can be converted into a generator by sampling from random extensions of the compressed file.
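The comment describes feeding random data to a compressor's decoder. Doing that literally with gzip is awkward, since random bytes are often not a valid DEFLATE stream. This sketch swaps in a related technique instead: score each candidate next character by compressed size, then sample with weight 2^(-bits / T). The temperature `T` and all names here are hypothetical additions.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """zlib output size in bits."""
    return 8 * len(zlib.compress(data, 9))


def sample_text(corpus: str, prompt: str, steps: int = 40,
                temperature: float = 1.0, seed=None) -> str:
    """Sample one character at a time, weighting each candidate by 2**(-extra_bits / T)."""
    rng = random.Random(seed)
    alphabet = sorted(set(corpus + prompt))
    out = prompt
    for _ in range(steps):
        sizes = [compressed_bits((corpus + out + ch).encode("utf-8")) for ch in alphabet]
        best = min(sizes)
        # A code length of L bits implies probability proportional to 2**-L;
        # subtracting the best size keeps the weights in a safe floating-point range.
        weights = [2.0 ** (-(s - best) / temperature) for s in sizes]
        out += rng.choices(alphabet, weights=weights, k=1)[0]
    return out


corpus = "shall i compare thee to a summer's day? thou art more lovely and more temperate. " * 3
print(sample_text(corpus, "thou ", steps=30, temperature=4.0, seed=0))
```

With byte-granular sizes at temperature 1, each extra byte divides a candidate's weight by 256, so sampling is nearly greedy. Raising the temperature spreads probability across more characters.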

@krishmatta Did everything but train a model for work 😭

@nathanrs compression and prediction are the same thing