@MatthewBerman Try this. Don't blink.
https://chatjimmy.ai/
I wish LLMs ran 1000x faster.
Do you fully understand the economic unlock of that?
The rapid generation prompted questions about the underlying serving stack.
@NickADobos wtf....how?

1. It's a small model, nowhere near the big leading ones.
2. They built custom chips and trained the model to work with them. They explain it somewhere on the website. I believe they are working on a v2 now with a bigger, smarter model.
I believe other companies, including OpenAI, are working on similar projects with custom chips.

@MatthewBerman Also, IIRC, they actually burn the model into the chip. So once you train the model and build the chips, you can't upgrade it or swap to a new model. You would need to build brand-new chips.
So it's a very different technique from a chip that can run any type of inference.

@MatthewBerman @NickADobos LLM burned into silicon.