@sama this would be an absolute game changer if you can actually pull this. i dunno how it’ll scale infra wise, sounds hard.
but would instantly change behaviors for speed.
oh and also...750 token/sec coming to 5.6 sol in july!
The performance update will run on 5.6 sol hardware
@sama this would be an absolute game changer if you can actually pull this. i dunno how it’ll scale infra wise, sounds hard.
but would instantly change behaviors for speed.
oh and also...750 token/sec coming to 5.6 sol in july!
Positive users are excited about OpenAI's teased 750 tokens per second inference speed reducing latency, while negative users object to limited access and view the announcements as unhelpful teasing.
No Digg Deeper questions have been answered for this story yet.
experiencing a frontier model running at 750 tokens/sec gives you the same sense of wonder as seeing AI for the first time. excited to ship this soon!
oh and also...750 token/sec coming to 5.6 sol in july!

@stevenheidel I wouldn’t know what that’s like. 🙄 what’s up with OAI employees dangling these carrots in front of their large user base that has no access? It’s frustrating!

@stevenheidel if people can't use it, this conversation is useless

@signulll @sama 750MW provisionned specially for OpenAI according to Cerebras, I do not know if it is enough ? I have no idea

@stevenheidel very exciting to the trusted partners indeed

@stevenheidel /goal will be insane on 5.6 sol ultra lol

@stevenheidel Only for the elite

@stevenheidel How much more expensive though? 🫣

@stevenheidel Good I’m tired of latency I’m wasting all my time watching LLMs think

@stevenheidel no teasing pls. knowing there are new models but not having access sucks so bad...

@stevenheidel You bet...

@stevenheidel any demo to understand the difference?

@stevenheidel

@stevenheidel that is an insane amount of speed, can't wait to see it.

@stevenheidel I’d only need to run like 2-3 threads at a time which will be so much better.