Modal Servers deliver 6x faster responses than classic Modal Web Functions. We've used them to support worldwide inference services at world-class latency.
Excited to finally share how they work -- not least because I personally learned a lot about networking from this project!
Our new Auto Endpoints feature is powered by a new Modal primitive: Modal Servers.
In this blog post, we walk through the design principles and the detailed architecture: @EnvoyProxy, a @googlecloud Spanner config store, and a custom proxy based on @Cloudflare Pingora.

