Hippocratic AI Uses Modular MAX For Real-Time Patient Conversations

Original post

AI agents in healthcare face tight constraints: latency can't exceed 800ms per turn, the first turn processes 10k tokens of context, and safety models analyze the conversation in parallel. Using our MAX framework, @hippocraticai keeps patient conversations instant (sub-second TTFT), hits aggressive performance targets without sacrificing model accuracy, and runs across accelerators as new hardware comes to market. A look at how regulated enterprises like Hippocratic AI use MAX in production for real-time patient conversations: https://www.modular.com/blog/hippocratic-ai-partners-with-modular-to-power-flexible-high-quality-inference-for-real-time-patient-conversations?utm_source=x&utm_campaign=hippocraticai

11:11 AM · May 18, 2026

Chris Lattner (@clattner_llvm)

Amazing to see what the @hippocraticai team is achieving with MAX. Their Polaris agent runs patient care conversations and needs to complete every turn in under 800ms, with safety models analyzing in parallel. 👇

4:22 PM · May 19, 2026 · 3.2K Views

At many thousands of sessions/day, getting sub-second TTFT without sacrificing accuracy is hard. MAX delivers where others cannot, and paves the way to swap to other accelerators without added complexity. Check out how they did it: https://www.modular.com/blog/hippocratic-ai-partners-with-modular-to-power-flexible-high-quality-inference-for-real-time-patient-conversations?utm_source=linkedin_chris&utm_campaign=hippocratic

4:23 PM · May 19, 2026 · 1.6K Views
