1d ago

DeepMind Unveils Multimodal AI System For Real-Time Medical Video Consultations

0
Original post

(1/10) Excited to share our new paper: "Towards Conversational Medical AI with Eyes, Ears and a Voice" https://arxiv.org/abs/2605.09272 Clinical medicine depends on more than words; it relies on real-time visual cues (rashes, distress, gait), auditory signals (prosody, breathing), and guided physical exams (e.g., guiding someone through correcting their inhaler technique, or walking them through shoulder maneuvers to work up a rotator cuff injury). But existing conversational medical AI systems are blind and deaf to this crucial information and are constrained to rigid, turn-by-turn user experience. As an important ingredient in AI co-clinician, our new healthcare initiative at @GoogleDeepMind, we developed a first-of-its-kind real-time multimodal system that conducts natural clinical consultations over video calls while continuously watching, listening, and reasoning. The preprint details the research progress we’ve made so far together with our collaborators at @StanfordMed and @harvardmed (@AdamRodmanMD @euanashley @DrJackOSullivan @jasongusdorf). More details below!

9:58 PM · May 17, 2026 View on X
Reposted by