The overall ranking. Congratulations to @ekoermann, @krithikvish, and their team at @nyulangone for getting this done. We need more of these rigorous assessments.
Here is the performance breakdown of each model in the blinded assessment across 4 major criteria: (1) clinical correctness, (2) completeness, (3) safety, and (4) clarity.