
Medical Fine-Tuning Boosts LLM Performance Across Models on Medmarks Benchmark

Original post

One of the interesting results from our recent Medmarks medical LLM benchmarking release: medical domain-specific fine-tuning provides a significant boost in performance on our benchmark suite. However, frontier models like GPT 5.2 remain at the top of the leaderboard.

3:29 PM · May 19, 2026

Learn more about Medmarks:

Sophont (@SophontAI)

We're excited to release Medmarks v1.0 + a technical report! This is an update to our Medmarks benchmark suite, the largest open-source automated suite for evaluating the medical capabilities of LLMs. We added 10 benchmarks (20→30) and 15 models (46→61) to the leaderboard!

6:39 PM · May 12, 2026 · 21.3K Views
10:29 PM · May 19, 2026 · 616 Views