One of the interesting results from our recent Medmarks medical LLM benchmarking release:
medical domain-specific fine-tuning provides a significant boost in performance on our benchmark suite.
However, frontier models like GPT 5.2 remain at the top of the leaderboard.