OP top comment: @OpenAI “Benchmarks often test biological knowledge or narrow skills. The tasks in LifeSciBench test whether models can reason from evidence, work with scientific artifacts, handle uncertainty, and make useful decisions under real-world constraints. GPT‑Rosalind scores above GPT‑5.5 across all seven workflows. These initial results show meaningful progress—and room to improve, particularly on artifact-heavy, design-intensive, and operationally constrained work.”