
Meta releases Muse Spark Contemplating safety report, showing the reasoning model trails Claude Opus 4.6 on cybersecurity benchmarks

The model is not yet generally available.

Original post · Miles Brundage #20
Nathaniel Li @natliml

We're releasing the preparedness report for Muse Spark Contemplating, MSL's extreme reasoning model, benchmarking its capabilities and behaviors in biology, cybersecurity, and more!

11:21 AM · Jun 5, 2026 · 10.9K Views
Sentiment

Users expressed excitement over the solid capability jump reported for Muse Spark in MSL's preparedness report, noting its strong reasoning performance relative to other major models.

Positive: 100.0%
Negative: 0.0%
1 comment with sentiment.
Cluster Engagement
Posts from X
Views 4K · Bookmarks 9 · Likes 31 · Replies 1

In retrospect it was obvious how they'll fall off after Llama-Guard release. Meta is a very cowardly company after all, the whole doomer gnashing of teeth about their "irresponsibility" in peak Llama era was misguided. Below Opus 4.6, and still no general availability.

11h · Views 4K · Likes 31 · Bookmarks 9
Nathaniel Li @natliml

https://ai.meta.com/static-resource/muse-spark-contemplating-safety-and-preparedness-report/

22h · Views 292 · Likes 2
Tim Kostolansky @thkostolansky

@natliml why not most recent opus or gpt evals

17h · Views 194
Tim Kostolansky @thkostolansky

@natliml contemplating 🤣

17h · Views 50 · Likes 1
Aaron Scher @aaronscher

@natliml Is the model more or less eval aware than Muse Spark normal?

17h · Views 49 · Likes 1
Kit Dobyns @kitdobyns

pretty solid capability jump! how do you think about using static evals for a reasoning model? Almost every model from a major lab can crush point-in-time checks (ie wmdp). They typically fail later on (maybe turn 12) via continuous reasoning loops. I'd love to see results from dynamic trajectory testing.

21h · Views 113
Clark @clark__labs

@natliml openrouter or it never happened!

16h · Views 41