AI · 22h ago

Meta releases Muse Spark Contemplating safety report, showing the reasoning model trails Claude Opus 4.6 on cybersecurity benchmarks

The model is not yet deployed for general availability.

Original post

Miles Brundage#20

Nathaniel Li @natliml

We're releasing the preparedness report for Muse Spark Contemplating, MSL's extreme reasoning model, benchmarking its capabilities and behaviors in biology, cybersecurity, and more!

11:21 AM · Jun 5, 2026 · 10.9K Views

Sentiment

Users expressed excitement over the solid capability jump in Muse Spark from MSL's preparedness report, noting its strong reasoning performance against other major models.

Positive: 100.0% · Negative: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 4K · Bookmarks: 9 · Likes: 31 · Replies: 1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

In retrospect it was obvious how they'll fall off after Llama-Guard release. Meta is a very cowardly company after all, the whole doomer gnashing of teeth about their "irresponsibility" in peak Llama era was misguided. Below Opus 4.6, and still no general availability.

11h

Nathaniel Li @natliml

https://ai.meta.com/static-resource/muse-spark-contemplating-safety-and-preparedness-report/

22h

Tim Kostolansky @thkostolansky

@natliml why not most recent opus or gpt evals

17h

Tim Kostolansky @thkostolansky

@natliml contemplating 🤣

17h

Aaron Scher @aaronscher

@natliml Is the model more or less eval aware than Muse Spark normal?

17h

Kit Dobyns @kitdobyns

pretty solid capability jump! how do you think about using static evals for a reasoning model? Almost every model from a major lab can crush point-in-time checks (ie wmdp). They typically fail later on (maybe turn 12) via continuous reasoning loops. I'd love to see results from dynamic trajectory testing.

21h

Clark @clark__labs

@natliml openrouter or it never happened!

16h

im here so i wont get fined @plumberbutt97

@natliml You see this yet, @ml_angelopoulos?

16h