AI tools accelerating security vulnerability discovery shift the primary constraint to human review, triage, and remediation, sustaining demand for security engineers

QUOTE POST

This demands expertise that very few people have, and it's only going to get worse across every known domain.

Aaron Levie@levie

Here’s a key line in this Mythos update. This is precisely an example of why engineers don’t go away, ever. We’ve made it far easier to create and find security issues, which means the new bottleneck is our ability to actually review, respond to, and fix the issues. Far from AI magically solving all of this, there is still major triage work and human judgment required to do the follow-on work to actually protect systems. As a result, we’re about to enter a security engineer boom. Jevons paradox all over again.

2:05 AM · May 23, 2026 · 76.9K Views

12:18 PM · May 23, 2026 · 1.9K Views

QUOTE POST

#1598Carlos E. Perez@INTUITMACHINE

Carlos E. Perez@IntuitMachine

What Anthropic's Cybersecurity Report Accidentally Proves About the Future of Work

1/ Anthropic just quietly published one of the most important reports about the future of AI and work. It's disguised as a cybersecurity update. But buried in the data is proof of something most people are getting wrong about AI replacing humans. Let me break it down. 🧵

2/ Project Glasswing gave their most powerful AI model (Mythos Preview) to ~50 major cybersecurity partners and told them: go find vulnerabilities in the world's most critical software. The result? Over 10,000 high- or critical-severity vulnerabilities found. Some partners saw a 10x increase in bug-finding speed.

3/ Here's the part nobody's talking about: Finding the bugs was the easy part. Anthropic's own words: "Progress on software security USED to be limited by how quickly we could find new vulnerabilities. Now it's limited by how quickly we can verify, disclose, and patch them." Read that again.

4/ The AI didn't create a verification surplus. It created a VERIFICATION CRISIS. The model finds thousands of vulnerabilities. But every single one still needs a human expert to:
→ Reproduce the bug in real software
→ Confirm it actually works
→ Assess how dangerous it really is
→ Write a detailed report
→ Design and deploy a patch

5/ The numbers tell the story:
• 1,752 findings carefully assessed by human experts
• 90.6% turned out to be real vulnerabilities
• But only 62.4% were as severe as the AI estimated
The AI is great at finding real problems. It's mediocre at knowing exactly how bad they are.

6/ This is the key insight that changes everything: There are THREE different kinds of "self-knowledge" an AI can have:
CALIBRATION: "Am I right about X% of the time?" (AI is good at this)
DISCRIMINATION: "Is THIS specific output right or wrong?" (AI is mediocre at this)
EXPRESSION: "Can I tell you honestly when I'm uncertain?" (AI is bad at this)

7/ Most people collapse these into one thing: "How smart is the AI?" But they're completely independent. A model can know it's wrong 10% of the time (good calibration) while having NO IDEA which 10% is wrong (poor discrimination). That gap is why you still need humans.

8/ There's also a deep structural reason the AI can't close this gap on its own. There are two kinds of quality checks:
INTERNAL: "Is the AI consistent with itself?" → Always possible. Just run it multiple times.
EXTERNAL: "Does the output match reality?" → Only possible if you have access to reality.

9/ The AI can check its own consistency all day long. But checking whether a vulnerability actually works in Cloudflare's production environment? In a bank's transaction system? In a hospital's network? That requires someone who KNOWS those systems. That's the structural gap that doesn't close with scale.

10/ Glasswing accidentally ran a perfect natural experiment proving this. SAME model. TWO different setups:
Setup A: Cloudflare scans its OWN code with its OWN engineers evaluating. → 2,000 bugs found. False positive rate "better than human testers."
Setup B: Anthropic scans open-source code with external security firms. → Maintainers overwhelmed. Some asked Anthropic to SLOW DOWN disclosures.

11/ Same AI. Wildly different results. The difference? Not the model. The WRAPPER around the model. When the people evaluating the output are the same people who built the system and know the domain deeply, everything moves faster and works better. When they're external? Bottleneck city.

12/ Enterprise customers using Claude Security patched 2,100 vulnerabilities in three weeks. Open-source maintainers? After months, only 75 patches deployed. Why? "Enterprises are fixing their own code, whereas open-source fixes usually require volunteer maintainers." Domain proximity is the multiplier.

13/ Here's the paradox nobody expects: As AI gets MORE capable, human judgment gets MORE valuable. Not less. A weak model produces obviously wrong code. Anyone can catch it. A strong model produces subtly wrong output that looks perfect but fails in production under specific conditions only a domain expert would know.

14/ Mythos Preview found a vulnerability in wolfSSL, a security library used by BILLIONS of devices, that lets attackers forge certificates to impersonate banks. Verifying that finding required deep expertise in cryptographic protocols, certificate chains, and real-world deployment implications. The harder the bug, the harder the verification.

15/ The Glasswing team isn't trying to make the AI find MORE bugs. Every investment they describe is on the EVALUATION side:
→ 6 independent security research firms for triage
→ Partnership with Open Source Security Foundation
→ Harness tools shared with partners
→ Cyber Verification Program for security pros
Generation is solved. Evaluation is the work.

16/ So what does this mean for the future of work? The bottleneck is shifting from "can you produce output?" to "can you tell if the output is actually good?" That requires:
→ Deep domain expertise
→ Access to real-world context
→ Cross-domain judgment
→ Tight feedback loops between discovery and action

17/ This is NOT just a cybersecurity story. If even in SOFTWARE (the domain with the best AI evaluation infrastructure, where code either compiles or doesn't, tests pass or fail) the verification bottleneck is this severe... Imagine legal. Finance. Healthcare. Where "correctness" is ambiguous and evaluation loops barely exist.

18/ The companies that will win are NOT the ones with the best models. Models are commoditizing fast. The winners will be the ones with the best WRAPPERS around the models:
→ Better calibration data for their domain
→ Closer proximity to ground truth
→ Tighter feedback loops between output and outcome
→ Smarter ways to surface AI uncertainty to human reviewers

19/ The implications for the job market are the opposite of what most people expect. AI doesn't eliminate the need for expertise. It amplifies its leverage. Every expert with domain knowledge becomes a force multiplier: the person who can look at sophisticated AI output and say "this is right" or "this will break in production."

20/ The Glasswing report shows us the future, and it looks like this: AI generates at superhuman speed. Humans evaluate with domain expertise. The bottleneck is always verification. The scarce resource is always judgment. The better AI gets at generation, the more it needs humans who can tell good from great from wrong.

21/ One more thing. Anthropic says models as capable as Mythos Preview "will soon be developed by many different AI companies." And right now "no company, including Anthropic, has developed safeguards strong enough to prevent such models from being misused." The verification crisis isn't coming. It's here. The question is whether we build the evaluation infrastructure fast enough.

/end

If this changed how you think about AI and work, share it. The loudest narrative is "AI replaces humans." The evidence says something more interesting: AI makes human judgment the most valuable resource in the system.

12:30 PM · May 23, 2026 · 3.4K Views

12:57 PM · May 23, 2026 · 455 Views
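
The calibration versus discrimination distinction in posts 6 and 7 of the thread above can be made concrete with a toy example. What follows is only a sketch on synthetic data, not anything taken from the Anthropic report: the 900/100 split, both "models", and the helper names `aggregate_calibration_gap` and `auc` are assumptions made for illustration. The gap function is a crude aggregate check, not a full calibration metric.

```python
def aggregate_calibration_gap(confidences, labels):
    """Absolute gap between mean stated confidence and observed accuracy."""
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(labels) / len(labels)
    return abs(mean_confidence - accuracy)


def auc(confidences, labels):
    """Chance that a random correct finding gets a higher confidence than a
    random incorrect one; ties count as half. 0.5 means no discrimination."""
    correct = [c for c, y in zip(confidences, labels) if y == 1]
    wrong = [c for c, y in zip(confidences, labels) if y == 0]
    wins = 0.0
    for c in correct:
        for w in wrong:
            if c > w:
                wins += 1.0
            elif c == w:
                wins += 0.5
    return wins / (len(correct) * len(wrong))


# Synthetic data (an assumption, not from the report): 900 correct, 100 wrong.
labels = [1] * 900 + [0] * 100

# Hypothetical model A: says "90% sure" about every single finding.
flat_confidences = [0.9] * len(labels)

# Hypothetical model B: fully sure when correct, fully unsure when wrong.
sharp_confidences = [1.0 if y == 1 else 0.0 for y in labels]

for name, conf in [("flat", flat_confidences), ("sharp", sharp_confidences)]:
    gap = aggregate_calibration_gap(conf, labels)
    print(f"{name}: calibration gap={gap:.3f}, AUC={auc(conf, labels):.3f}")
```

Both toy models have an aggregate calibration gap of essentially zero, yet the flat one scores an AUC of 0.5 (it cannot say which findings are the wrong ones) while the sharp one scores 1.0. That is the sense in which a model can be right about its overall error rate and still leave the per-finding triage to a reviewer.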

QUOTE POST

#1598Carlos E. Perez@INTUITMACHINE

@levie Related:

12:57 PM · May 23, 2026 · 212 Views