
AI Models Struggle to Defend Against Cyberattacks

A new benchmark reveals that while frontier language models excel at exploitation, they fail to autonomously detect sophisticated attack chains.

Frontier large language models are proficient at finding and exploiting software vulnerabilities, but a new study shows they are currently unable to defend against them without significant assistance.

Simbian Research Lab released its Cyber Defense Benchmark today, testing 11 prominent LLMs on their ability to detect MITRE ATT&CK chains within realistic telemetry. The results indicate a significant gap between offensive and defensive capabilities in artificial intelligence.

None of the tested models earned a passing score. Anthropic’s Claude Opus 4.6 was the top performer, yet it detected an average of only 46% of attack evidence per MITRE tactic. On its strongest tactic, Resource Development, the model scored 63%, but its performance fell to 25% in the Collection category.
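The article does not spell out the benchmark's scoring formula. As a rough illustration only, the sketch below shows one plausible reading of "average detection per tactic": compute the share of known malicious events found within each tactic, then average those shares with equal weight. The event IDs, the function names and the toy data are all hypothetical and are not taken from the study.

```python
from collections import defaultdict

def per_tactic_detection(ground_truth, detected):
    """Share of malicious events found within each tactic.

    ground_truth: dict mapping event ID -> MITRE tactic name (hypothetical layout)
    detected: set of event IDs the model flagged
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for event_id, tactic in ground_truth.items():
        totals[tactic] += 1
        if event_id in detected:
            hits[tactic] += 1
    return {tactic: hits[tactic] / totals[tactic] for tactic in totals}

def average_across_tactics(rates):
    """Unweighted mean of the per-tactic rates (an assumed aggregation)."""
    return sum(rates.values()) / len(rates)

# Toy data, invented for illustration.
ground_truth = {
    "e1": "Resource Development",
    "e2": "Resource Development",
    "e3": "Collection",
    "e4": "Collection",
    "e5": "Collection",
    "e6": "Collection",
}
detected = {"e1", "e3"}

rates = per_tactic_detection(ground_truth, detected)
score = average_across_tactics(rates)
print(rates, score)
```

Averaging per tactic rather than per event keeps a tactic with many events from dominating the score, which may be why the study reports results this way.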

The study found that defensive AI tasks are structurally more difficult than offensive ones. While offense has a clear "win state," such as gaining root access, defense requires reasoning across noisy, partial evidence without knowing the total number of malicious events present.

Cost did not scale linearly with success. Claude Opus 4.6 found three times more flags than Google Gemini 3 Flash, but each investigation cost roughly 100 times more per run. Mid-priced models, including GPT-5 and Gemini 3.1 Pro, plateaued around a 2% detection rate, often ending investigations prematurely because the agent believed the task was complete.
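The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below takes the article's two approximate ratios at face value, reading "three times more" as a factor of 3, and derives the relative cost per flag found. It uses no figures beyond those two ratios.

```python
# Ratios of Claude Opus 4.6 to Gemini 3 Flash, as reported (approximate).
flag_ratio = 3.0    # assumed reading: 3x as many flags found
cost_ratio = 100.0  # roughly 100x the cost per run

# Relative cost per flag found: how much more each detection costs.
cost_per_flag_ratio = cost_ratio / flag_ratio
print(f"Each flag costs about {cost_per_flag_ratio:.0f}x more")
```

Under that reading, each additional detection from the stronger model costs on the order of 33 times more than one from the cheaper model.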

Researchers noted that the raw reasoning power of an LLM is only one component of a security solution. To reach enterprise-level accuracy, models require a "harness," a specialized framework that provides organizational context, deterministic retrieval and structured investigation loops.

The benchmark used real Windows telemetry, including Sysmon and Security event logs, mutated with randomized hostnames and IP addresses to prevent models from relying on memorized data. This differs from previous industry benchmarks that relied on multiple-choice questions or capture-the-flag puzzles.
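The article does not describe how the mutation was implemented. A minimal sketch of the general idea, assuming events have been flattened into dictionaries with Sysmon-style keys (`Computer`, `SourceIp`, `DestinationIp`), might look like the following. The `make_mutator` helper and the naming scheme are hypothetical. The key property is consistency: the same original host or address always maps to the same replacement, so relationships between events survive the rewrite.

```python
import ipaddress
import random

def make_mutator(seed=None):
    """Return a function that swaps hostnames and IPs for random stand-ins."""
    rng = random.Random(seed)
    host_map, ip_map = {}, {}
    alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"

    def mutate_host(name):
        if name not in host_map:
            new = "HOST-" + "".join(rng.choices(alphabet, k=6))
            while new in host_map.values():  # avoid reusing a stand-in
                new = "HOST-" + "".join(rng.choices(alphabet, k=6))
            host_map[name] = new
        return host_map[name]

    def mutate_ip(ip):
        if ip not in ip_map:
            # Draw from 10.0.0.1 to 10.255.255.254 (private range, assumed choice).
            new = str(ipaddress.IPv4Address(rng.randint(0x0A000001, 0x0AFFFFFE)))
            while new in ip_map.values():
                new = str(ipaddress.IPv4Address(rng.randint(0x0A000001, 0x0AFFFFFE)))
            ip_map[ip] = new
        return ip_map[ip]

    def mutate_event(event):
        out = dict(event)  # leave the original event untouched
        if "Computer" in out:
            out["Computer"] = mutate_host(out["Computer"])
        for key in ("SourceIp", "DestinationIp"):
            if key in out:
                out[key] = mutate_ip(out[key])
        return out

    return mutate_event

# Two invented events that share a host and an address.
mutate = make_mutator(seed=1)
a = mutate({"EventID": 3, "Computer": "WS01",
            "SourceIp": "192.168.1.5", "DestinationIp": "192.168.1.9"})
b = mutate({"EventID": 3, "Computer": "WS01", "SourceIp": "192.168.1.9"})
print(a)
print(b)
```

Because the mapping is stable within a run, an attack chain that hops from one machine to another still reads as a chain after mutation, while any hostnames or addresses a model may have seen in public datasets no longer appear.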

The findings suggest that for security operations centers, the integration of an AI model into a sophisticated agentic platform is more critical than the specific model being used.

About the Author

Jesse Jacobs is assistant editor of SecurityToday.com.
