AI Models Struggle to Defend Against Cyberattacks

A new benchmark reveals that while frontier language models excel at exploitation, they fail to autonomously detect sophisticated attack chains.

Frontier large language models are proficient at finding and exploiting software vulnerabilities, but a new study shows they are currently unable to defend against them without significant assistance.

Simbian Research Lab released its Cyber Defense Benchmark today, testing 11 prominent LLMs on their ability to detect MITRE ATT&CK chains within realistic telemetry. The results indicate a significant gap between offensive and defensive capabilities in artificial intelligence.

None of the tested models earned a passing score. Anthropic’s Claude Opus 4.6 was the top performer, yet it detected an average of only 46% of attack evidence per MITRE tactic. On its strongest tactic, Resource Development, the model scored 63%. On its weakest, Collection, it scored 25%.
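One plausible reading of "an average of 46% of attack evidence per MITRE tactic" is a macro-average: compute the fraction of evidence found within each tactic, then average those fractions across tactics. The sketch below assumes that interpretation. The benchmark's exact scoring formula is not given in this article, the function names are hypothetical, and the counts are invented purely for illustration.

```python
from typing import Dict, Tuple


def per_tactic_recall(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return the fraction of evidence found for each tactic.

    counts maps a tactic name to (evidence_found, evidence_total).
    Tactics with no evidence present are skipped to avoid dividing by zero.
    """
    return {
        tactic: found / total
        for tactic, (found, total) in counts.items()
        if total > 0
    }


def macro_average(recalls: Dict[str, float]) -> float:
    """Average the per-tactic fractions, weighting every tactic equally."""
    return sum(recalls.values()) / len(recalls)


# Invented counts for illustration only; these are NOT results from the study.
example_counts = {
    "Execution": (3, 4),
    "Persistence": (1, 5),
    "Collection": (2, 4),
}

recalls = per_tactic_recall(example_counts)
average = macro_average(recalls)
print(recalls)
print(round(average, 3))
```

Under this kind of averaging, a model can look strong on one tactic and still post a low overall score, because each tactic counts equally regardless of how many events it contains.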

The study found that defensive AI tasks are structurally more difficult than offensive ones. While offense has a clear "win state," such as gaining root access, defense requires reasoning across noisy, partial evidence without knowing the total number of malicious events present.

Cost does not scale linearly with success. Claude Opus 4.6 found three times as many flags as Google Gemini 3 Flash, but its investigations cost roughly 100 times more per run. Mid-priced models, including GPT-5 and Gemini 3.1 Pro, plateaued around a 2% detection rate, often ending investigations prematurely because the agent believed the task was complete.
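The two ratios quoted above can be combined into a rough cost-per-flag comparison. This is back-of-the-envelope arithmetic on the article's rounded figures, not a number reported by the study, and the function name is hypothetical.

```python
def relative_cost_per_flag(flag_ratio: float, cost_ratio: float) -> float:
    """How many times more one model pays per flag found than another.

    flag_ratio: flags found by model A divided by flags found by model B.
    cost_ratio: cost per run of model A divided by cost per run of model B.
    """
    return cost_ratio / flag_ratio


# Rounded ratios quoted in the article: about 3x the flags at about 100x the cost.
ratio = relative_cost_per_flag(flag_ratio=3.0, cost_ratio=100.0)
print(f"{ratio:.1f}x")  # prints "33.3x"
```

On those figures, each flag found by the more expensive model costs roughly 33 times as much as a flag found by the cheaper one.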

Researchers noted that the raw reasoning power of an LLM is only one component of a security solution. To reach enterprise-level accuracy, models require a "harness"—a specialized framework providing organizational context, deterministic retrieval and structured investigation loops.

The benchmark used real Windows telemetry, including Sysmon and Security event logs, with hostnames and IP addresses randomized to prevent models from relying on memorized data. This differs from previous industry benchmarks, which relied on multiple-choice questions or capture-the-flag puzzles.
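The randomization step can be pictured as consistent pseudonymization: every original hostname or IP address is swapped for a random stand-in, and the same original always maps to the same stand-in so that relationships between events survive. The sketch below is an assumption about how such a mutation could work, not Simbian's implementation. The class name, the field names and the replacement formats are hypothetical.

```python
import random
import re
from typing import Dict, Iterable

# Loose IPv4 matcher; good enough for a sketch, it does not validate octet ranges.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class TelemetryMutator:
    """Replace hostnames and IPv4 addresses with consistent random stand-ins."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)
        self._ip_map: Dict[str, str] = {}
        self._host_map: Dict[str, str] = {}

    def map_ip(self, ip: str) -> str:
        if ip not in self._ip_map:
            used = set(self._ip_map.values())
            while True:
                # Draw from 10.0.0.0/8 and retry on the rare collision.
                candidate = "10.{}.{}.{}".format(
                    self._rng.randint(0, 255),
                    self._rng.randint(0, 255),
                    self._rng.randint(1, 254),
                )
                if candidate not in used:
                    break
            self._ip_map[ip] = candidate
        return self._ip_map[ip]

    def map_host(self, host: str) -> str:
        key = host.lower()  # Windows hostnames are case-insensitive
        if key not in self._host_map:
            used = set(self._host_map.values())
            while True:
                candidate = "HOST-{:06X}".format(self._rng.randrange(16 ** 6))
                if candidate not in used:
                    break
            self._host_map[key] = candidate
        return self._host_map[key]

    def mutate(
        self, event: Dict[str, str], host_fields: Iterable[str] = ("Computer",)
    ) -> Dict[str, str]:
        """Return a copy of the event with hosts and IPs replaced."""
        host_fields = set(host_fields)
        mutated = {}
        for field, value in event.items():
            if field in host_fields:
                mutated[field] = self.map_host(value)
            else:
                mutated[field] = IPV4_RE.sub(
                    lambda m: self.map_ip(m.group(0)), value
                )
        return mutated


# Made-up events shaped loosely like network-connection records.
events = [
    {"Computer": "WS01.corp.local", "SourceIp": "192.168.1.10", "DestinationIp": "192.168.1.20"},
    {"Computer": "ws01.corp.local", "SourceIp": "192.168.1.20", "DestinationIp": "192.168.1.10"},
]
mutator = TelemetryMutator(seed=7)
mutated_events = [mutator.mutate(e) for e in events]
for e in mutated_events:
    print(e)
```

Keeping the mapping consistent matters because a detection task depends on linking the same host or address across many events. This sketch only rewrites hostnames in the named fields, so a real pipeline would also need to handle hostnames embedded in command lines or file paths.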

The findings suggest that, for security operations centers, how an AI model is integrated into an agentic platform matters more than which model is used.

About the Author

Jesse Jacobs is assistant editor of SecurityToday.com.
