AI Models Struggle to Defend Against Cyberattacks

A new benchmark reveals that while frontier language models excel at exploitation, they fail to autonomously detect sophisticated attack chains.

Frontier large language models are proficient at finding and exploiting software vulnerabilities, but a new study shows they are currently unable to defend against them without significant assistance.

Simbian Research Lab released its Cyber Defense Benchmark today, testing 11 prominent LLMs on their ability to detect MITRE ATT&CK chains within realistic telemetry. The results indicate a significant gap between offensive and defensive capabilities in artificial intelligence.

None of the tested models earned a passing score. Anthropic’s Claude Opus 4.6 was the top performer, yet it detected an average of only 46% of attack evidence per MITRE tactic. On its strongest tactic, Resource Development, the model scored 63%. In the Collection category, its score fell to 25%.

The study found that defensive AI tasks are structurally more difficult than offensive ones. While offense has a clear "win state," such as gaining root access, defense requires reasoning across noisy, partial evidence without knowing the total number of malicious events present.

Cost does not scale linearly with success. Claude Opus 4.6 found three times more flags than Google Gemini 3 Flash, but its investigations cost roughly 100 times more per run. Mid-priced models, including GPT-5 and Gemini 3.1 Pro, plateaued around a 2% detection rate, often ending investigations prematurely because the agent believed the task was complete.
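The cost comparison can be turned into a rough cost-per-flag figure. The sketch below is illustrative arithmetic only: it reads "three times more flags" as a 3x ratio and uses the approximate 100x cost ratio quoted above. The function name is hypothetical and does not come from the benchmark.

```python
# Ratios quoted in the article for Claude Opus 4.6 relative to Gemini 3 Flash.
# Assumption: "three times more flags" is read as a 3x ratio.
FLAG_RATIO = 3.0    # flags found, pricier model vs. cheaper model
COST_RATIO = 100.0  # cost per run, pricier model vs. cheaper model


def relative_cost_per_flag(cost_ratio: float, flag_ratio: float) -> float:
    """Return how many times more the pricier model pays for each flag found."""
    if flag_ratio <= 0:
        raise ValueError("flag_ratio must be positive")
    return cost_ratio / flag_ratio


if __name__ == "__main__":
    ratio = relative_cost_per_flag(COST_RATIO, FLAG_RATIO)
    print(f"Each flag costs roughly {ratio:.0f}x more on the pricier model")
```

Under those assumptions, each additional flag costs roughly 33 times more, which is the sense in which spending and detection do not scale together.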

Researchers noted that the raw reasoning power of an LLM is only one component of a security solution. To reach enterprise-level accuracy, models require a "harness"—a specialized framework providing organizational context, deterministic retrieval and structured investigation loops.

The benchmark used real Windows telemetry, including Sysmon and Security event logs, mutated with randomized hostnames and IP addresses so that models could not rely on memorized data. This differs from previous industry benchmarks, which relied on multiple-choice questions or capture-the-flag puzzles.
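The article does not describe how the mutation was implemented. As a minimal sketch of the general idea, the example below replaces hostnames and IP addresses with random stand-ins while keeping the mapping consistent, so the same machine looks the same across events. The class name, the field names and the replacement formats are assumptions chosen for illustration, not details from the benchmark.

```python
import random

# Assumed field names for illustration; real log schemas vary.
HOST_FIELDS = ("Computer",)
IP_FIELDS = ("SourceIp", "DestinationIp")


class TelemetryMutator:
    """Swap hostnames and IPs for random values, consistently across events."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._hosts = {}  # original hostname -> replacement
        self._ips = {}    # original IP -> replacement

    def _new_host(self):
        suffix = "".join(self._rng.choices("ABCDEFGHJKLMNPQRSTUVWXYZ23456789", k=6))
        return "HOST-" + suffix

    def _new_ip(self):
        # Draw from the private 10.0.0.0/8 range.
        return "10." + ".".join(str(self._rng.randint(1, 254)) for _ in range(3))

    def _swap(self, table, value, make):
        # Reuse an existing replacement, or create a new unused one.
        if value not in table:
            used = set(table.values())
            candidate = make()
            while candidate in used:
                candidate = make()
            table[value] = candidate
        return table[value]

    def mutate(self, event):
        """Return a copy of the event with hostnames and IPs replaced."""
        out = dict(event)
        for field in HOST_FIELDS:
            if field in out:
                out[field] = self._swap(self._hosts, out[field], self._new_host)
        for field in IP_FIELDS:
            if field in out:
                out[field] = self._swap(self._ips, out[field], self._new_ip)
        return out


if __name__ == "__main__":
    mutator = TelemetryMutator()
    event = {"Computer": "WS01", "SourceIp": "192.168.1.5", "DestinationIp": "192.168.1.9"}
    print(mutator.mutate(event))
```

Keeping the mapping consistent matters because an attack chain is only detectable if activity on one host can still be linked across events after the identifiers change.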

The findings suggest that for security operations centers, integrating an AI model into a sophisticated agentic platform matters more than which specific model is used.

About the Author

Jesse Jacobs is assistant editor of SecurityToday.com.
