AI Models Struggle to Defend Against Cyberattacks -- Security Today

A new benchmark reveals that while frontier language models excel at exploitation, they fail to autonomously detect sophisticated attack chains.

By Jesse Jacobs
Apr 28, 2026

Frontier large language models are proficient at finding and exploiting software vulnerabilities, but a new study shows they are currently unable to defend against them without significant assistance.

Simbian Research Lab released its Cyber Defense Benchmark today, testing 11 prominent LLMs on their ability to detect attack chains mapped to the MITRE ATT&CK framework within realistic telemetry. The results indicate a significant gap between offensive and defensive capabilities in artificial intelligence.

None of the tested models earned a passing score. Anthropic’s Claude Opus 4.6 emerged as the top performer, yet it detected an average of only 46% of attack evidence per MITRE ATT&CK tactic. On its strongest tactic, Resource Development, the model scored 63%. Its performance fell to 25% in the Collection category.
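A per-tactic average of this kind can be sketched as follows. This is a minimal illustration, assuming each tactic's score is the fraction of its evidence events that the model flagged and that the headline figure is the unweighted mean of those fractions. The article does not describe the benchmark's exact scoring formula, and the counts below are invented for illustration; they are not the benchmark's data.

```python
from typing import Dict, Tuple

def per_tactic_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each tactic to detected/total, skipping tactics with no evidence."""
    return {
        tactic: detected / total
        for tactic, (detected, total) in counts.items()
        if total > 0
    }

def macro_average(rates: Dict[str, float]) -> float:
    """Unweighted mean across tactics, so every tactic counts equally."""
    return sum(rates.values()) / len(rates) if rates else 0.0

# Hypothetical (detected, total) evidence counts; NOT figures from the study.
example_counts = {
    "Resource Development": (3, 4),
    "Collection": (1, 5),
    "Execution": (2, 4),
}

rates = per_tactic_rates(example_counts)
average = macro_average(rates)
print({t: round(r, 2) for t, r in rates.items()}, round(average, 3))
```

Averaging per tactic rather than over all events keeps a tactic with many easy-to-spot events from masking a weak result on a tactic with few.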

The study found that defensive AI tasks are structurally more difficult than offensive ones. While offense has a clear "win state," such as gaining root access, defense requires reasoning across noisy, partial evidence without knowing the total number of malicious events present.

Cost does not correlate linearly with success. While Claude Opus 4.6 found three times more flags than Google Gemini 3 Flash, its investigations cost roughly 100 times more per run. Mid-priced models, including GPT-5 and Gemini 3.1 Pro, plateaued around a 2% detection rate, often ending investigations prematurely because the agent believed the task was complete.

Researchers noted that the raw reasoning power of an LLM is only one component of a security solution. To reach enterprise-level accuracy, models require a "harness": a specialized framework that provides organizational context, deterministic retrieval and structured investigation loops.

The benchmark used real Windows telemetry, including Sysmon and Security event logs, mutated with randomized hostnames and IP addresses to prevent models from relying on memorized data. This differs from previous industry benchmarks that relied on multiple-choice questions or capture-the-flag puzzles.
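The general idea of mutating telemetry can be sketched as below. This is an illustrative assumption, not Simbian's actual pipeline, which the article does not describe: the function name `make_mutator`, the `HOST-` naming scheme and the sample log line are all hypothetical. The sketch replaces each known hostname and each IPv4 address with a random stand-in, and reuses the same stand-in every time so that relationships between events survive the rewrite.

```python
import random
import re
from typing import Callable, Dict, List, Set

# Simple IPv4 pattern; good enough for a sketch, not a strict validator.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def make_mutator(hostnames: List[str], seed: int) -> Callable[[str], str]:
    """Build a function that swaps hostnames and IPs for consistent fakes."""
    rng = random.Random(seed)
    # The index keeps fake names unique; the random digits vary per seed.
    host_map: Dict[str, str] = {
        name: f"HOST-{i:03d}-{rng.randrange(10**6):06d}"
        for i, name in enumerate(hostnames)
    }
    ip_map: Dict[str, str] = {}
    used: Set[str] = set()

    def fake_ip(match: re.Match) -> str:
        original = match.group(0)
        if original not in ip_map:
            while True:
                candidate = (
                    f"10.{rng.randrange(256)}.{rng.randrange(256)}."
                    f"{rng.randrange(1, 255)}"
                )
                # Avoid reusing a fake or accidentally keeping the real address.
                if candidate not in used and candidate != original:
                    used.add(candidate)
                    break
            ip_map[original] = candidate
        return ip_map[original]

    def mutate(line: str) -> str:
        # Plain substring replacement: assumes hostnames are distinctive tokens.
        for real, fake in host_map.items():
            line = line.replace(real, fake)
        return IPV4_RE.sub(fake_ip, line)

    return mutate

# Hypothetical log line, not taken from the benchmark's data.
mutate = make_mutator(["DC01", "WKSTN-7"], seed=1)
sample = "DC01 logon from 192.168.1.20 to 192.168.1.20 via WKSTN-7"
mutated = mutate(sample)
print(mutated)
```

Keeping the mapping consistent matters: if the same address became a different fake on every line, an investigator, human or model, could no longer link a logon to the lateral movement that follows it.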

The findings suggest that for security operations centers, the integration of an AI model into a sophisticated agentic platform is more critical than the specific model being used.

About the Author

Jesse Jacobs is assistant editor of SecurityToday.com.


