AI Models Struggle to Defend Against Cyberattacks

A new benchmark reveals that while frontier language models excel at exploitation, they fail to autonomously detect sophisticated attack chains.

Frontier large language models are proficient at finding and exploiting software vulnerabilities, but a new study shows they are currently unable to defend against them without significant assistance.

Simbian Research Lab released its Cyber Defense Benchmark today, testing 11 prominent LLMs on their ability to detect MITRE ATT&CK chains within realistic telemetry. The results indicate a significant gap between offensive and defensive capabilities in artificial intelligence.

None of the tested models earned a passing score. Anthropic’s Claude Opus 4.6 was the top performer, yet it detected an average of only 46% of attack evidence per MITRE tactic. On its strongest tactic, Resource Development, the model scored 63%. On Collection, its performance fell to 25%.
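As a rough illustration of what an average "per MITRE tactic" can mean, the sketch below computes a detection rate for each tactic and then averages those rates with equal weight. This is one plausible reading of the metric, not the benchmark's published scoring code; the function names and the flag counts are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean

def per_tactic_detection(found, total):
    """Detection rate per tactic: flags found divided by flags present."""
    return {tactic: found.get(tactic, 0) / n for tactic, n in total.items() if n > 0}

def macro_average(rates):
    """Average the per-tactic rates, giving every tactic equal weight."""
    return mean(rates.values())

# Hypothetical counts, NOT taken from the benchmark.
total = {"Resource Development": 8, "Collection": 4, "Execution": 10}
found = {"Resource Development": 5, "Collection": 1, "Execution": 5}

rates = per_tactic_detection(found, total)
avg = macro_average(rates)

for tactic, rate in rates.items():
    print(f"{tactic}: {rate:.0%}")
print(f"Average per tactic: {avg:.1%}")
```

Averaging per tactic means a weak tactic with few flags pulls the score down as much as a weak tactic with many flags, which is why a single low category can matter so much.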

The study found that defensive AI tasks are structurally more difficult than offensive ones. While offense has a clear "win state," such as gaining root access, defense requires reasoning across noisy, partial evidence without knowing the total number of malicious events present.

Higher cost did not buy proportionally better results. Claude Opus 4.6 found three times more flags than Google Gemini 3 Flash, but its investigations cost roughly 100 times more per run. Mid-priced models, including GPT-5 and Gemini 3.1 Pro, plateaued around a 2% detection rate, often ending investigations prematurely because the agent believed the task was complete.
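The cost comparison can be made concrete with a little arithmetic. The sketch below uses only the two ratios reported above (about three times the flags at about 100 times the cost); the function name is hypothetical, and no absolute prices are assumed.

```python
def relative_cost_per_flag(flag_ratio, cost_ratio):
    """How many times more one model pays per flag found than another.

    flag_ratio: flags found by model A divided by flags found by model B
    cost_ratio: cost per run of model A divided by cost per run of model B
    """
    if flag_ratio <= 0:
        raise ValueError("flag_ratio must be positive")
    return cost_ratio / flag_ratio

# Ratios as reported in the article: 3x the flags at roughly 100x the cost.
ratio = relative_cost_per_flag(flag_ratio=3, cost_ratio=100)
print(f"Cost per flag is about {ratio:.0f} times higher")
```

On those figures, each flag found by the more expensive model costs roughly 33 times as much.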

Researchers noted that the raw reasoning power of an LLM is only one component of a security solution. To reach enterprise-level accuracy, models require a "harness"—a specialized framework providing organizational context, deterministic retrieval and structured investigation loops.

The benchmark used real Windows telemetry, including Sysmon and Security event logs, mutated with randomized hostnames and IP addresses to prevent models from relying on memorized data. This differs from previous industry benchmarks that relied on multiple-choice questions or capture-the-flag puzzles.
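To show what this kind of mutation might look like in practice, here is a minimal sketch that replaces hostnames and IPv4 addresses with random stand-ins while keeping the mapping consistent, so the same original value always becomes the same replacement across events. It is an assumption-laden illustration, not Simbian's pipeline: the `make_mutator` helper, the `HOST-` naming scheme, the `10.x.x.x` replacement range and the sample log lines are all hypothetical.

```python
import random
import re

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def make_mutator(hostnames, seed=None):
    """Build a function that rewrites known hostnames and any IPv4 address."""
    rng = random.Random(seed)
    mapping = {}   # original value -> replacement (kept consistent across lines)
    used = set()   # replacements already handed out, to avoid collisions

    def fresh(generator):
        value = generator()
        while value in used:
            value = generator()
        used.add(value)
        return value

    def fake_ip():
        return f"10.{rng.randint(0, 255)}.{rng.randint(0, 255)}.{rng.randint(1, 254)}"

    def fake_host():
        return "HOST-" + "".join(rng.choices("ABCDEFGHJKLMNPQRSTUVWXYZ23456789", k=6))

    # Longest names first so a short name never matches inside a longer one.
    names = sorted(hostnames, key=len, reverse=True)
    host_re = re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE) if names else None

    def replace(key, generator):
        if key not in mapping:
            mapping[key] = fresh(generator)
        return mapping[key]

    def mutate(line):
        line = IP_RE.sub(lambda m: replace(m.group(0), fake_ip), line)
        if host_re is not None:
            line = host_re.sub(lambda m: replace(m.group(0).lower(), fake_host), line)
        return line

    return mutate

# Hypothetical log lines in a simplified key=value layout.
log = [
    "EventID=3 Computer=WKSTN-01 SourceIp=192.168.1.10 DestinationIp=192.168.1.20",
    "EventID=4624 Computer=WKSTN-01 IpAddress=192.168.1.10",
]
mutate = make_mutator(["WKSTN-01"], seed=7)
mutated = [mutate(line) for line in log]
print("\n".join(mutated))
```

Keeping the mapping consistent matters because an attack chain is only detectable if the same host and address can still be correlated across events after mutation.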

The findings suggest that, for security operations centers, how an AI model is integrated into a sophisticated agentic platform matters more than which specific model is used.

About the Author

Jesse Jacobs is assistant editor of SecurityToday.com.