AI-Generated Code Poses Major Security Risks in Nearly Half of All Development Tasks
Veracode, a provider of application risk management, recently released its 2025 GenAI Code Security Report, which documents critical security flaws in AI-generated code. The study ran 80 curated coding tasks across more than 100 large language models (LLMs) and found that while AI produces functional code, it introduces security vulnerabilities in 45 percent of cases.
The research demonstrates a troubling pattern: when given a choice between a secure and an insecure way to write code, GenAI models chose the insecure option 45 percent of the time. Perhaps more concerning, security performance has stayed flat over time, even as LLMs have become better at generating syntactically correct code.
“The rise of vibe coding, where developers rely on AI to generate code, typically without explicitly defining security requirements, represents a fundamental shift in how software is built,” said Jens Wessling, Chief Technology Officer at Veracode. “The main concern with this trend is that they do not need to specify security constraints to get the code they want, effectively leaving secure coding decisions to LLMs. Our research reveals GenAI models make the wrong choices nearly half the time, and it’s not improving.”
AI is also enabling attackers to identify and exploit security vulnerabilities more quickly and effectively. AI-powered tools can scan systems at scale, identify weaknesses, and even generate exploit code with minimal human input. This lowers the barrier to entry for less-skilled attackers and increases the speed and sophistication of attacks, posing a significant threat to traditional security defenses. Vulnerabilities are becoming more numerous, and exploiting them is becoming easier.
LLMs Introduce Dangerous Levels of Common Security Vulnerabilities
To evaluate the security properties of LLM-generated code, Veracode designed a set of 80 code completion tasks with known potential for security vulnerabilities according to the MITRE Common Weakness Enumeration (CWE) system, a standard classification of software weaknesses that can turn into vulnerabilities. Each task prompted more than 100 LLMs to auto-complete a block of code that could be finished in either a secure or an insecure manner, and the research team then analyzed the output using Veracode Static Analysis. In 45 percent of all test cases, LLMs introduced vulnerabilities classified within the OWASP (Open Worldwide Application Security Project) Top 10, a list of the most critical web application security risks.
Veracode found Java to be the riskiest language for AI code generation, with a security failure rate over 70 percent. Other major languages, like Python, C#, and JavaScript, still presented significant risk, with failure rates between 38 percent and 45 percent. The research also revealed LLMs failed to secure code against cross-site scripting (CWE-80) and log injection (CWE-117) in 86 percent and 88 percent of cases, respectively.
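To make the two weakness classes concrete, here is a minimal Python sketch of what a secure completion might look like for each. It is an illustration only and is not taken from Veracode's task set; the function names (render_greeting, sanitize_for_log, log_login_failure) are hypothetical, and real applications would normally rely on a templating engine's auto-escaping and a structured logging setup instead of hand-rolled helpers.

```python
import html
import logging


def render_greeting(username: str) -> str:
    """Build an HTML fragment with user input escaped (addresses CWE-80).

    The insecure version would interpolate `username` directly, letting
    a value such as "<script>...</script>" run in the browser.
    """
    return f"<p>Hello, {html.escape(username)}!</p>"


def sanitize_for_log(value: str) -> str:
    """Neutralize CR and LF so input cannot forge extra log lines (CWE-117)."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


def log_login_failure(logger: logging.Logger, username: str) -> None:
    """Log a failed login with the user-supplied name made safe for logs."""
    logger.warning("Login failed for user: %s", sanitize_for_log(username))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(render_greeting("<script>alert(1)</script>"))
    log_login_failure(logging.getLogger("auth"), "bob\nINFO admin logged in")
```

In both cases the insecure alternative is shorter and still functionally correct, which may help explain why a model that is not told to consider security can produce it so often.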
“Despite the advances in AI-assisted development, it is clear security hasn’t kept pace,” Wessling said. “Our research shows models are getting better at coding accurately but are not improving at security. We also found larger models do not perform significantly better than smaller models, suggesting this is a systemic issue rather than an LLM scaling problem.”