Clustering Attacks on Web Apps to Find the Real Story Behind the Headlines

Security products that aim to block attacks may do their job perfectly while also reporting every attack they block. However, one of the biggest problems in cyber security today is alert fatigue: there are too many alerts to process manually. One of the largest data breaches in history, affecting more than 41 million customer payment card accounts, could have been prevented if the right action had been taken on security alerts that were already visible. A website protected by a web application firewall may be targeted by anywhere from hundreds of thousands to millions of attacks in a single day.

The staff available to process and analyze these alerts is never sufficient, so a flood of important data goes unhandled and unanalyzed because of alert fatigue. However, by leveraging artificial intelligence, we can develop machine learning algorithms that cluster alerts, automating their processing and consolidating them, and condensing days or weeks of work into minutes.

Our goal of clustering attacks on web applications is two-fold:

  1. Highlight interesting patterns inside the attacks
  2. Distill the massive amount of attacks to a few actionable incidents

Clustering can help us create a “story” out of the attacks (naming each cluster based on its behavior), making the attacks easier for a human observer to understand and analyze. For example, a cluster called “SQL injection attack from several IPs in China using a Havij scanner” tells a much clearer story than the thousands of individual attacks it contains, whose common pattern would otherwise have to be found by hand.
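
A cluster’s name can be derived mechanically from its most frequent feature values. The Python sketch below is a minimal illustration under stated assumptions, not the method used in production: the field names `attack_type`, `ip`, `country` and `tool`, the helper `name_cluster`, and the sample alerts (which use documentation IP ranges) are all hypothetical.

```python
from collections import Counter

def name_cluster(attacks):
    """Build a short label from the most common feature values in a cluster.

    The feature names ('attack_type', 'ip', 'country', 'tool') are hypothetical
    placeholders for whatever the feature-extraction stage produces.
    """
    def most_common(key):
        # Most frequent value of one feature across the cluster.
        return Counter(a[key] for a in attacks).most_common(1)[0][0]

    n_ips = len({a["ip"] for a in attacks})
    source = "1 IP" if n_ips == 1 else f"{n_ips} IPs"
    return (f"{most_common('attack_type')} attack from {source} "
            f"in {most_common('country')} using a {most_common('tool')} scanner")

cluster = [
    {"attack_type": "SQL injection", "ip": "203.0.113.5", "country": "China", "tool": "Havij"},
    {"attack_type": "SQL injection", "ip": "203.0.113.9", "country": "China", "tool": "Havij"},
    {"attack_type": "SQL injection", "ip": "198.51.100.7", "country": "China", "tool": "Havij"},
]
print(name_cluster(cluster))
# SQL injection attack from 3 IPs in China using a Havij scanner
```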

A simple “group-by” algorithm, which groups alerts by a single attribute, is not good enough, because attacks do not share a single structure and no single attribute can define all of them. A more sophisticated algorithm, built on a general distance function between attacks on web applications, is needed.
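
The following small Python sketch, using invented alerts and a hypothetical `group_by` helper, shows how grouping on any one attribute can go wrong in both directions: it can split a single campaign apart, and it can merge unrelated ones.

```python
from collections import defaultdict

def group_by(alerts, key):
    """Naive grouping of alerts on a single attribute."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert[key]].append(alert)
    return dict(groups)

# Invented alerts: one distributed SQLi campaign (Havij against /login.php from
# three IPs), one unrelated SQLi campaign (sqlmap against /api/items), and an
# XSS probe that happens to share an IP with the first campaign.
alerts = [
    {"ip": "203.0.113.5", "type": "SQLi", "tool": "Havij", "url": "/login.php"},
    {"ip": "203.0.113.9", "type": "SQLi", "tool": "Havij", "url": "/login.php"},
    {"ip": "198.51.100.7", "type": "SQLi", "tool": "Havij", "url": "/login.php"},
    {"ip": "192.0.2.44", "type": "SQLi", "tool": "sqlmap", "url": "/api/items"},
    {"ip": "203.0.113.5", "type": "XSS", "tool": "curl", "url": "/search"},
]

by_ip = group_by(alerts, "ip")
by_type = group_by(alerts, "type")

# Grouping by IP splits the Havij campaign across three groups and mixes the
# XSS probe into one of them.
print(len(by_ip))                                  # 4
print([a["type"] for a in by_ip["203.0.113.5"]])  # ['SQLi', 'XSS']

# Grouping by attack type merges two unrelated SQLi campaigns.
print(sorted({a["tool"] for a in by_type["SQLi"]}))  # ['Havij', 'sqlmap']
```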

The algorithm has three main stages:

  1. Feature extraction
  2. Distance calculation
  3. Clustering of the attacks

Feature Extraction

The raw data that enters the algorithm is an HTTP request that contains an attack stopped by the firewall, with some additional fields containing more data about the attack, like the source IP and the type of attack.

By leveraging our web application security domain knowledge, we extract additional meaningful features from the raw data that can help us describe the attack.

For example, the source of an attack is not defined solely by its IP address. We also use geolocation services to extract more information about the origin of the IP, such as source country, ISP, coordinates, and ASN. It is also useful to know whether the IP belongs to an anonymity network such as Tor or to an anonymous proxy.
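
A minimal Python sketch of this enrichment step follows. It assumes a raw alert with `src_ip`, `url`, `attack_type` and `user_agent` fields; the in-memory `GEO_DB` and `TOR_EXIT_NODES` tables are hypothetical stand-ins for a real geolocation service and Tor exit-node list, and detecting the scanning tool by a substring match on the User-Agent is an illustrative simplification.

```python
import ipaddress
from urllib.parse import parse_qsl, urlsplit

# Hypothetical stand-ins for real enrichment sources (a geolocation/ASN
# database and a Tor exit-node list). All values are invented.
GEO_DB = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "CN", "asn": 64500, "isp": "Example ISP A"},
    ipaddress.ip_network("198.51.100.0/24"): {"country": "US", "asn": 64501, "isp": "Example ISP B"},
}
TOR_EXIT_NODES = {ipaddress.ip_address("198.51.100.7")}
KNOWN_TOOLS = ("havij", "sqlmap", "nikto")  # matched against the User-Agent

def lookup_geo(ip):
    """Return origin data for an IP, or None values if it is unknown."""
    for network, info in GEO_DB.items():
        if ip in network:
            return dict(info)
    return {"country": None, "asn": None, "isp": None}

def extract_features(alert):
    """Turn a raw firewall alert into a flat dict of descriptive features."""
    ip = ipaddress.ip_address(alert["src_ip"])
    url = urlsplit(alert["url"])
    user_agent = alert.get("user_agent", "").lower()
    features = {
        "ip": ip,
        "attack_type": alert["attack_type"],
        "path": url.path,
        # Parameter names only; attack payloads vary too much to compare directly.
        "param_names": sorted({k for k, _ in parse_qsl(url.query, keep_blank_values=True)}),
        "is_tor": ip in TOR_EXIT_NODES,
        "tool": next((t for t in KNOWN_TOOLS if t in user_agent), None),
    }
    features.update(lookup_geo(ip))  # adds country, asn, isp
    return features

alert = {
    "src_ip": "198.51.100.7",
    "url": "/login.php?user=admin&pass=x'+OR+1=1--",
    "attack_type": "SQL injection",
    "user_agent": "Havij",
}
print(extract_features(alert))
```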

Distance Calculations

The next task is to determine a way to calculate the distance between two attacks. This is a core stage of the algorithm, as it determines when two attacks are similar, which is ultimately what the algorithm is trying to establish. Calculating a distance between two points in the plane is easy – there is a precise formula to do it – but how can we calculate the distance between two URLs or two IPs?
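
There is no single correct answer, but one reasonable choice for each feature is sketched below in Python. This is an assumption, not necessarily what the article’s authors used: for IPv4 addresses, a distance based on how many leading bits the two addresses share, so that neighbours in the same subnet come out close; for URLs, the Levenshtein edit distance normalized by the length of the longer string. Both functions return values in [0, 1].

```python
import ipaddress

def ip_distance(a, b):
    """Distance in [0, 1] from the shared bit prefix of two IPv4 addresses:
    the longer the common prefix (same /24, same /16, ...), the closer."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    common_prefix = 32 - diff.bit_length()  # leading bits that are identical
    return 1 - common_prefix / 32

def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def url_distance(a, b):
    """Edit distance between two URL strings, normalized to [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

print(ip_distance("203.0.113.5", "203.0.113.9"))   # 0.125 (28 shared bits)
print(ip_distance("203.0.113.5", "198.51.100.7"))  # 0.875 (4 shared bits)
print(url_distance("/login.php", "/login.asp"))    # 0.2
```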

We need to find a method to calculate the distance for every meaningful feature we have in our data, and then combine all these distances to find a single measure between two attacks.
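
One common way to combine per-feature distances is a weighted average. In the Python sketch below, the feature names, the choice of exact-match distance for categorical fields and Jaccard distance for sets of parameter names, and the weights themselves are all hypothetical; in practice the weights would be tuned using domain knowledge.

```python
def categorical_distance(a, b):
    """0 if two categorical values match, 1 otherwise."""
    return 0.0 if a == b else 1.0

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

# Hypothetical feature -> (distance function, weight) table.
FEATURE_DISTANCES = {
    "attack_type": (categorical_distance, 3.0),
    "country": (categorical_distance, 1.0),
    "tool": (categorical_distance, 2.0),
    "param_names": (jaccard_distance, 2.0),
}

def attack_distance(x, y, spec=FEATURE_DISTANCES):
    """Weighted average of per-feature distances; stays in [0, 1] as long as
    every per-feature distance does."""
    total_weight = sum(weight for _, weight in spec.values())
    weighted = sum(weight * fn(x[name], y[name]) for name, (fn, weight) in spec.items())
    return weighted / total_weight

a = {"attack_type": "SQLi", "country": "CN", "tool": "havij", "param_names": ["id", "user"]}
b = {"attack_type": "SQLi", "country": "US", "tool": "havij", "param_names": ["id"]}
# Country differs (weight 1 x 1.0) and parameter names half-overlap (weight 2 x 0.5),
# giving 2.0 out of a total weight of 8.0.
print(attack_distance(a, b))  # 0.25
```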

Clustering of the Attacks

The final step is to take the data with all the extracted features, together with the distance measure between attacks, and construct clusters of the attacks. In our case, we used a streaming clustering algorithm, which builds the clusters incrementally from a stream of data as more and more attacks enter the system.
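
The article does not name the streaming algorithm it uses. As a generic stand-in, the Python sketch below implements simple single-pass “leader” clustering: each incoming attack joins the nearest existing cluster if its distance to that cluster’s representative is within a threshold, and otherwise starts a new cluster. The `LeaderClustering` class, the `toy_distance` function and the threshold of 0.4 are hypothetical.

```python
class LeaderClustering:
    """Single-pass threshold ("leader") clustering over a stream of attacks."""

    def __init__(self, distance, threshold):
        self.distance = distance    # any function returning a distance in [0, 1]
        self.threshold = threshold  # maximum distance for joining a cluster
        self.leaders = []           # one representative attack per cluster
        self.sizes = []             # number of attacks assigned to each cluster

    def add(self, attack):
        """Assign one incoming attack and return its cluster index."""
        best, best_d = None, None
        for i, leader in enumerate(self.leaders):
            d = self.distance(attack, leader)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.threshold:
            self.sizes[best] += 1
            return best
        # Too far from every existing cluster: this attack leads a new one.
        self.leaders.append(attack)
        self.sizes.append(1)
        return len(self.leaders) - 1

def toy_distance(a, b):
    """Fraction of the listed features on which two attacks differ."""
    keys = ("attack_type", "tool", "path")
    return sum(a[k] != b[k] for k in keys) / len(keys)

stream = [
    {"attack_type": "SQLi", "tool": "havij", "path": "/login.php"},
    {"attack_type": "SQLi", "tool": "havij", "path": "/login.php"},
    {"attack_type": "XSS", "tool": "curl", "path": "/search"},
    {"attack_type": "SQLi", "tool": "havij", "path": "/admin.php"},
]
clusterer = LeaderClustering(toy_distance, threshold=0.4)
labels = [clusterer.add(a) for a in stream]
print(labels)           # [0, 0, 1, 0]
print(clusterer.sizes)  # [3, 1]
```

Leader clustering is order-dependent and never revisits earlier assignments; that trade-off buys a single pass over the data, which is what makes it usable on a live stream.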

Clustering in streaming mode matters because the attacks are delivered in real time. Stream clustering also improves the algorithm’s performance in both time and memory: instead of keeping all the attack data in memory, it stores only the current clusters and their distinguishing features.
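
The memory saving comes from keeping a compact summary per cluster instead of the raw attacks. A minimal Python sketch, assuming a summary made of an attack count plus a value-frequency table for each tracked feature (the `ClusterSummary` class and its feature names are hypothetical):

```python
from collections import Counter

class ClusterSummary:
    """Compact per-cluster state: an attack count plus a frequency table per
    tracked feature. Raw attacks are not kept after being folded in."""

    def __init__(self, features):
        self.count = 0
        self.values = {f: Counter() for f in features}

    def update(self, attack):
        """Fold one attack into the summary; untracked fields are ignored."""
        self.count += 1
        for feature, counter in self.values.items():
            counter[attack.get(feature)] += 1

    def dominant(self, feature):
        """Most common value of a feature and its share of the cluster.
        Requires at least one prior update()."""
        value, n = self.values[feature].most_common(1)[0]
        return value, n / self.count

summary = ClusterSummary(["attack_type", "country"])
for i in range(1000):
    summary.update({"attack_type": "SQLi",
                    "country": "CN" if i % 4 else "US",
                    "payload": f"id={i}' OR 1=1--"})  # unique per attack, not stored
print(summary.count)                  # 1000
print(summary.dominant("country"))    # ('CN', 0.75)
print(len(summary.values["country"])) # 2
```

Memory for each cluster then grows with the number of distinct values of the tracked features rather than with the number of attacks, and the same frequency tables can feed a cluster-naming step.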

To conclude, clustering attacks on web applications helps to uncover the hidden patterns behind the attacks and to make huge amounts of data comprehensible to the human security expert. Constructing such a clustering algorithm requires more than machine learning knowledge; it also requires a high level of domain knowledge in cyber security to understand and construct the various parts of the algorithm.

To read more about clustering attacks on web applications, see Imperva’s blog series.
