Clustering Attacks on Web Apps to Find the Real Story Behind the Headlines -- Security Today

Our goal of clustering attacks on web applications is two-fold

By Gilad Yehudai
Jul 11, 2018

Security products that aim to block attacks may do their job perfectly and still report every attack they block. However, one of the biggest problems in the cyber security arena today is alert fatigue: there are too many alerts to process manually. The largest data breach in history, affecting more than 41 million customer payment card accounts, could have been prevented if the right action had been taken on the visible security alerts. A web site protected by a web application firewall may be targeted by anywhere from hundreds of thousands to millions of attacks in a single day.

The manpower available for processing and analyzing these alerts is never enough, and the result is a flood of important data that goes unhandled and unanalyzed because of alert fatigue. However, by leveraging artificial intelligence, we can develop machine learning algorithms that cluster and consolidate those alerts automatically, condensing days or weeks of work into minutes.

Our goal of clustering attacks on web applications is two-fold:

Highlight interesting patterns inside the attacks
Distill the massive number of attacks into a few actionable incidents

Clustering can help us create a “story” out of the attacks (naming them based on behavior), making them easier for a human observer to understand and analyze. For example, a cluster called “SQL injection attack from several IPs in China using a Havij scanner” tells a much clearer story than analyzing the thousands of attacks the cluster contains and trying to find the common pattern among them.

A simple “group-by” algorithm that takes the alerts and groups them by a specific attribute is not good enough. Attacks do not share a single structure, and no single attribute can define all of them. Thus, a more sophisticated algorithm, which considers a general distance function between attacks on web applications, is needed.
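To make the limitation of a plain group-by concrete, here is a minimal sketch. The alert records and the field names (`ip`, `attack_type`, `url`) are invented for illustration and do not come from any real product.

```python
from collections import defaultdict

# Hypothetical alerts: one SQL injection campaign spread over three nearby IPs,
# one unrelated SQL injection attempt, and one unrelated XSS attempt.
alerts = [
    {"ip": "203.0.113.10", "attack_type": "sqli", "url": "/shop/item?id=1"},
    {"ip": "203.0.113.11", "attack_type": "sqli", "url": "/shop/item?id=2"},
    {"ip": "203.0.113.12", "attack_type": "sqli", "url": "/shop/item?id=3"},
    {"ip": "192.0.2.50", "attack_type": "sqli", "url": "/login"},
    {"ip": "198.51.100.7", "attack_type": "xss", "url": "/search?q=a"},
]


def group_by(records, attribute):
    """Group records by the exact value of a single attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record)
    return dict(groups)


by_ip = group_by(alerts, "ip")
by_type = group_by(alerts, "attack_type")

# Grouping by IP splits the campaign into one group per address.
print(len(by_ip))            # 5
# Grouping by type merges the campaign with the unrelated attempt.
print(len(by_type["sqli"]))  # 4
```

Neither grouping recovers the three-request campaign as a single unit, which is the gap a distance-based approach is meant to close.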

The algorithm has three main stages:

Feature extraction
Distance calculation
Clustering of the attacks

Feature extraction

The raw data that enters the algorithm is an HTTP request that contains an attack stopped by the firewall, with some additional fields containing more data about the attack, like the source IP and the type of attack.

By leveraging our web application security domain knowledge, we extract additional meaningful features from the raw data that can help us describe the attack.

For example, the source of an attack is not defined solely by the IP. We also use geolocation services to extract more about the origin of the IP, such as source country, ISP, coordinates, and ASN. It is also useful to know whether the IP belongs to some kind of anonymity framework, like Tor or an anonymous proxy.
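A feature extraction step along these lines might look like the sketch below. It is only an illustration: the alert schema (`source_ip`, `url`, `attack_type`, `user_agent`) is hypothetical, and `GEO_DB` is a hard-coded stand-in for a real geolocation or IP reputation service.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit

# Stand-in for a real geolocation / IP reputation lookup (assumption).
GEO_DB = {
    "203.0.113.10": {"country": "CN", "asn": 64500, "anonymizer": False},
}
UNKNOWN_GEO = {"country": None, "asn": None, "anonymizer": None}


def extract_features(alert):
    """Turn a raw alert (hypothetical schema) into a flat feature dict."""
    # Raises ValueError early if the source IP is malformed.
    ip = ipaddress.ip_address(alert["source_ip"])
    url = urlsplit(alert["url"])
    geo = GEO_DB.get(alert["source_ip"], UNKNOWN_GEO)
    return {
        "ip": str(ip),
        "attack_type": alert["attack_type"],
        # Path segments and parameter names describe the targeted resource.
        "path_tokens": [token for token in url.path.split("/") if token],
        "param_names": sorted(parse_qs(url.query, keep_blank_values=True)),
        "user_agent": alert.get("user_agent", ""),
        **geo,
    }


raw_alert = {
    "source_ip": "203.0.113.10",
    "url": "/shop/item.php?id=1%27%20OR%20%271%27=%271",
    "attack_type": "sqli",
}
features = extract_features(raw_alert)
print(features["country"], features["path_tokens"], features["param_names"])
```

Keeping the output as a flat dictionary makes it straightforward to define one distance per feature in the next stage.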

Distance calculation

The next task is to determine a way to calculate the distance between two attacks. This is a core stage of the algorithm as it determines when two attacks are similar, which in general is what the algorithm is trying to achieve. Calculating a distance between two points in the plane is easy – there is a precise formula to do it – but how can we calculate the distance between two URLs or two IPs?

We need to find a method to calculate the distance for every meaningful feature we have in our data, and then combine all these distances to find a single measure between two attacks.
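One possible way to combine per-feature distances into a single measure is sketched below. Every choice here is an assumption made for illustration: the IP distance is based on shared leading bits, the URL distance is a Jaccard distance over path tokens, categorical features use a 0/1 mismatch, and the weights are arbitrary.

```python
import ipaddress


def ip_distance(a, b):
    """0.0 for identical IPv4 addresses, 1.0 when not even the first bit matches."""
    xor = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    shared_prefix_bits = 32 - xor.bit_length()
    return 1.0 - shared_prefix_bits / 32


def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def categorical_distance(a, b):
    """0.0 when the two values are equal, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


# Illustrative weights only; they sum to 1 so the result stays in [0, 1].
WEIGHTS = {"ip": 0.2, "country": 0.2, "attack_type": 0.3, "path_tokens": 0.3}


def attack_distance(x, y):
    """Weighted sum of per-feature distances between two feature dicts."""
    parts = {
        "ip": ip_distance(x["ip"], y["ip"]),
        "country": categorical_distance(x["country"], y["country"]),
        "attack_type": categorical_distance(x["attack_type"], y["attack_type"]),
        "path_tokens": jaccard_distance(x["path_tokens"], y["path_tokens"]),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


a = {"ip": "203.0.113.10", "country": "CN", "attack_type": "sqli",
     "path_tokens": ["shop", "item.php"]}
b = {"ip": "203.0.113.11", "country": "CN", "attack_type": "sqli",
     "path_tokens": ["shop", "cart.php"]}
c = {"ip": "198.51.100.7", "country": "US", "attack_type": "xss",
     "path_tokens": ["search"]}

print(attack_distance(a, b) < attack_distance(a, c))  # True
```

In practice the weights and the per-feature functions would need tuning against real attack data; the point is only that each feature gets its own normalized distance before the results are combined.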

Clustering of the attacks

The final step is to take the data with all the extracted features and the distance measure between attacks to construct clusters of the attacks. In our case, we used a streaming clustering algorithm. This algorithm creates the clusters over time by receiving a stream of data as more and more attacks enter the system.

Clustering in streaming mode matters because the attacks arrive in real time. Stream clustering also improves the algorithm's performance in both time and memory, because not all the attack data is kept in memory all the time, only the current clusters with their unique features.
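The article does not name the streaming algorithm that was used, so the following is only a sketch of one simple option, a leader-style online clusterer. Each incoming attack joins the nearest existing cluster if it is within a threshold and otherwise starts a new cluster. Only a representative and a counter are stored per cluster, not every attack. The class name, the toy distance function, and the threshold are all assumptions.

```python
class StreamingClusterer:
    """Leader-style online clustering: one pass, small state per cluster."""

    def __init__(self, distance, threshold):
        self.distance = distance
        self.threshold = threshold
        self.clusters = []  # each entry: {"representative": ..., "count": int}

    def add(self, attack):
        """Assign one incoming attack and return the index of its cluster."""
        best_index, best_distance = None, None
        for index, cluster in enumerate(self.clusters):
            d = self.distance(attack, cluster["representative"])
            if best_distance is None or d < best_distance:
                best_index, best_distance = index, d
        if best_index is not None and best_distance <= self.threshold:
            self.clusters[best_index]["count"] += 1
            return best_index
        # Nothing close enough: this attack becomes a new cluster's representative.
        self.clusters.append({"representative": attack, "count": 1})
        return len(self.clusters) - 1


def toy_distance(x, y):
    """Fraction of the compared fields that differ (illustration only)."""
    keys = ("attack_type", "country", "tool")
    return sum(x[k] != y[k] for k in keys) / len(keys)


clusterer = StreamingClusterer(toy_distance, threshold=0.34)
stream = [
    {"attack_type": "sqli", "country": "CN", "tool": "havij"},
    {"attack_type": "sqli", "country": "CN", "tool": "havij"},
    {"attack_type": "xss", "country": "US", "tool": "unknown"},
    {"attack_type": "sqli", "country": "CN", "tool": "sqlmap"},
]
labels = [clusterer.add(attack) for attack in stream]
print(labels)  # [0, 0, 1, 0]
```

A production system would also need to update cluster summaries as members arrive and to expire stale clusters; both are left out here to keep the sketch short.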

To conclude, clustering attacks on web applications helps reveal the hidden patterns behind the attacks and makes huge amounts of data comprehensible to the human security expert. Constructing such a clustering algorithm requires more than just machine learning knowledge. It also requires a high level of domain knowledge in cyber security to understand and construct the various parts of the algorithm.

To read more about clustering attacks on web applications, see Imperva’s blog series.
