No Protection From Bad Data

Beware what the experts claim as noise and a security signal

Walking the expo floor at the most recent RSA conference, it was hard to miss how many companies were talking about big data. Maybe they used the word analytics, or perhaps they called it machine learning. The claims were all similar. Give me your tired, your poor, your huddled masses of log and incident data that you’re yearning to make actionable—or something along those lines. They all wanted to sell us on the notion that big data can take operational noise and turn it into security signal.

Of course, the next line of the famous poem I misquote refers to “wretched refuse,” and that’s a pretty good description for most of the data people would feed into these systems. The belief that such systems can take refuse and spin it into gold is magical thinking. Big data, data science, machine learning and other novel approaches do have true promise for security leaders and practitioners looking for better results. There are ways to get better data to feed these new systems if you know where to look. When you combine good data and big data, you get results that can seem like magic without having to wear the funny hats.

The need for good data starts right at the beginning of the big data lifecycle. The real power of these analytics systems is in the models they build. A good model will allow you to go from a reactive to a predictive approach to security incidents. It’s like having hindsight for the future: you’re always seeing the signs 20/20. But how can you tell if you’ve got a good model? The model needs to be tried out on a dataset you know a lot about.

This is the practice test where you have the answers and you want to see how well you may do on the real test later. Of course, a lot of the vendors have done the dirty work for you here. They have models that they feel should work well.

Best practice would be to take their model for a spin on your well-known data and see what happens. Maybe the model needs some adjustments for the particular configuration of your IT infrastructure. People paying attention often get stuck right at this phase. Do you have a good backlog of data you could use to run tests like this? Are you archiving the critical logs and other data sources that these models would employ? Have you done the forensics to identify where the data you have corresponds to breaches or incidents you’ve experienced so you know what the models ought to turn up? Too often the answers to these questions are no, no and no.
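To make the practice test concrete, here is a minimal sketch in Python of scoring a model against data you already understand. It assumes you have a set of event IDs that forensics tied to real incidents and a set of event IDs the model flagged when run over the same archive. The event IDs and the `backtest` helper are hypothetical and do not come from any vendor product.

```python
from dataclasses import dataclass


@dataclass
class BacktestResult:
    true_positives: int   # flagged by the model and confirmed by forensics
    false_positives: int  # flagged by the model but not a known incident
    false_negatives: int  # known incident the model missed

    @property
    def precision(self) -> float:
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


def backtest(flagged_ids: set[str], known_incident_ids: set[str]) -> BacktestResult:
    """Compare what the model flagged with what forensics confirmed."""
    return BacktestResult(
        true_positives=len(flagged_ids & known_incident_ids),
        false_positives=len(flagged_ids - known_incident_ids),
        false_negatives=len(known_incident_ids - flagged_ids),
    )


# Hypothetical event IDs from an archived log set.
known_incidents = {"evt-104", "evt-233", "evt-871"}
model_flags = {"evt-104", "evt-233", "evt-400", "evt-512"}

result = backtest(model_flags, known_incidents)
print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
```

Low precision means the model buries you in false alarms. Low recall means it misses incidents you already know happened. Neither number can be computed at all without the archived logs and the forensic labeling.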

Let’s say you get to the stage where you have models that will work. The next challenge on the path to big data nirvana will be having these analytics cough up the answers to your burning questions. Here’s where another problem in the data often emerges. You’ve set up your shiny new data science driven machine, but suddenly you find the questions you’re being asked and the data you have don’t line up.

It’s often a level of abstraction mismatch. The executives ask questions about people, but your data is about machines and IP addresses. This can also affect your modeling. If you only have a hammer, you smash everything like a nail. Maybe you go find a bunch of data about people, but if you don’t adjust the models so that they treat people, machines, IP addresses and everything else like they ought to be treated and as if they are in the right relationships, then you’ve made the problem worse and not better. Often bad models with data at different levels of abstraction (e.g. a person and an IP) that fall into some correlation will start to crank out tons of bad conclusions when one thing does something normal for its level of abstraction (e.g. a person quits) that doesn’t have anything to do with the thing it fell into accidental correlation with.

An excellent example of good modeling for these disparate types of data in the security space is how Securonix has used peer grouping from Identity and Access Management (IAM) systems to enrich their data. Like many in the User Behavior Analytics (UBA) space, Securonix will look at lots of different data sources to get you the insight you want about what your users are up to. They could have been prone to the troubles of modeling with different layers of abstraction and correlating at the wrong levels, but seem to have gotten it right.

When they take in data about people and their organizational relationships from IAM systems, they use this to influence the outcomes not by directly correlating it with activity at the network and system layer, but rather by augmenting expectations of what is normal based on the peer group you may have. A peer group will be people in the same building, managers at the same level of the hierarchy, or employees with the same job role.

You would expect these people to have notably similar activity patterns, and you can use that to learn what they do and point out anomalies. That’s getting the analytics game right by pulling in the right data and using it well.
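As a rough illustration of the peer group idea, the Python sketch below compares each user's activity count with the other members of that user's own peer group rather than with the whole population. This is not Securonix's implementation. The user names, group labels, counts, the leave-one-out z-score and the threshold of 3.0 are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def peer_group_outliers(activity, peer_group_of, threshold=3.0):
    """Return users whose activity deviates from the rest of their peer group.

    activity:      user -> activity count (for example, records accessed per day)
    peer_group_of: user -> peer group label taken from the IAM system
    """
    members = defaultdict(list)
    for user in activity:
        members[peer_group_of[user]].append(user)

    outliers = []
    for user, count in activity.items():
        # Baseline is built from the user's peers, excluding the user.
        peers = [activity[u] for u in members[peer_group_of[user]] if u != user]
        if len(peers) < 2:
            continue  # too few peers to say what normal looks like
        spread = pstdev(peers)
        if spread == 0:
            continue  # identical peers; skipped to keep the sketch simple
        if abs(count - mean(peers)) / spread > threshold:
            outliers.append(user)
    return outliers


# Hypothetical data: erin's count would look ordinary next to the DBAs,
# but it is far outside what the other finance analysts do.
peer_group_of = {
    "alice": "finance-analyst", "bob": "finance-analyst", "carol": "finance-analyst",
    "dave": "finance-analyst", "erin": "finance-analyst",
    "frank": "dba", "grace": "dba", "heidi": "dba", "ivan": "dba",
}
activity = {
    "alice": 12, "bob": 15, "carol": 11, "dave": 14, "erin": 140,
    "frank": 140, "grace": 144, "heidi": 142, "ivan": 138,
}

print(peer_group_outliers(activity, peer_group_of))
```

The people data never gets correlated directly with an IP address or a machine here. It only decides which baseline a user's activity is measured against.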

Sometimes you have to work with what you’ve got, and that means bad data will happen to you. Nothing serves as a better example of that than the notoriously poor quality logs on Microsoft platforms. Single events like a bad authentication appear as multiple entries in multiple events on multiple systems. Often, even if it’s possible to collect all the separate events, you find key pieces of data like an IP address are missing, so correlating that event with other events from other sources is nearly impossible.
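The cleanup this forces on you looks something like the Python sketch below. It assumes the log entries have already been parsed into dictionaries with `time`, `user`, `host` and an optional `source_ip` field. That shape, the five-second window and the sample records are assumptions for illustration, not a real Windows event format.

```python
from datetime import datetime, timedelta


def collapse_auth_failures(entries, window=timedelta(seconds=5)):
    """Merge entries for the same user that fall within `window` of the
    first entry in the group, keeping any source IP that turns up."""
    merged = []
    open_event = {}  # user -> merged event currently being built
    for entry in sorted(entries, key=lambda e: e["time"]):
        current = open_event.get(entry["user"])
        if current is None or entry["time"] - current["first_seen"] > window:
            current = {
                "user": entry["user"],
                "first_seen": entry["time"],
                "hosts": [],
                "source_ip": None,
            }
            open_event[entry["user"]] = current
            merged.append(current)
        current["hosts"].append(entry["host"])
        current["source_ip"] = current["source_ip"] or entry.get("source_ip")
    return merged


# Hypothetical parsed entries: one failed logon reported by three systems,
# plus a second failure where no entry carries an IP address.
t0 = datetime(2015, 8, 1, 9, 0, 0)
entries = [
    {"time": t0, "user": "jsmith", "host": "dc01", "source_ip": None},
    {"time": t0 + timedelta(seconds=1), "user": "jsmith", "host": "fs02", "source_ip": "10.0.0.7"},
    {"time": t0 + timedelta(seconds=2), "user": "jsmith", "host": "dc02"},
    {"time": t0 + timedelta(minutes=10), "user": "mlee", "host": "dc01"},
]

events = collapse_auth_failures(entries)
for event in events:
    status = event["source_ip"] or "no IP, cannot correlate"
    print(event["user"], event["hosts"], status)
```

The first merged event can be joined to firewall or proxy data by its IP address. The second cannot, no matter how good the downstream model is.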

The default level of logging doesn’t offer enough detail for proper detection or forensics from a security perspective. Turning up the logging so that it becomes nominally useful for security means sacrificing a huge chunk of the system resources. To add insult to injury, collecting and parsing all these logs is immensely burdensome. If you’re a practitioner, then none of this is news to you. From the analytics perspective, these problems often hide behind the SIEM system. You don’t know you’re dealing with such low-quality data until you start using it via the SIEM and getting poor results.

This is where you will likely need to look to other sources for good data if you want the quest for big data analytics to have a happy ending. The STEALTHbits StealthINTERCEPT platform is able to get all the security data you want from these Microsoft platforms and overcome these native logging issues. You can plug it into the SIEM, feed directly from StealthINTERCEPT itself, or even plug it into something else that can consume our SYSLOG output. This means adding another layer to your infrastructure, but the result is a high-quality, real-time stream of security events from your Microsoft systems. I’m picking on Microsoft, but it’s not like they’re the only one guilty of having bad logging. Luckily, for many of these problem children you’ll find other solutions like ours that will help fill the gaps and get you the good data you want.

Moving from reactive controls to predictive controls has been a goal on the horizon for security organizations for a long time. Solid models powered by sound data science that processes the huge well of big data can finally make security predictive. That’s assuming they’re fed by streams of good data. If the stream of data is polluted, then all that water in the big data well isn’t worth the trouble to drink. You’re going to need good data to get these systems going, which means discipline and practices around retention you may need to improve.

You will have to be sure you’re getting all the right data from the right level sources and putting them in just the right relationships in your models so they produce useful results even as the real world changes around them. Sometimes you’re going to need to make tough choices to invest in better data sources or deal with lower quality results. But if you can get the data right, then big data will do right by you.

This article originally appeared in the August 2015 issue of Security Today.
