The journey to becoming a modern software company involves state-of-the-art software methodologies, cloud platforms, and data and analytics software. However, it’s important to align this transformation with a high-value business problem or opportunity. By starting with a well-defined, impactful use case rather than a particular technology, enterprises can reduce the risk of heading down a dead end and increase return on investment. One high-value area that’s especially productive to start with is cybersecurity.
There has never been a harder time to be a security professional. Malware and exploits once held only by national intelligence services are increasingly available to miscreants for malicious activity. For all of the investment in firewalls, security information and event managers (SIEMs), and intrusion detection systems, the ongoing success of phishing and other user-focused infiltration tactics all but guarantees that breaches will occur. And not all of these breaches will originate outside the perimeter -- a persistent, sizable minority will still come from true insiders.
If breaches are inevitable, then the goal is to quickly identify and remove attackers before they compromise highly valuable, sensitive assets. Doing this entails rapidly distinguishing legitimate from illegitimate behaviors while minimizing false positives. To better understand anomalous user behaviors, enterprises are increasingly complementing their security portfolio with advanced data science methods to find signs of unauthorized activity. Uncommon relationships between users and critical servers, access patterns that deviate from those of colleagues in a similar role, and users who travel unusual paths among systems and applications can signal the presence of intruders or malicious insiders. Advanced analytics for lateral movement detection can help find these latent signals.
Finding Intruders with Lateral Movement Detection
Lateral movement refers to the various techniques attackers use to progressively spread through a network as they search for key assets and data, harvest privileged credentials, exfiltrate data, and leave a persistent backdoor for ongoing access.
So why is analyzing lateral movement so important? Because it’s one of the times when attackers must emit “noise” that indicates their presence. One of an attacker’s goals is to remain undetected while traversing a target network. For example, attackers who can control a communications channel between two or more nodes have a number of ways to disguise their traffic, such as encryption. But when an attacker has control of only one node, scanning and probing for other vulnerable machines takes place in the clear. So lateral movement gives security professionals a chance to identify attackers who have bypassed perimeter defenses and compromised a host.
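As a rough illustration of the "noise" described above, the sketch below counts how many distinct hosts each source machine contacts within a time window and flags unusually high fan-out. The record layout `(timestamp, source, destination)`, the host names, and the `max_targets` threshold are all assumptions made for this example; they are not taken from any particular product or dataset.

```python
from collections import defaultdict

def fan_out(connections, window_start, window_end):
    """Count distinct destinations contacted by each source in [start, end).

    `connections` is an iterable of (timestamp, source, destination) tuples.
    This layout is hypothetical; real flow or log records will differ.
    """
    targets = defaultdict(set)
    for ts, src, dst in connections:
        if window_start <= ts < window_end:
            targets[src].add(dst)
    return {src: len(dsts) for src, dsts in targets.items()}

def flag_scanners(connections, window_start, window_end, max_targets):
    """Return sources whose fan-out exceeds an assumed threshold."""
    counts = fan_out(connections, window_start, window_end)
    return sorted(src for src, n in counts.items() if n > max_targets)

# Hypothetical sample: one workstation probes ten servers in a minute,
# while another makes a single ordinary connection.
SAMPLE = [(t, "ws-17", f"srv-{t}") for t in range(10)] + [(3, "ws-02", "srv-1")]
print(flag_scanners(SAMPLE, 0, 60, max_targets=5))
```

A fixed threshold like this is only a starting point; a busy file server may legitimately talk to many hosts, which is one reason the behavioral techniques discussed later in this post compare activity against a baseline instead.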
Why Perimeter Defenses Alone Fall Short
Most organizations understand the need to protect their network infrastructure from external threats with tools like SIEMs and intrusion detection systems (IDSs). SIEMs analyze different types of log data to identify, correlate, and score threat events according to relatively simple rules. IDSs look for telltale signatures of an exploit passing across the network. User behavior analytics solutions also add file or application access to their rulesets (but without any higher-level data like user rights or entitlements). These solutions have important properties for compliance, such as deterministic behavior -- you can understand or predict why a rule will activate by inspecting a particular packet or log event. However, they’re less effective when confronted with novel exploits or attackers who are skilled at evading signature detection. They also generate a large number of time-wasting false-positive alerts.
Using Advanced Analytics for Lateral Movement Detection
Advanced analytics can reveal patterns of activity suggesting lateral movement -- relationships, clusters, profiles, and paths of traversal -- that complement the event correlations and alerts provided by SIEMs. Some of the techniques that are useful in analyzing lateral movement include:
Graph analytics. These are useful to assess the relationship between the users and the resources they are accessing, especially if the resources are labeled with a risk score. With graph-theoretic techniques, analysts can answer questions such as, are links between users and systems normal, or are there suspicious relationships that signal a problem?
Clustering. Using data about user roles, entitlements, and activity, analysts can cluster users into behavioral cohorts. For example, we’d expect people in the accounts receivable group to access certain systems, but probably not the wire transfer system. With clustering, analysts can assess if a user’s behavior deviates significantly from others in the same role.
Path analytics. Legitimate users tend to access systems and applications according to similar patterns over time. Path analytics can highlight inconsistencies in these access patterns. Also, by understanding the path an attacker has already traveled and by risk-scoring high-value targets, security professionals can start to predict the attacker’s likely future path.
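To make the three techniques above a little more concrete, here is a minimal sketch of each, using only the Python standard library. The event layout `(user, role, resource)`, the sample users, and the `RISK` scores are hypothetical. The role-based peer comparison is a simple stand-in for a real clustering algorithm, and the hop counter is a deliberately crude form of path analysis.

```python
from collections import Counter, defaultdict

# Hypothetical events in time order: (user, role, resource).
EVENTS = [
    ("alice", "accounts_receivable", "billing"),
    ("alice", "accounts_receivable", "crm"),
    ("bob", "accounts_receivable", "billing"),
    ("bob", "accounts_receivable", "crm"),
    ("carol", "accounts_receivable", "billing"),
    ("carol", "accounts_receivable", "wire_transfer"),
]
# Hypothetical risk score per resource (0 = low, 1 = critical).
RISK = {"billing": 0.3, "crm": 0.2, "wire_transfer": 0.9}

def build_graph(events):
    """Bipartite graph stored as user -> set of resources accessed."""
    graph = defaultdict(set)
    for user, _role, resource in events:
        graph[user].add(resource)
    return dict(graph)

def risky_edges(events, risk, threshold=0.8):
    """Graph view: user-to-resource links that touch high-risk resources."""
    graph = build_graph(events)
    return sorted(
        (user, res)
        for user, resources in graph.items()
        for res in resources
        if risk.get(res, 0.0) >= threshold
    )

def cohort_deviation(events):
    """Cohort view: share of a user's resources that no same-role peer used."""
    graph = build_graph(events)
    roles = {user: role for user, role, _res in events}
    scores = {}
    for user, resources in graph.items():
        peer_resources = set()
        for peer, peer_res in graph.items():
            if peer != user and roles[peer] == roles[user]:
                peer_resources |= peer_res
        scores[user] = len(resources - peer_resources) / len(resources)
    return scores

def rare_transitions(events, min_count=2):
    """Path view: resource-to-resource hops seen fewer than min_count times."""
    last_seen = {}
    hops = Counter()
    for user, _role, resource in events:
        if user in last_seen:
            hops[(last_seen[user], resource)] += 1
        last_seen[user] = resource
    return {hop for hop, n in hops.items() if n < min_count}
```

In this toy data, all three views point at the same user: the only link to a high-risk resource, the only nonzero cohort deviation, and the only rare hop all involve the accounts receivable user who reached the wire transfer system. Real data is rarely this tidy, so agreement among several weak signals is typically more useful than any one of them alone.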
Analytic data warehouses that support in-database data science functions and massively parallel processing (MPP) greatly simplify and accelerate the deployment of advanced analytics. These have the ability to integrate log data with other sources of structured, semi-structured, and unstructured data, such as text, geospatial, and graph data. With a scale-out MPP data warehouse, analysts can parallelize the necessary computations across an entire cluster. This is especially useful for building per-user models across thousands of enterprise users, in order to create individual user anomaly scores for prioritizing investigations. Analysts can also model, train, and deploy analytics all inside the same environment, without the need for data movement between the data management system and a separate analytics environment.
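The per-user scoring idea can be sketched in a few lines. The example below is plain Python, not in-database code, and the choice of a z-score over daily activity counts is an assumption made for illustration; the models used in practice may be quite different. The point it shows is that each user's score depends only on that user's own history, which is what lets the work be spread across a cluster.

```python
from statistics import mean, pstdev

def anomaly_score(history, today):
    """Distance of today's count from the user's own baseline, in std devs.

    `history` is a list of past daily activity counts for one user
    (a hypothetical feature chosen for this sketch).
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is maximally unusual.
        return 0.0 if today == mu else float("inf")
    return abs(today - mu) / sigma

def rank_users(histories, today_counts):
    """Score every user independently and sort the most anomalous first."""
    scores = {
        user: anomaly_score(history, today_counts[user])
        for user, history in histories.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample data for two users.
HISTORIES = {"alice": [4, 5, 4, 5, 4, 5], "bob": [10, 12, 11, 9, 10, 8]}
TODAY = {"alice": 5, "bob": 30}
for user, score in rank_users(HISTORIES, TODAY):
    print(user, round(score, 2))
```

Sorting by score gives investigators a prioritized list, so attention goes first to the accounts that have strayed furthest from their own normal behavior.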
Here’s an example from an earlier Pivotal blog post (“A Data Science Approach to Detecting Insider Security Threats”). In this case, a customer engaged Pivotal data scientists to analyze two billion rows of Active Directory log data covering six months of user-to-resource authentication records. These records were generated by more than 200,000 users across 300,000 devices. For each user, the data scientists built a profile to identify patterns of anomalous behavior. In the profiles below, the X axis indicates the time period, the Y axis indicates the particular resource being accessed, and brighter colors indicate a higher intensity of access.
Figure 2 indicates a normal access pattern:
Figure 3 indicates an outlier from normal behavior:
The problem behavior is immediately obvious. Using techniques like these, we can develop indicators that alert on changes in behavior, not just on whether a particular network threshold was crossed. This helps security professionals both rapidly identify suspicious behavior and cull false positives from network alerts.
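One simple way to turn a time-by-resource profile like the ones above into a change indicator is to compare the latest period's access counts with the user's earlier baseline. The sketch below uses cosine similarity for that comparison. This is an assumed technique chosen for illustration, not the method used in the engagement described above, and the `(period, resource)` event layout and resource names are hypothetical.

```python
import math
from collections import Counter

def profile(events):
    """Build {period: Counter(resource -> access count)} for one user."""
    periods = {}
    for period, resource in events:
        periods.setdefault(period, Counter())[resource] += 1
    return periods

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b.get(key, 0) for key in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def behavior_change(events, latest):
    """Return 0 for 'same mix of resources as before', 1 for 'entirely new'."""
    periods = profile(events)
    baseline = Counter()
    for period, counts in periods.items():
        if period != latest:
            baseline.update(counts)
    return 1.0 - cosine(baseline, periods.get(latest, Counter()))

# Hypothetical user: two ordinary weeks, then a week on unfamiliar systems.
EVENTS = [
    (1, "mail"), (1, "mail"), (1, "files"),
    (2, "mail"), (2, "mail"), (2, "files"),
    (3, "db_prod"), (3, "hr_share"),
]
print(behavior_change(EVENTS, latest=3))
```

Because the score reflects a shift in the mix of resources and not raw volume, a user who is simply busier than usual on familiar systems scores low, while a quiet user who suddenly touches unfamiliar systems scores high.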
In another example, a Pivotal Greenplum Database customer needed a rapid way to investigate suspicious activity in a network of 30,000 nodes. With assistance from Pivotal data science professionals, they were able to use in-database analytics to achieve better speed and accuracy in detecting anomalous behavior than with their legacy SIEM and rules-based log analytics. Moreover, they identified shared workstations violating regulatory compliance, and they also detected previously unknown cases of data theft.
Finding long-term, quiet intrusions involving lateral movement requires creative analysis of anomalies obscured at the network level. This is especially so when hostile actors are in control of legitimate user accounts and are attempting to disguise their movement. Finding these types of threats requires advanced analytics and data management systems that go beyond the capabilities of ordinary perimeter defenses.
Knowing how to quickly create advanced analytical models and deploy them efficiently is the true calling card of the data-driven enterprise. Learn more about Pivotal’s approach to turning data into actionable insights. Pivotal’s Data Science, Pivotal Greenplum Database, and the Apache MADlib (incubating) pages are excellent starting points.
For a deeper dive, join us for the Pivotal Analytics Innovation Roadshow, a complimentary, day-long event in select cities across the United States and Europe (it also includes a hands-on afternoon data science workshop). Register and begin your analytics journey today!
About the Author
Bob Glithero is Principal Product Marketing Manager for analytics at Pivotal. He writes about the application of analytics, machine learning, and artificial intelligence to areas such as telecommunications, cybersecurity, and the Internet of Things for the benefit of Pivotal’s customers and internal stakeholders. Prior to joining Pivotal, Bob was part of the Data and Analytics team at Cisco Systems, where he was primarily focused on analytics for network operations and mobility. He holds a B.A. in Economics from Claremont McKenna College and an M.B.A. from the University of California, Berkeley.