Graph Mining For Detecting Coordinated Network Attacks

December 3, 2015 Anirudh Kondaveeti

Joint work by Jin Yu and Anirudh Kondaveeti

In previous blog posts, we have discussed how data science can be used to detect network security threats, such as using sequential pattern mining to detect watering hole attacks. Sequential pattern mining is shown to be effective in revealing “chains of attacks” in a network security breach. This blog post introduces the use of graph mining techniques to discover suspicious domain interaction structures, which may not necessarily be captured in sequential patterns. For example, a small group of domains that only interact with themselves but not others outside their “social networks” can be part of a botnet. Specifically, we are interested in detecting highly connected domain social networks to flag potential coordinated attacks such as watering hole attacks and also small isolated networks that resemble a botnet-like structure.

Fig. 1 shows the high-level workflow for mining domain social networks. The first two steps involve the application of the familiar association rule mining technique to establish pairwise correlations between domains. Two domains are linked if the conditional probability of a user visiting one domain given the fact that the user also visits the other domain within a short proximity of time is greater than a certain threshold. The domain social graph is built from such pairwise correlations (Step 3) and mined for specific structures (Step 4). The entire workflow has been implemented in Pivotal Greenplum Database and tested on proxy logs from a large enterprise network with promising results.

Fig. 1 High-level workflow for mining domain social networks

Pairwise Domain Associations

Association rule mining is a data mining technique widely used in areas such as market basket analysis. It finds rules of the form “if A, then B” (denoted A => B). For example, “if buy chips and beer, then also buy diaper.” A = {chips, beer} and B = {diaper} are so called itemsets, containing items purchased in a single transaction. Unlike sequential patterns, association rules do not imply sequential orders. As a preparation step for mining domain association rules, domains visited by the same user in a certain time window are collected to form a “basket”, analogous to items purchased in a single transaction as in market basket analysis. To that end, a sliding window is run through the proxy logs to collect distinct domains. As an example, the domains in sliding window 1 in Fig. 2: {googlevideo.com, twitter.com, youtube.com, doubleclick.net, google.com}, form a basket. The example uses a 60s window. One can set the time interval of the sliding window to a different value depending on the problem in hand.

Fig. 2 A sliding window (60s-window in the example) is run through the data to form “baskets”.

Given domain-to-basket assignments, pairwise association rule mining mainly involves the evaluation of support and confidence of each rule Domain A => Domain B:

Ÿ Support: Frequency of A and B in a common basket.
Ÿ Confidence: Conditional probability P(B|A) of seeing B in the same basket given that A is also present.

Table 1 lists example pairwise domain association rules. High confidence rules (i.e., rules with high P(A|B) or P(B|A)) that involve multiple users over several days—for example the highlighted rule—are generally more interesting than one-off or single-user event.

Table 1 Example pairwise domain association rules.

Exploring Interactions Between Domains

Building upon association rule mining, we adopt a graph-based approach to explore the interactions between domains. An undirected domain correlation graph is built from pairwise domain association rules. Fig. 3 shows an example correlation graph. Each node represents a domain. An edge connects two domains A and B if the max confidence of the rule A => B and B => A exceeds a user-specified threshold. In the example the max confidence associated with each edge is greater than 0.2. One can also impose a condition on domain co-occurrence frequency (support) to focus on rare (low support) domain associations.

Fig. 3 Example domain correlation graph.

Based on the study of social network structures of known attacks, a tightly connected network such as the one shown in Fig. 3 could indicate coordinated watering hole or botnet attacks. The question arises, “How to quantify the connectivity of a network?”

We take the OddBall approach to measure network connectivity. The first step is to identify a domain’s one-step neighborhood (also called ego-net). Two graph features are then extracted from the ego-net:

Ÿ N: Number of one-step neighbors
Ÿ E: Number of edges in the ego-net

N and E follow a power law:

E ∝ N^α, 1 ≤ α ≤ 2.

Fig. 4 plots N vs. E in a log-log scale for different ego-nets. Slope = 1, i.e., E = N, indicates a star-like network, while slope = 2 indicates a fully connected network, a clique. To approximate the slope, we calculate the OddBall ratio: log(E)/log(N). Given the same number of neighbors, the higher the ratio the higher degree of network connectivity. In practice, domains with an OddBall ratio of greater than, say 1.5, can be given a higher priority for investigation.

Additionally, one can also compute the clique percentage, the ratio between E and the number of edges needed to form a clique: E/((N²+N)/2). Clique percentage of close to 100% indicates a highly connected network.

Fig. 4 Number of neighbors vs. number of edges in log-log scale. Slope = 1 indicates a star-like network; slope = 2 indicates a fully connected network, a clique (Picture Source: ICDM’12 tutorial on graph anomaly detection).

Fig. 5 provides an example of a domain’s (central orange node) fully connected social network. The one-step neighbors of the central node are fully connected to each other, resulting in an OddBall ratio of log(10)/log(4) =1.66 and a clique percentage of 100%. Evidence from known attacks indicates that highly connected social networks as the one in Fig. 5 can potentially contain malicious domains. In addition, if some domains in the network were already known malicious, then the ego-net would help forensic export to easily identify their associates.

Fig. 5 Example of a domain’s (orange node) fully connected social network.

Detecting Isolated Clusters

Given a domain correlation graph (cf. Fig. 3), one can also identify isolated groups of domains that only interact with domains in the same group but not others. This resembles a botnet-like structure. Detecting such isolated groups can be formulated as the task of finding connected components (CCs) in a graph. Domains in the example CC shown in Fig. 6 do not interact with the rest of the community. From the network structures of known attacks, the observation is that malicious sites tend to exist in small CCs.

Fig. 6 Malicious sites tend to exist in small CCs.

Operationalization

Fig. 7 provides an operationalization vision for the analytics approaches introduced in this blog. The models are expected to be used by the security team, refined based on the feedback from the field, and updated periodically using new data loaded into the database.

Fig. 7 Operationalization vision.

In this blog post, we have introduced graph mining techniques for detecting network security threats. The approach builds domain social networks from pairwise domain association rules, and identifies suspicious domain interaction structures by leveraging well-established graph mining techniques. The approach has been applied to proxy logs from a large enterprise network, and flagged malicious domains that were previously unknown. Compared to the sequential pattern mining technique introduced in a related blog post, the graph-based approach focuses on analyzing domain social network structures, while sequential pattern mining more on capturing chains of attacks. The two approaches reveal different aspects of a potential attack, and can be used in conjunction to complement each other.

Read more:

Learn more about Pivotal Greenplum Database
Learn more about Pivotal Data Science
Read more about data science-driven security techniques

About the Author

Anirudh Kondaveeti is a Principal Data Scientist and the lead for Security Data Science at Pivotal. Prior to joining Pivotal, he received his B.Tech from Indian Institute of Technology (IIT) Madras and Ph.D. from Arizona State University (ASU) specializing in the area of machine learning and spatio-temporal data mining. He has developed statistical models and machine learning algorithms to detect insider and external threats and "needle-in-hay-stack" anomalies in machine generated network data for leading industries.

Pivotal Data Roadshow: A View From the Road

The response to the Pivotal Data Roadshow, held in a number of cities across North America this month, has ...

Mini Refactoring Pair Programming Backlog

Here at Pivotal Labs, the engineers practice Extreme Programming, which means our teams write automated tes...

Graph Mining For Detecting Coordinated Network Attacks

Pairwise Domain Associations

Exploring Interactions Between Domains

Detecting Isolated Clusters

Operationalization

About the Author

Previous

Next

Graph Mining For Detecting Coordinated Network Attacks

Pairwise Domain Associations

Exploring Interactions Between Domains

Detecting Isolated Clusters

Operationalization

About the Author

Previous

Next

Related content in this Stream

How VMware Tanzu CloudHealth helps customers uncover spiraling AWS Extended Support charges.

VMware Tanzu enhances Spring development with simplified operations, accelerated innovation, seamless microservices transition, increased security, and effortless scaling.

This 7-part blog series provides a roadmap for architecting a data science platform using VMware Tanzu. We'll delve into the building blocks of a successful platform that drives data-driven insights.

This 7-part blog series provides a roadmap for architecting a data science platform using VMware Tanzu. We'll delve into the building blocks of a successful platform that drives data-driven insights.

This 7-part blog series provides a roadmap for architecting a data science platform using VMware Tanzu. We'll delve into the building blocks of a successful platform that drives data-driven insights.

Bitnami-packaged open source software is loved by developers for its ease of use, which enables developers to directly pull a Bitnami package and seamlessly start using it with little effort.

VMware Tanzu announces the General Availability of AWS Commitment Discount Recommendations, which provides recommendations for all reservable services in AWS through VMware Tanzu CloudHealth.

Introducing VMWare Tanzu Data Hub, a self-managed Database as a Service (DBaaS) Platform, providing enterprises a way to host their internal DBaaS offering for internal business users.

In the cloud-native landscape, MCAs drive seamless compliance integration. Their expertise ensures proactive security measures align with regulatory standards for sustained innovation & collaboration.

Tanzu Application Platform brings innovation faster with more frequent feature updates. With 1.9, take advantage of enhanced DORA metrics visibility and improved compliance options for companies.

We’re excited to share some great news! Spring Academy Pro content is now free. It will be available to everyone who registers a work, vocational, or educational email address.

March 28, 2024, marks the official minor release date of Spring Cloud Gateway for K8s version 2.2, and it's set to optimize how developers protect access to their GraphQL services.

We are excited to announce that VMware Tanzu Application Service 6.0 is now generally available!

Get a clear picture of your OSS supply chain, and the risks you face from your open source software dependencies, using the all-new Tanzu OSS Health Assessment.

Trivy can now utilize CSAF VEX data to filter out false positives in CVE reports, maximizing the value of VEX documents in VMware Tanzu Application Catalog.

Bitnami-packaged open source software container images available in DockerHub are now signed by Notation, an implementation of the Notary Project specifications and a CNCF-incubating project.

There’s never been a better time to be a Java and Spring developer! Let me show you why with a sneak peak into JD Conference 2024.