A Data Science Approach to Detecting Insider Security Threats

June 12, 2014 Derek Lin

In my conversations with CISOs, one of their biggest fears is the insider threat. Employees must be able to access internal information freely to be productive, yet ill-intentioned access must be guarded against. According to the Verizon 2014 Data Breach Investigations Report, the percentage of attacks from internal actors doubled in 2013, showing a large increase for the second year running.

Most security tools today focus on identifying malware-initiated attacks. Malware often leaves informational trails that detection techniques can use to identify a signature for blacklisting, for example by matching packet or payload signatures. Insider threat attacks, however, are committed by internal employees with valid data access. Unlike with malware, there is no ready user signature to rely on.

Most current commercial products address insider threat attacks by relying on role-based access control policies to assign the right levels of data access privilege to the right users. Certainly, policies prevent outright disallowed data access, but they are useless against policy abuse, where an ill-intentioned user accesses permitted data in an inappropriate way. A new approach to insider attack detection is needed.

The key is to proactively monitor user activities and flag alerts for anomalous behavior before potentially serious damage occurs. We see many opportunities for Big Data Analytics to address the problem of identifying anomalous user-to-resource access activities. In this post, I’ll share with you one such possibility using a patent-pending approach that we have found successful in client engagements.

The Active Directory log is a data set that records users’ authentication status on various network devices. Enterprises typically store such data for a long time. This is a rich data repository that we can mine for user behaviors. For each user, we examine which devices the user has attempted to access over a particular historical period, and establish a behavioral profiling model to capture the historical norm. Then, given the user’s device access records for the current period, we can measure their deviation from the historical norm of the user and his or her peers. We can then flag the user for further investigation if the deviation is large.
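As a rough illustration of the profiling idea above, the following Python sketch records the set of devices each user touched in a historical window and then measures what fraction of the user's current-period devices are new. The simplified `(user, device)` record layout and the function names are assumptions made for this example; this is not the patent-pending model itself.

```python
from collections import defaultdict


def build_profiles(history):
    """Map each user to the set of devices seen in the historical window.

    `history` is an iterable of (user, device) authentication records,
    a hypothetical simplification of an Active Directory log.
    """
    profiles = defaultdict(set)
    for user, device in history:
        profiles[user].add(device)
    return profiles


def new_device_ratio(profiles, user, current_devices):
    """Fraction of the user's current-period devices absent from the profile."""
    current = set(current_devices)
    if not current:
        return 0.0
    seen = profiles.get(user, set())
    return len(current - seen) / len(current)


# Toy data: alice has historically used srv1 and srv2 only.
history = [("alice", "srv1"), ("alice", "srv2"), ("bob", "srv1")]
profiles = build_profiles(history)

# In the current period, three of alice's four devices are new to her.
ratio = new_device_ratio(profiles, "alice", ["srv1", "srv3", "srv4", "srv5"])
```

A ratio like this is only one possible deviation measure; a real model would also need to account for access frequency and peer behavior.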

Here’s a specific example. In this particular large enterprise environment, there are two billion rows of Active Directory log data covering six months of user-to-resource authentication records, with over 200K users across 300K devices. For every user, we built a behavioral profiling model. A less sophisticated model would simply compute the average number of devices accessed in the past and flag an alert if the current number of devices is much larger than that average. However, such a simple behavioral metric is bound to fail because a suitable threshold is hard to establish and the number of false positives is high. To achieve high-precision results with few false positives, the model should consider changes in access frequency for both new devices and previously seen devices, and compare them to the behavior of the user’s peers over their devices in the same period.
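To make the contrast concrete, here is a minimal Python sketch of the two ideas: a naive count-based alert, and a score that looks at new devices and at changes on previously seen devices relative to the user's peers. The threshold `factor`, the dictionary fields `new` and `seen_delta`, and the scoring formula are all assumptions chosen for illustration, not the model used in the engagement.

```python
from statistics import mean, median


def naive_alert(past_counts, current_count, factor=3.0):
    """Simple baseline: flag when the current device count far exceeds
    the past average. `factor` is an arbitrary, hand-picked threshold."""
    return current_count > factor * mean(past_counts)


def peer_adjusted_score(user, peers):
    """Excess activity of a user over the peer median in the same period.

    Each record is a dict with two hypothetical fields:
      'new'        - number of never-before-seen devices accessed
      'seen_delta' - change in access count on previously seen devices
    """
    new_excess = max(0, user["new"] - median(p["new"] for p in peers))
    seen_excess = max(0, user["seen_delta"] - median(p["seen_delta"] for p in peers))
    return new_excess + seen_excess


# A user who normally touches about 5 devices suddenly touches 25.
flagged = naive_alert([4, 5, 6], 25)

user = {"new": 20, "seen_delta": 0}

# Case 1: the whole peer group moved to new servers the same week.
migrating_peers = [
    {"new": 19, "seen_delta": 1},
    {"new": 21, "seen_delta": 0},
    {"new": 20, "seen_delta": -1},
]
score_during_migration = peer_adjusted_score(user, migrating_peers)

# Case 2: the peers behaved as usual, so the user stands out.
quiet_peers = [
    {"new": 0, "seen_delta": 1},
    {"new": 1, "seen_delta": 0},
    {"new": 0, "seen_delta": -1},
]
score_when_alone = peer_adjusted_score(user, quiet_peers)
```

In this toy data the naive rule fires in both cases, while the peer-adjusted score is zero during the team-wide migration and large only when the user alone changes behavior, which is the kind of false positive reduction the paragraph above describes.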

Figure 1 visualizes a typical user’s pattern of access to devices over time. Cell color indicates the frequency of access to a device in a given week. This typifies the device access patterns of most enterprise users.

Figure 1: visualization of a normal user’s behavior
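The data behind a heat map of this kind can be sketched as a device-by-week matrix of access counts. The snippet below is an assumed, simplified construction for a single user; the `(device, week)` record layout and the function name are hypothetical.

```python
from collections import Counter


def access_matrix(records):
    """Build a device-by-week matrix of access counts for one user.

    `records` is an iterable of (device, week) pairs. Returns
    (devices, weeks, matrix), where matrix[i][j] is the number of
    accesses to devices[i] during weeks[j].
    """
    counts = Counter(records)
    devices = sorted({device for device, _ in counts})
    weeks = sorted({week for _, week in counts})
    matrix = [[counts[(device, week)] for week in weeks] for device in devices]
    return devices, weeks, matrix


records = [("srv1", 1), ("srv1", 1), ("srv1", 2), ("srv2", 2)]
devices, weeks, matrix = access_matrix(records)
```

Each cell value would then be mapped to a color, so a stable user shows the same few rows lit week after week.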


In contrast to Figure 1, Figure 2 shows a user exhibiting an obviously anomalous device access pattern. A change in behavior that deviates from the norm in this way is what the model catches and flags.

Figure 2: visualization of an anomalous user’s behavior


It’s important to note that, due to the data volume, such work is not possible without leveraging Pivotal’s Massively Parallel Processing (MPP) technology, using either the Pivotal Greenplum Database or the HAWQ SQL processing engine available through the Pivotal HD Apache Hadoop® distribution. In such an environment, both the model training and scoring code run within the database, taking only a fraction of a second to execute.

A data science model that uses Active Directory log data to detect insider threat attacks can provide a baseline alerting system. Other informational sources can further enrich the context of the alert for forensic investigation. For example, meta user information from Human Resources or project staffing databases can provide additional insight into the flagged user’s activities. Asset information provides additional context around the devices the user accessed. We can further correlate an alert from this model with alerts from other security products, bolstering the signal strength and increasing confidence in the alert.
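The enrichment and correlation step described above might look like the following Python sketch. The alert fields, the HR metadata, and the source names such as `dlp` and `proxy` are hypothetical placeholders, and ranking by number of corroborating sources is just one possible way to express "increasing confidence in the alert".

```python
def enrich_alerts(alerts, hr_info, other_alerts):
    """Attach context to model alerts and rank them by corroboration.

    alerts       - list of dicts with 'user' and 'score' (model output)
    hr_info      - dict mapping user to metadata (hypothetical HR fields)
    other_alerts - iterable of (source, user) pairs from other products
    """
    corroborating = {}
    for source, user in other_alerts:
        corroborating.setdefault(user, set()).add(source)

    enriched = []
    for alert in alerts:
        user = alert["user"]
        enriched.append({
            **alert,
            "hr": hr_info.get(user, {}),
            "corroborated_by": sorted(corroborating.get(user, set())),
        })

    # Alerts backed by more independent sources come first, then by score.
    enriched.sort(
        key=lambda a: (len(a["corroborated_by"]), a["score"]),
        reverse=True,
    )
    return enriched


alerts = [{"user": "alice", "score": 0.9}, {"user": "bob", "score": 0.7}]
hr_info = {"bob": {"department": "finance"}}
other_alerts = [("dlp", "bob"), ("proxy", "bob")]

ranked = enrich_alerts(alerts, hr_info, other_alerts)
```

Here bob ranks above alice despite a lower model score, because two other tools also raised alerts about him.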

A security data lake powered by Pivotal HD has the computing power to carry out sophisticated data science work, while offering the ability to freely inject data sources for alert correlation. Coupled with modern security tools such as the RSA Security Analytics monitoring platform, we have a powerful set of capabilities to detect a broader set of threats, and operationalize the response and remediation more effectively and efficiently.

Security work remains a wide green field of opportunity for data science applications. In future posts I’ll share more applications and examples of our work.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
