Dynamic Anomaly Detection Using Machine Learning

By Dr. Tim Stacey, Ph.D. / Adlumin, Inc.

User Behavior Analytics is an incredibly hot field right now: software engineers and cybersecurity experts alike have realized that the power of data science can be harnessed to comb through logs, analyze user events, and target activity that stands out from the crowd. Previously, the gold standard for this process was manual, based on exhaustive queries against large databases. These investigations also happened ex post facto, after a hack or intrusion had occurred, to diagnose what actually happened.

At Adlumin, we’ve sought to create a proactive product that reduces the amount of intensive data work that a cybersecurity specialist needs to perform. We’ve had analytics in production since inception, but today we’d like to introduce a new product that will make finding new malicious activity even easier.

Our new Rapid User Behavior Alerts will pick up on novel user behavior in a range of event types, specifically targeting combinations of attributes or actions that have never been seen before on a network. These Rapid Alerts come out within seconds of the Adlumin platform receiving the data, notifying sysadmins that something unexpected has occurred on their network.

Importantly, we have tuned our new data science engine to have high tolerances for power users (e.g., sysadmins) while triggering at lower tolerances for users who have a limited range of behaviors. This is crucial for reducing over-flagging of novel behavior. Our goal is to transmit high-impact findings reliably and quickly and to avoid spamming the end user with bad alerts.

Our analytics engine takes advantage of an auto-encoding neural network framework, finding the difference between previous and current modes of user behavior in a heavily non-linear space. By passing the event through a trained auto-encoder, we determine the reconstruction error of an incoming event, which is a measure of the anomalous nature of a user’s actions. Since the anomalous characteristics of the incoming event are condensed to a single number, we can grade this number against a distribution of the user’s previous events to determine if this incoming event is truly different.
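
To make the scoring step concrete, here is a minimal sketch of how a reconstruction error might be computed and graded against a user's own history. It is an illustration under assumptions, not Adlumin's implementation: the function names, the mean-squared-error metric, the empirical-percentile grading, and the 0.99 cutoff are all hypothetical, and `toy_reconstruct` is a stand-in for a trained auto-encoder (it simply clamps values to the range it "learned").

```python
from bisect import bisect_left
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_error(event: Vector, reconstruct: Callable[[Vector], Vector]) -> float:
    """Mean squared difference between an event vector and its reconstruction."""
    rebuilt = reconstruct(event)
    return sum((a - b) ** 2 for a, b in zip(event, rebuilt)) / len(event)


def percentile_rank(error: float, past_errors: Sequence[float]) -> float:
    """Fraction of the user's past errors that are strictly below the new error."""
    ordered = sorted(past_errors)
    return bisect_left(ordered, error) / len(ordered)


def is_anomalous(error: float, past_errors: Sequence[float], cutoff: float = 0.99) -> bool:
    """Flag the event only if it ranks above the (assumed) cutoff for this user."""
    return percentile_rank(error, past_errors) > cutoff


def toy_reconstruct(event: Vector) -> List[float]:
    """Stand-in for a trained auto-encoder: reproduces values in [0, 1], clamps the rest."""
    return [min(max(x, 0.0), 1.0) for x in event]


# Hypothetical error histories: a user with narrow habits and a power user.
limited_history = [0.01, 0.02, 0.015, 0.03, 0.02]
power_history = [0.5, 1.2, 0.8, 2.0, 1.6]

# One feature of this event falls well outside what the toy model can reproduce.
new_error = reconstruction_error([0.2, 3.0, 0.5], toy_reconstruct)

print(is_anomalous(new_error, limited_history))
print(is_anomalous(new_error, power_history))
```

Because each user is graded against their own distribution, the same reconstruction error stands out for the user with a narrow range of behavior but falls inside the normal spread for the power user.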

Our fast evaluation of incoming data is made possible with the assistance of Amazon DynamoDB and AWS Lambda. Pre-trained user models live in our DynamoDB tables. These models are quickly queried for each event, as we process thousands or hundreds of thousands of events per second. Our Lambdas evaluate the incoming data against the queried baseline and produce a threat score with an interpretation of what caused the threat. Our baselines are updated on a frequent schedule to account for the relatively fast drift in user behavior over time.
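
The lookup-and-score flow described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `user_id` key, the `past_errors` and `feature_errors` fields, and the scoring rule are hypothetical, and `InMemoryTable` is a local stand-in that only mimics the `get_item(Key=...)` call shape of a boto3 DynamoDB `Table` so the example runs without AWS access.

```python
from bisect import bisect_left
from typing import Any, Dict


class InMemoryTable:
    """Local stand-in mimicking the get_item(Key=...) shape of a boto3 DynamoDB Table."""

    def __init__(self, items: Dict[str, Dict[str, Any]]) -> None:
        self._items = items

    def get_item(self, Key: Dict[str, str]) -> Dict[str, Any]:
        item = self._items.get(Key["user_id"])
        # Like DynamoDB, omit the "Item" field when nothing matches the key.
        return {"Item": item} if item is not None else {}


def score_event(table: Any, event: Dict[str, Any]) -> Dict[str, Any]:
    """Look up the user's baseline and return a threat score plus the main contributor."""
    baseline = table.get_item(Key={"user_id": event["user_id"]}).get("Item")
    if baseline is None:
        # No trained baseline yet for this user: nothing to grade against.
        return {"user_id": event["user_id"], "threat_score": None, "top_feature": None}

    # float() also handles the Decimal values boto3 returns for DynamoDB numbers.
    feature_errors = {name: float(v) for name, v in event["feature_errors"].items()}
    total_error = sum(feature_errors.values())

    # Threat score: fraction of the user's past errors strictly below this one.
    past = sorted(float(e) for e in baseline["past_errors"])
    threat_score = bisect_left(past, total_error) / len(past)

    # Interpretation: the feature contributing most to the reconstruction error.
    top_feature = max(feature_errors, key=feature_errors.get)
    return {"user_id": event["user_id"], "threat_score": threat_score, "top_feature": top_feature}


table = InMemoryTable({"alice": {"past_errors": [0.02, 0.05, 0.03, 0.04]}})
result = score_event(
    table,
    {"user_id": "alice", "feature_errors": {"logon_hour": 0.01, "source_host": 1.9}},
)
print(result)
```

In a deployed function, a Lambda handler would presumably call something like `score_event` with a real table object in place of the stand-in; keeping the table as a parameter also makes the scoring logic easy to test locally.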

In the coming months, Adlumin will be rolling out analytics specifically targeted to log data, system behavior, and a more detailed analysis dependent on cold storage of data. Rapid User Behavior Alerts are the first line of defense as we develop a suite of analytics to protect your network from harm.

Dr. Tim Stacey is the Director of Data Science for Adlumin Inc., a cybersecurity software firm based in Washington, DC. His work primarily focuses on user behavior analytics, and his experience includes designing analytics for Caterpillar, the RAND Corporation, and the International Monetary Fund. He holds a Ph.D. in computational chemistry from the University of Wisconsin-Madison.