Log Parsing for Network Security Threat Detection
Firewall, VPN, and network security device logs can provide insight into a wide variety of malicious activities, including remote code executions, port scanning, denial of service attacks, and network intrusions. However, analyzing these logs presents several challenges.
These data streams are high velocity; even small and medium networks can produce thousands of logs per second, with significant fluctuations throughout the day. A network comprises many different devices from different manufacturers, so manually defining rules for parsing log data is infeasible, and such rules would break as devices are updated. Preventive measures against malicious activity must be taken quickly, meaning that we need an online process to deliver insights that are useful for more than forensics.
At Adlumin, we’ve designed around these challenges and built a fast, parallelizable system capable of handling new log formats and deployable on streaming data. We own a patent that covers this method for ascertaining whether log data is anomalous and indicative of threats, malfunctions, and IT operations failures.
Malicious activity nearly always produces unusual log data, so our system focuses first on finding anomalous data. Focusing on anomalous logs drastically decreases the volume of data that must be analyzed for malicious activity without running the risk of ignoring pertinent data. In this discussion, we’ll focus on how we determine whether log data is anomalous, as doing so enables a wide variety of techniques for searching for malicious activity.
As an overview, our system trains itself on device log data that we’ve recently received. Based on that data, the system builds a set of templates that correspond to commonly seen log content and structure. The system also computes a measure of how rare each template is and derives a score threshold. When new data is received, it’s checked against this template set and assigned a score based on the template that best matches the data. If new data arrives that does not directly correspond to a template, a measure of the difference is added to the score. Data with a score above the threshold is considered anomalous.
Training
We begin by training our system in a batch process on recently ingested log data. We create a set of templates by grouping similar logs together. Automated systems generate logs, so they contain a mix of parameters (e.g., an IP address, a timestamp, or a username) and fixed messages (e.g., “connection opened” or “user logged in”). Our system uses a token-based parser tree to associate logs that have the same fixed messages, but that may have differing parameters.
Before applying the parser tree, logs are preprocessed. Each log is tokenized according to common delimiters, and pre-defined regex patterns replace well-known parameters (e.g., r' [0-9a-fA-F]{12}' for a 12-digit hexadecimal value) with specialized parameter tokens. Typically, any token that contains a number is also treated as a parameter and replaced with a parameter token. Once a log is preprocessed, the parser tree can determine its associations.
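As a rough illustration (not our production implementation), the preprocessing step might look like the following Python sketch; the specific patterns, delimiters, and parameter tokens below are placeholder assumptions:

```python
import re

# Hypothetical pre-defined patterns mapping well-known parameter shapes to
# specialized parameter tokens; the actual pattern set is an assumption.
PARAM_PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"),
    (re.compile(r"\b[0-9a-fA-F]{12}\b"), "<HEX12>"),
]
DELIMITERS = re.compile(r"[\s=,]+")  # assumed set of common delimiters

def preprocess(raw_log: str) -> list:
    """Tokenize a raw log and replace likely parameters with parameter tokens."""
    text = raw_log
    for pattern, token in PARAM_PATTERNS:
        text = pattern.sub(token, text)
    tokens = [t for t in DELIMITERS.split(text) if t]
    # Any remaining token containing a digit is treated as a parameter.
    return ["<*>" if any(c.isdigit() for c in t) and not t.startswith("<") else t
            for t in tokens]

print(preprocess("Accepted password for alice from 10.1.2.3 port 51234"))
# ['Accepted', 'password', 'for', 'alice', 'from', '<IP>', 'port', '<*>']
```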
The intuition is that logs with similar content will have similar sequences of messages and parameters, which can be represented as tokens for easy comparison. The first layer of the parser tree branches on the total number of tokens in the preprocessed log; because parameters typically occupy a fixed number of tokens, this step quickly groups associated logs. Subsequent layers of the parser tree branch on the token in the nth position of the preprocessed log, where n comes from a pre-defined set of token positions. To prevent the parser tree from exploding, if the number of branches at any given layer exceeds a threshold, a wildcard token is used for all new tokens.
Once the bottom of the parser tree is reached, if no log formats already exist at that position, the incoming log is used as a new log format. Otherwise, the incoming log is compared to the log formats at that position using token edit distance.
If the incoming log is sufficiently similar to an existing log format, then the log is associated with that format. In addition, the log format is updated such that any dissimilar tokens between the log format and the incoming log are replaced with wildcard tokens. If the incoming log is insufficiently similar to all existing log formats, then the incoming log is used as a new log format.
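A minimal sketch of this insertion logic is below, assuming a simple positional token match in place of a full token edit distance; the branch cap, split positions, and similarity threshold are illustrative values, not our tuned settings:

```python
from dataclasses import dataclass

MAX_BRANCHES = 50         # assumed cap on branches per layer before using a wildcard
SPLIT_POSITIONS = (0, 1)  # assumed pre-defined token positions used for lower layers
SIM_THRESHOLD = 0.6       # assumed cutoff for joining an existing log format
WILDCARD = "<*>"

@dataclass
class LogFormat:
    tokens: list    # token sequence, with wildcards at variable positions
    count: int = 1  # number of training logs associated with this format

def token_similarity(a: list, b: list) -> float:
    """Fraction of positions with identical tokens (a simple stand-in for token edit distance)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

class ParserTree:
    def __init__(self):
        self.root = {}

    def insert(self, tokens: list) -> LogFormat:
        # First layer: branch on the total number of tokens.
        node = self.root.setdefault(len(tokens), {})
        # Subsequent layers: branch on the token at each pre-defined position.
        for pos in SPLIT_POSITIONS:
            key = tokens[pos] if pos < len(tokens) else WILDCARD
            children = node.setdefault("children", {})
            if key not in children and len(children) >= MAX_BRANCHES:
                key = WILDCARD  # cap branching with a wildcard token
            node = children.setdefault(key, {})
        # Leaf: compare against the log formats already stored at this position.
        formats = node.setdefault("formats", [])
        best, best_sim = None, 0.0
        for fmt in formats:
            sim = token_similarity(tokens, fmt.tokens)
            if sim > best_sim:
                best, best_sim = fmt, sim
        if best is not None and best_sim >= SIM_THRESHOLD:
            # Sufficiently similar: merge, replacing dissimilar tokens with wildcards.
            best.tokens = [x if x == y else WILDCARD for x, y in zip(best.tokens, tokens)]
            best.count += 1
            return best
        # Insufficiently similar to every existing format here: start a new log format.
        new_fmt = LogFormat(tokens=list(tokens))
        formats.append(new_fmt)
        return new_fmt
```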
Once all training logs have been processed, all the log formats generated by the parser tree are returned. Particularly rare log formats are dropped. For each remaining log format, we generate a log template that includes: a regular expression pattern capable of matching logs associated with the template, the set of tokens contained in the log template, and a log template score based on how frequently the log template appeared in the training set.
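To make the template structure concrete, here is a hedged sketch that reuses the LogFormat and WILDCARD names from the parser tree sketch above; the minimum-count cutoff and the negative-log-frequency score are assumptions, not our exact formulas:

```python
import math
import re
from dataclasses import dataclass

MIN_COUNT = 5  # assumed cutoff below which particularly rare log formats are dropped

@dataclass
class LogTemplate:
    pattern: object       # compiled regex that matches logs associated with the template
    token_set: frozenset  # tokens contained in the log template
    score: float          # rarity score derived from training frequency

def build_templates(formats: list, total_logs: int) -> list:
    templates = []
    for fmt in formats:
        if fmt.count < MIN_COUNT:
            continue  # drop particularly rare log formats
        # Wildcard positions match any single token; everything else is matched literally.
        parts = [r"\S+" if t == WILDCARD else re.escape(t) for t in fmt.tokens]
        pattern = re.compile(r"\s+".join(parts))
        # One plausible rarity score: the negative log of the format's training
        # frequency (the exact scoring function is an assumption).
        score = -math.log(fmt.count / total_logs)
        token_set = frozenset(t for t in fmt.tokens if t != WILDCARD)
        templates.append(LogTemplate(pattern, token_set, score))
    return templates
```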
Another set of recent log data is used to generate a score threshold. Logs in this dataset are scored using the log templates. To score, an incoming log is preprocessed and checked to see if it matches any log templates’ regular expression pattern. If the incoming log matches a regular expression pattern, then it is assigned the score associated with the log template that it matches.
Otherwise, the incoming log’s tokens are compared to each of the log templates’ token sets using Jaccard similarity. The incoming log is associated with the log template that has the most similar token set and assigned a score based on the log template’s score and the similarity between the token sets. This matching process allows previously unseen log formats to be assigned appropriate rarity scores.
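A sketch of this two-stage scoring, again using the hypothetical LogTemplate structure from above; the way the template score and Jaccard similarity are combined here is an assumption:

```python
def jaccard(a: set, b: frozenset) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def score_log(tokens: list, templates: list) -> float:
    """Score a preprocessed log against the template set."""
    preprocessed = " ".join(tokens)
    # First, try an exact structural match via each template's regex pattern.
    for tpl in templates:
        if tpl.pattern.fullmatch(preprocessed):
            return tpl.score
    # Otherwise, fall back to the template with the most similar token set.
    token_set = set(tokens)
    best = max(templates, key=lambda tpl: jaccard(token_set, tpl.token_set))
    best_sim = jaccard(token_set, best.token_set)
    # Lower similarity inflates the score, marking the log as rarer; the exact
    # combination of template score and similarity is an assumption.
    return best.score * (2.0 - best_sim)
```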
Once the entire score threshold dataset has been processed, all the scores assigned to its logs are considered. Based on these scores, a global score threshold is determined using a percentile of the scores and their standard deviation.
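One way such a threshold could be computed, with an assumed percentile and standard-deviation multiplier:

```python
import statistics

def compute_threshold(scores: list, percentile: float = 0.99, k: float = 1.0) -> float:
    """Combine a high percentile of the scores with their spread.
    The specific percentile and multiplier here are assumed values."""
    ordered = sorted(scores)
    idx = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[idx] + k * statistics.stdev(ordered)
```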
Inference
In contrast to the training process, inference occurs in a streaming environment and finds anomalous records in near real-time.
As in the score threshold process, incoming logs are preprocessed and matched to log templates using either regular expression patterns or token set similarity. The incoming log is assigned the matched log template’s score or, in the token-set case, a score derived from the matched template’s score and the similarity measure. If the incoming log’s score exceeds the score threshold, the log is considered anomalous.
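Tying the hypothetical helpers above together, the per-log decision at inference time reduces to a comparison against the threshold:

```python
def is_anomalous(raw_log: str, templates: list, threshold: float) -> bool:
    """Flag a single incoming log as anomalous if its score exceeds the threshold."""
    return score_log(preprocess(raw_log), templates) > threshold
```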
To facilitate real-time analysis, we use AWS Lambda for inference and a DynamoDB table for template storage. We’re able to spin up additional Lambda instances automatically and adjust DynamoDB read capacity as demand requires.
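For illustration only, a Lambda handler along these lines might load templates from a DynamoDB table and score incoming records as sketched below; the table name, item layout, event shape, and threshold value are all assumptions rather than our actual configuration:

```python
import re
import boto3

# Hypothetical table name and item layout; the real storage schema is an assumption.
TABLE = boto3.resource("dynamodb").Table("log-templates")

def load_templates() -> list:
    """Rebuild LogTemplate objects from items stored in DynamoDB."""
    items = TABLE.scan()["Items"]
    return [LogTemplate(pattern=re.compile(item["pattern"]),
                        token_set=frozenset(item["tokens"]),
                        score=float(item["score"]))
            for item in items]

TEMPLATES = load_templates()  # loaded once per Lambda container, reused across invocations
THRESHOLD = 7.5               # illustrative value; produced by the training process in practice

def handler(event, context):
    """Score each incoming log record and return those that look anomalous."""
    anomalies = [record["message"]
                 for record in event.get("records", [])
                 if is_anomalous(record["message"], TEMPLATES, THRESHOLD)]
    return {"anomalous": anomalies}
```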
Classifying logs and assigning rarity scores opens up a wide variety of analysis opportunities. Keyword lists that would otherwise generate numerous false positives become far more effective when applied only to the anomalous dataset. Log template associations enable time series analysis of specific log messages. And by determining the location of parameters, specific information can be extracted and analyzed, even from previously unseen log formats. At Adlumin, we use a variety of these techniques to provide preventive alerts of specific threats, malfunctions, and IT operations failures.