Learn how Adlumin’s Threat Research Team is dissecting emerging threats and providing invaluable insights that empower organizations to proactively defend against ever-evolving cyberattacks.
Ransomware attacks are increasing by the day, and they're wreaking havoc across a range of industries. During the early days of COVID-19, which provided new opportunities for attackers, ransomware attacks surged; according to Statista, the share of organizations experiencing a ransomware attack annually has been rising since 2018, peaking at 68.5% in 2021. In response, Adlumin has launched the first product in its Ransomware Protection Suite: the Ransomware Self-Assessment Tool (R-SAT).
Ransomware is a type of malware designed to encrypt files on a device, making any files and systems that rely on them unusable. When a cybercriminal maliciously encrypts confidential files within an organization's system, the attacker then demands payment before releasing the information back to the organization.
The R-SAT helps institutions, regardless of size, assess their level of information security, recognize gaps in that security, and measure their ability to mitigate the possibility of a ransomware attack. Understanding the vulnerabilities in your institution's security processes and procedures is imperative to protecting yourself from ransomware. The R-SAT is a solid place to start: it helps identify gaps in your protection strategy and validate effective security practices.
To protect yourself from ransomware, it is critical to recognize the vulnerabilities in your security practices, regardless of whether your data is held on-premises or by a third party. If your organization is victimized by ransomware, many questions may immediately come to mind: If you pay, are you certain the information will be released? Will the data be released to the public if you refuse to pay? The R-SAT can help you prepare to respond.
Adlumin plans to continue expanding its suite of ransomware tools with richer reporting, automated alerts, and more. Below are just a few cost and payment trends for ransomware:
“The total cost of a ransomware breach was an average of $4.62 million in 2021, not including a ransom.” (IBM)
“The average cost for education institutions to rectify the impacts of a ransomware attack, including the ransom itself, was $2.73 million in 2021 — 48% higher than the global average for all sectors.” (EdScoop)
“The 2,084 ransomware complaints received by the IC3 in the first half of 2021 amounted to over $16.8 million in losses.” (FBI and CISA)
Firewall, VPN, and network security device logs can provide insight into a wide variety of malicious activities, including remote code executions, port scanning, denial of service attacks, and network intrusions. However, analyzing these logs presents several challenges.
These data streams are high velocity; even small and medium networks can produce thousands of logs per second with significant fluctuations throughout the day. A network comprises several different devices manufactured by different companies, so manually defining rules for parsing log data is unfeasible and would break as devices are updated. Preventive measures against malicious activity must be taken quickly, meaning that we need an online process to deliver insights that are useful for more than forensics.
At Adlumin, we've designed around these challenges and built a fast, parallelizable system that handles new log formats and is deployable on streaming data. We hold a patent covering this method of determining whether log data is anomalous and indicative of threats, malfunctions, and IT operations failures.
Malicious activity nearly always produces unusual log data, so our system focuses first on finding anomalous data. Focusing on anomalous logs drastically decreases the volume of data that must be analyzed for malicious activity without running the risk of ignoring pertinent data. In this discussion, we’ll focus on how we determine whether log data is anomalous, as doing so enables a wide variety of techniques for searching for malicious activity.
As an overview, our system trains itself on device log data that we've recently received. From this data, the system builds a set of templates that correspond to commonly seen log content and structure. The system also computes a measure of how rare each template is and derives an anomaly score threshold. When new data is received, it's checked against this template set and assigned a score based on the template that best matches the data. If new data arrives that does not directly correspond to a template, a measure of the difference is added to the score. Data with a score above the threshold is considered anomalous.
Training
We begin by training our system in a batch process on recently ingested log data. We create a set of templates by grouping similar logs together. Automated systems generate logs, so they contain a mix of parameters (e.g., an IP address, a timestamp, or a username) and fixed messages (e.g., “connection opened” or “user logged in”). Our system uses a token-based parser tree to associate logs that have the same fixed messages, but that may have differing parameters.
Before applying the parser tree, logs are preprocessed. First, logs are tokenized according to common delimiters. Then, pre-defined regex patterns replace well-known parameters (e.g., r' [0-9a-fA-F]{12}' for a 12-digit hexadecimal value) with specialized parameter tokens. Typically, any token that contains a number is also treated as a parameter and replaced with a parameter token. Once preprocessing is complete, the parser tree can determine log associations.
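As a rough illustration of this preprocessing step, a minimal sketch in Python might look like the following. The delimiters, patterns, and placeholder tokens here are hypothetical examples, not Adlumin's actual configuration:

```python
import re

# Hypothetical parameter patterns: each well-known parameter type is
# replaced with a specialized placeholder token.
PARAM_PATTERNS = [
    (re.compile(r"^[0-9a-fA-F]{12}$"), "<HEX12>"),     # 12-digit hex value
    (re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"), "<IP>"),  # IPv4 address
]

def preprocess(log_line: str) -> list[str]:
    # Tokenize on common delimiters (whitespace, '=', ',', ':').
    tokens = re.split(r"[\s=,:]+", log_line.strip())
    out = []
    for tok in tokens:
        for pattern, placeholder in PARAM_PATTERNS:
            if pattern.match(tok):
                out.append(placeholder)
                break
        else:
            # Any remaining token containing a digit is treated as a
            # generic parameter and replaced with a parameter token.
            out.append("<PARAM>" if re.search(r"\d", tok) else tok)
    return out

print(preprocess("user alice logged in from 10.0.0.5"))
# -> ['user', 'alice', 'logged', 'in', 'from', '<IP>']
```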
The intuition is that logs with similar content will have similar sequences of messages and parameters, which can be represented as tokens for easy comparison. The first layer of the parser tree splits on the total number of tokens in the preprocessed log; because a given log format typically produces a fixed number of tokens, this step quickly groups associated logs. Subsequent layers of the parser tree are fit according to the token in the nth position of the preprocessed log, where n comes from a pre-defined set of token positions. To prevent parser tree explosion, if the number of branches at any given layer exceeds a threshold, a wildcard token is used for all new tokens.
Once the bottom of the parser tree is reached, if there are no other logs already at this position of the parser tree, then the incoming log is used as a new log format. Otherwise, the incoming log is compared to the log formats at this position using token edit distance.
If the incoming log is sufficiently similar to an existing log format, then the log is associated with that format. In addition, the log format is updated such that any dissimilar tokens between the log format and the incoming log are replaced with wildcard tokens. If the incoming log is insufficiently similar to all existing log formats, then the incoming log is used as a new log format.
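A minimal sketch of this leaf-level comparison and merge step might look like the following. The position-wise similarity measure and the 0.5 threshold are illustrative stand-ins for the token edit distance actually used:

```python
# Compare an incoming (preprocessed) log to an existing log format
# token-by-token; "<*>" is a wildcard that matches any token.
def similarity(fmt: list[str], log: list[str]) -> float:
    if len(fmt) != len(log):
        return 0.0
    same = sum(1 for a, b in zip(fmt, log) if a == b or a == "<*>")
    return same / len(fmt)

# Merge: any position where the tokens differ becomes a wildcard.
def merge(fmt: list[str], log: list[str]) -> list[str]:
    return [a if a == b else "<*>" for a, b in zip(fmt, log)]

fmt = ["user", "alice", "logged", "in"]
log = ["user", "bob", "logged", "in"]
if similarity(fmt, log) >= 0.5:  # illustrative threshold
    fmt = merge(fmt, log)        # sufficiently similar: update format
print(fmt)
# -> ['user', '<*>', 'logged', 'in']
```

If the similarity had fallen below the threshold for every existing format, the incoming log would instead become a new format of its own.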
Once all training logs have been processed, all the log formats generated by the parser tree are returned. Particularly rare log formats are dropped. For each log format, we generate a log template which includes: a regular expression pattern capable of matching logs associated with the template, a set of the tokens contained in the log template, and a log template score based on the frequency that the log template appeared in the training set.
Another set of recent log data is used to generate a score threshold. Logs in this dataset are scored using the log templates. To score, an incoming log is preprocessed and checked to see if it matches any log templates’ regular expression pattern. If the incoming log matches a regular expression pattern, then it is assigned the score associated with the log template that it matches.
Otherwise, the incoming log’s tokens are compared to each of the log templates’ token sets using Jaccard similarity. The incoming log is associated with the log template that has the most similar token set and assigned a score based on the log template’s score and the similarity between the token sets. This matching process allows previously unseen log formats to be assigned appropriate rarity scores.
Once the entire score-threshold dataset has been processed, all scores assigned to logs are considered. A global score threshold is then determined using a percentile of the set of all scores and its standard deviation.
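Putting the scoring and thresholding steps together, a simplified sketch might look like this. The template set, score values, fallback penalty, and threshold statistics are all illustrative assumptions, not the production configuration:

```python
import re
import statistics

# Hypothetical templates: regex for direct matching, token set for
# similarity fallback, and a rarity score from training frequency.
templates = [
    {"regex": re.compile(r"user \S+ logged in"),
     "tokens": {"user", "logged", "in"}, "score": 1.0},
    {"regex": re.compile(r"connection closed"),
     "tokens": {"connection", "closed"}, "score": 5.0},
]

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def score(log: str) -> float:
    # Direct match: assign the matched template's score.
    for t in templates:
        if t["regex"].search(log):
            return t["score"]
    # Fallback: most similar token set; less similar -> rarer -> higher.
    toks = set(log.split())
    best = max(templates, key=lambda t: jaccard(toks, t["tokens"]))
    sim = jaccard(toks, best["tokens"])
    return best["score"] + (1.0 - sim) * 10.0  # illustrative penalty

scores = [score(l) for l in ["user alice logged in", "connection closed",
                             "kernel panic unexpected"]]
# Illustrative global threshold: 95th percentile plus one standard deviation.
threshold = statistics.quantiles(scores, n=100)[94] + statistics.stdev(scores)
```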
Inference
In contrast to the training process, inference occurs in a streaming environment and finds anomalous records in near real-time.
As in the score-threshold process, incoming logs are preprocessed and matched to log templates using either regular expression patterns or token-set similarity. The incoming log is assigned either the matched template's score or a score derived from the matched template's score and the similarity measure, respectively. If the incoming log's score is greater than the score threshold, the log is considered anomalous.
To facilitate real-time analysis, we utilize AWS Lambda for conducting inference and a DynamoDB table for template storage. We're able to spin up additional Lambda instances automatically and adjust DynamoDB read capacity as demand requires.
Classifying logs and assigning rarity scores opens a wide variety of analysis opportunities: keyword lists that would otherwise generate numerous false positives are more effective when applied to the anomalous dataset. Log template associations enable time series analysis of specific log messages. By determining the location of parameters, specific information can be extracted and analyzed, even in unseen log formats. At Adlumin, we utilize a variety of these techniques to provide preventative alerts of specific threats, malfunctions, and IT operations failures.
Adlumin’s Cybersecurity Maturity Model Certification (CMMC) Assessment feature is a tool to help streamline an organization’s preparation for the U.S. DoD’s CMMC.
CMMC is a unified cybersecurity standard intended to guide DoD contractors in implementing the cybersecurity processes and practices associated with the achievement of a cybersecurity maturity level. CMMC maturity levels range from Level 1 to Level 5, and cybersecurity maturity is assessed across 17 cybersecurity domains.
The CMMC is designed to provide increased assurance to the Department that a contracting company can adequately protect sensitive, controlled unclassified information (CUI) and/or federal contract information (FCI).
Adlumin’s CMMC Assessment feature is an easy-to-use self-assessment tool that gauges an organization’s progress towards achieving the appropriate target CMMC maturity level. The feature’s core functionality includes:
A dashboard providing a high-level overview of an organization’s current compliance level across all 17 CMMC domains based on the answers to a self-assessment.
Visualizations to easily identify gaps in an organization’s cybersecurity processes and practices that will prevent attainment of the target maturity level.
The ability to note and manage the tasks required to improve an organization's cybersecurity maturity.
On-demand PDF reports that reflect the results of the self-assessment and report the current compliance level across each of the 17 CMMC cybersecurity domains.
Ransomware attacks work by encrypting critical data on a victim's devices and network and demanding payment, generally in cryptocurrency, for its release. These attacks have become increasingly prevalent and cause losses worth hundreds of millions of dollars every year, with attackers targeting critical infrastructure, government agencies, and financial institutions. Once perpetrators gain access to a victim's network through any exploited lapse in security, they deploy malware designed to pressure the victim into paying the requested ransom. The full scope of the attack is often not known until well after it has concluded, by which point the attacker has spread the malware across the network, inflicting maximum damage on critical data.
Adlumin Data Science has developed a machine learning algorithm for detecting ransomware attacks via comprehensive monitoring of changes across the entire file system. The detection system measures the volume of file access events, specifically monitoring the number of Write/WriteAttribute (Windows Event ID 4663) and Delete (Windows Event ID 4660) events. These access events provide a clear footprint of the encryption and deletion activity occurring across the network, which may be indicative of a system-wide ransomware attack. The process is made possible by Adlumin's serverless data pipeline in the cloud, which allows the algorithm to collect and monitor file access events in near real-time. Traditional file auditing processes can monitor these events; however, the volume of files read and deleted on a network tends to overwhelm such systems.
The newly developed ransomware detection model monitors the volume of these three events independently of each other per user across the entire network, looking for anomalous spikes in aggregate activity during specific time windows using historical data as a benchmark. If the amount of activity (either write or deletion) exceeds a model-determined threshold relative to the rest of the activity on the network, a detection will be sent for investigation. This proactive monitoring may allow security analysts to quarantine and isolate portions of their network that are being hit by excessive encryption and deletion before the attacker is able to spread the attack to the rest of the system.
In addition to monitoring for excessive levels of file access events, the algorithm analyzes the distribution of objects modified and/or deleted across the network. If the majority of activity is focused in a single subdirectory, which could be associated with software installation or anti-virus scans, the model will not externally raise a detection. However, if the spike in activity is spread across multiple subdirectories, indicative of system-wide activity, the model will raise a detection.
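The two checks described above can be sketched as follows. The z-score statistic, the thresholds, and the top-level-directory heuristic are illustrative assumptions, not the production model:

```python
from collections import Counter
from statistics import mean, stdev

# Flag a user whose file-access event count in the current window spikes
# far above their historical baseline, but only if the activity spans
# multiple subdirectories (single-directory bursts, e.g. software
# installs or AV scans, are not raised externally).
def is_suspicious(history: list[int], current_events: list[str],
                  z_thresh: float = 3.0, min_dirs: int = 3) -> bool:
    count = len(current_events)
    mu, sigma = mean(history), stdev(history)
    if sigma == 0 or (count - mu) / sigma < z_thresh:
        return False  # no anomalous spike in volume
    # Group accessed paths by their containing directory.
    dirs = Counter(p.rsplit("/", 1)[0] for p in current_events)
    return len(dirs) >= min_dirs  # spread across subdirectories?

history = [40, 55, 38, 60, 47]  # write/delete counts in past windows
burst = ([f"/home/docs/file{i}.txt.enc" for i in range(200)] +
         [f"/var/logs/log{i}.enc" for i in range(200)] +
         [f"/srv/share/doc{i}.enc" for i in range(200)])
print(is_suspicious(history, burst))  # -> True
```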
Below is an example of a theoretical ransomware detection. The detection view allows Adlumin users to view the aggregate file access information that triggered the detection, as well as a sample of the individual files that were accessed allowing security analysts to determine where the activity is occurring. Analysts can further click on each file access event to view more detailed information surrounding the event.
One of the most powerful features that Adlumin provides is the ability to integrate with third-party devices and applications. By aggregating event logs from all your devices and applications, Adlumin delivers a single pane of glass for tracking anomalies, identifying vulnerabilities, and managing your overall network security.
Traditional integration methods, which often required cumbersome on-premises solutions, are no longer compatible with the way we work. Remote work and the ever-increasing role of SaaS in the enterprise require that platform providers like Adlumin offer cloud-native solutions for integrating with external applications. Collecting and analyzing this data is not easy and becomes exponentially more challenging as the number of SaaS services increases. Adlumin aims to make this process as simple as possible.
The Adlumin platform has robust, native support for an ever-increasing list of cloud-based SaaS solutions, ranging from network and endpoint security solutions to office and collaboration suites. Adlumin collects data from the providers you select, parses that data into a native format, and then correlates the events across your existing data. This analysis and correlation make it simple to track incidents and events across multiple platforms while also alerting you in real-time to potential threats.
This past month, Adlumin launched two exciting new third-party API integrations: AWS and Google Workspace.
AWS is the world’s leading cloud platform, and now you can automatically collect, track and alert on any IAM user events directly from Adlumin.
Google Workspace is the premier cloud-based office and productivity suite; with Adlumin’s Google Workspace integration, audit logs are automatically ingested for all Workspace services.
Network and user activity from AWS and Google Workspace is automatically associated with existing data in the platform and can be easily searched and cross-referenced. Custom detections can be created for both AWS and Google Workspace to alert on any user-specified criteria that appear in the event logs.
Integrations are an essential part of the Adlumin platform. These latest additions add even more power and functionality to our core product. The rate of development for third-party integrations is ramping up, and in the coming months, we will be announcing support for several more high-profile SaaS offerings. Adlumin never stops working to give our customers the most powerful event correlation and analysis platform on the market, with the features and functionality they demand.
Lateral movement is a type of cyberattack in which the attacker first gains access to a single component in a network and then proliferates across the network to steal data or execute other malicious actions. Initial and subsequent access to various systems is achieved via stolen credentials. The attacker maintains a low profile after the initial breach, moving between systems through authenticated access and biding their time until they reach the data or systems they have targeted. These attacks are difficult to detect while in progress because the pattern of credential acquisition and access tends to vary, and they do not generate much obviously suspicious network traffic. The losses can be heavy for the network under attack.
Adlumin’s Data Science team has developed a machine learning algorithm for lateral movement detection, using techniques adapted from network graph theory. When an attacker moves laterally on your network, they are likely to leave a trace of access events. Adlumin’s lateral movement detection model learns the normal patterns of access on your network and alerts you when a privileged user’s behavior deviates significantly from that baseline. Machines associated with anomalous behavior are flagged, and the associated user’s behavior for the day is summarized in the detection to help security analysts further investigate.
Adlumin's approach is host-based, even though lateral movement by definition concerns access events involving multiple hosts. While the detections highlight individual hosts as anomalous, the objective is to draw attention to the access events themselves, which is facilitated by the user interface Adlumin provides with this and other detections.
Our data science team continues to explore various unsupervised techniques, such as embedding user-host information in vector space and subsequently using clustering methods to flag unusual access patterns. Adlumin’s current graph-based detection provides a robust baseline with a low false-positive count and a good performance against known adversarial logins, against which other approaches can be evaluated.
The critical requirement of the graph-based approach is to map out every user’s daily login behavior over several days. The user’s history is represented as a collection of daily login graphs where each daily graph has:
Vertices representing hosts and systems
Directed edges between the vertices for representing logins between those systems
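The construction above can be sketched in a few lines. The event tuple layout and field names here are hypothetical; the real pipeline draws these from authentication logs:

```python
from collections import defaultdict

# Build one directed login graph per (user, day): hosts are vertices,
# and each login event adds a source -> destination edge.
def build_daily_graphs(events):
    # events: iterable of (user, day, src_host, dst_host) tuples
    graphs = defaultdict(lambda: {"vertices": set(), "edges": set()})
    for user, day, src, dst in events:
        g = graphs[(user, day)]
        g["vertices"].update([src, dst])
        g["edges"].add((src, dst))
    return dict(graphs)

events = [
    ("alice", "2021-09-01", "laptop-01", "file-srv"),
    ("alice", "2021-09-01", "file-srv", "db-srv"),
    ("alice", "2021-09-02", "laptop-01", "mail-srv"),
]
graphs = build_daily_graphs(events)
print(sorted(graphs[("alice", "2021-09-01")]["edges"]))
# -> [('file-srv', 'db-srv'), ('laptop-01', 'file-srv')]
```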
Based on this construction, the following detection attributes are worth noting:
This algorithm only monitors accounts with privileged access since stolen privileged credentials can cause maximal damage.
Since a lateral attack is likely to involve machines that a given user rarely accesses, a detection is only triggered if the anomalous pattern of behavior involves rare or “novel” machines. The purpose is to avoid false positives.
Once the graph-theoretic metrics are extracted from each vertex in each graph, all vertices are represented individually as vectors in high-dimensional feature space. Vertices associated with outliers in this feature space are flagged as anomalous. Since each vertex represents a unique host, the anomaly score is ultimately assigned to individual machines. Machines with “high” anomaly scores relative to an appropriately set baseline are flagged as suspected lateral movement attack venues.
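As a rough illustration of this step: the features and outlier statistic below are simplified assumptions (in- and out-degree only, scored by per-dimension deviation from the population mean), whereas the production model uses richer graph-theoretic metrics:

```python
from statistics import mean, stdev

# Represent each vertex (host) as a small feature vector derived from
# the login graph's edge set.
def vertex_features(host, edges):
    out_deg = sum(1 for s, _ in edges if s == host)
    in_deg = sum(1 for _, d in edges if d == host)
    return (in_deg, out_deg)

# Score each host by its largest normalized deviation from the mean
# across feature dimensions; high scores mark outlier hosts.
def anomaly_scores(hosts, edges):
    feats = {h: vertex_features(h, edges) for h in hosts}
    dims = list(zip(*feats.values()))
    mus = [mean(d) for d in dims]
    sigmas = [stdev(d) or 1.0 for d in dims]  # guard against zero spread
    return {h: max(abs(f[i] - mus[i]) / sigmas[i] for i in range(len(f)))
            for h, f in feats.items()}

# Host "a" logs into three other hosts; the others are quiet.
edges = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}
scores = anomaly_scores({"a", "b", "c", "d"}, edges)
```

In this toy graph, host "a" receives the highest anomaly score because its out-degree is far above the population mean; in production, only scores above an appropriately set baseline would be flagged.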
Below is an example of a lateral movement detection, displaying two distinct anomalous patterns of access. The graph provides a visual representation of the user’s flagged behavior, and the table to the right attempts to characterize essential features of that behavior. Users of the Adlumin application can click on the graph nodes for more detailed information to further investigate the individual access events, shown in the two tables below the graph.
Threat intelligence has become an essential part of the security landscape. The best solutions, like Adlumin, use machine learning to automate data collection and processing. They integrate with your existing third-party solutions, take in unstructured data from a variety of disparate feeds, and then provide context on indicators of compromise (IoCs) and the tactics, techniques, and procedures (TTPs) of threat actors. Good threat intelligence is actionable – it provides context, is timely, and provides decision-makers with insights about the threat at hand.
Over the past year, the engineering team at Adlumin has been working to strengthen our platform’s threat intelligence capabilities, and we recently launched an exciting new integration with CISA/DHS. Adlumin is now a participating member of CISA’s Automated Indicator Sharing (AIS) program. Threat intelligence feeds from AIS are pulled throughout the day. We are constantly scanning incoming and historical event data for indicators of compromise that we parse out of the feeds.
The AIS ecosystem empowers participants to share cyber threat indicators and defensive measures, such as information about attempted adversary compromises as they are being observed, helping protect other participants of the AIS community and ultimately limit the adversary’s use of an attack method. In the future, we will enable Adlumin customers to flag the IoCs they spot on their networks. Once a flagged indicator has been reviewed and confirmed, it will be submitted to AIS, where it will be shared with the community at large. More information about the AIS program is below.
Automated Indicator Sharing (AIS), a Cybersecurity and Infrastructure Security Agency (CISA) capability, enables the real-time exchange of machine-readable cyber threat indicators and defensive measures to help protect AIS community participants and ultimately reduce the prevalence of cyberattacks. The AIS community includes private sector entities; federal departments and agencies; state, local, tribal, and territorial (SLTT) governments; information sharing and analysis centers (ISACs), and information sharing and analysis organizations (ISAOs); and foreign partners and companies.
AIS is offered as part of CISA's mission to work with public and private sector partners to identify and help mitigate cyber threats through information sharing and provide technical assistance, upon request, that helps prevent, detect, and respond to incidents.