See How To Stop Successful Phishing Trips

Phishing alert: One in 61 emails in your inbox now contains a malicious link

Be careful when you click. That email might not be as innocent as it looks.

By Danny Palmer

The number of phishing attacks is on the rise, more than doubling in recent months, with one in 61 emails delivered to corporate inboxes found to contain a malicious URL.

Analysis by security provider Mimecast found that between the August-to-November and December-to-February periods, the number of emails delivered despite containing a malicious URL increased by 126 percent. These malicious links are one of the key methods cyber criminals use to run their campaigns: distributing phishing emails that encourage users to click through to a link.

The emails are often designed to look like they come from legitimate senders — like a company, or a colleague — in order to gain the trust of the victim, before duping them into clicking the malicious link.

The purpose of the malicious URL could be to deploy malware onto the PC, or it could encourage the victim to enter sensitive information into a fake version of a real service, like a retailer, a bank or an email provider, in order to trick the user into giving up passwords and other data. Attackers then either use this as a jumping-off point for further attacks, or they look to sell it to other cyber criminals on underground forums.

In total, Mimecast analysed 28,407,664 emails delivered into corporate inboxes which were deemed “safe” by security systems and found that 463,546 contained malicious URLs — the figure represents an average of one malicious email getting through for every 61 emails that arrive.
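The headline ratio follows directly from those two figures:

```python
# Quick sanity check of Mimecast's headline ratio: emails deemed "safe"
# that nonetheless carried a malicious URL, per total emails analysed.
total_emails = 28_407_664
malicious = 463_546

print(round(total_emails / malicious))  # 61 -> one in every 61 emails
```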

Given the sheer number of emails sent back and forth by employees every single day, that represents a significant security risk and a potential gateway for hackers looking to conduct malicious activity.

“Email and the web are natural complements when it comes to the infiltration of an organization. Email delivers believable content and easily clickable URLs, which then can lead unintended victims to malicious web sites,” said Matthew Gardiner, cybersecurity strategist at Mimecast.

“Cyber criminals are constantly looking for new ways to evade detection, often turning to easier methods like social engineering to gain intel on a person or pulling images from the internet to help ‘legitimize’ their impersonation attempts to gain credentials or information from unsuspecting users,” he added.

ABCs of UEBA: B is for Behavior

by Jane Grafton on February 4, 2019

We like to say, “You can steal an identity, but you can’t steal behavior.” You might compromise my credentials, but you don’t know what time I normally log in, the applications I typically use, the people I regularly email, etc.

Behavior is the Leading Threat Indicator
The key to predicting threats, especially unknown threats, is to monitor user and entity behavior and to recognize when that behavior starts becoming anomalous. Take a serious example: workplace violence. You hear it over and over again after a violent incident. People close to the perpetrator say things like, “he was acting strange” or “he was keeping to himself” or “he was obsessed with social media” before he committed the violent act. There are always signs, and they are always behavior based. If you can get ahead of the threat, if you can predict it may occur, you can likely prevent it from happening. This is the premise of User and Entity Behavior Analytics (UEBA).

Think about your own behavior, specifically in terms of patterns. Do you get to work at around the same time every day? Probably. If not, you likely have reasons: maybe you have a doctor’s appointment, or a standing commitment every Thursday. When do you go to lunch? When do you leave for the day? People around you will notice if your behavior changes. If you start coming in late, if your lunches drag on, if you leave work early, any change in your behavior is noticeable. So how does this same notion translate into UEBA and threat prediction?
If your office parking garage or building requires badge access, you create an audit trail every time you swipe your badge. The machine learning models that power UEBA can detect changes in arrival and departure times, duration spent at the office or at lunch, even bathroom breaks if your office is secured by a keycard entry system. Further, if you use a keycard to enter your office and then log in from a remote location with an unrecognized IP address, UEBA links those activities and flags the combination as an anomaly: you can’t possibly be in the office and working remotely at the same time. Linking user behavior data from the physical badging system and the Windows security log is the only way to ascertain this particular abnormality, which is why the best UEBA products ingest the broadest variety of data feeds. Multiply this example by thousands of employees and millions of transactions over time, and you start to get a sense of the power of UEBA.
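A minimal sketch of that kind of cross-source linking, using hypothetical badge and VPN event records (the field names and the one-hour window are illustrative assumptions, not taken from any particular UEBA product):

```python
from datetime import datetime, timedelta

# Hypothetical event records for one user: a physical badge swipe and a
# remote VPN login. Field names are illustrative, not from any product.
badge_event = {"user": "jdoe", "source": "badge", "site": "HQ",
               "time": datetime(2019, 2, 4, 9, 2)}
login_event = {"user": "jdoe", "source": "vpn", "ip": "203.0.113.7",
               "time": datetime(2019, 2, 4, 9, 10)}

def is_impossible_pair(badge, login, window=timedelta(hours=1)):
    """Flag a remote login occurring close in time to a badge-in:
    the user cannot be in the office and remote at the same time."""
    same_user = badge["user"] == login["user"]
    close_in_time = abs(login["time"] - badge["time"]) <= window
    return same_user and close_in_time

print(is_impossible_pair(badge_event, login_event))  # True
```

A real system would run this join continuously across all users and feeds; the point is that neither data source alone reveals the contradiction.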

To predict unknown threats, UEBA examines everything users and entities are doing in real time, then aggregates, correlates, and links that data to identify anomalies. Keep in mind that an entire library of machine learning algorithms and analytics is applied against this combined and normalized data, because it’s not possible for humans to detect changes in behavior patterns at this scale.

5 critical capabilities for 2019


We can add NASA to the list of recent federal cyber breach victims. The space agency disclosed in late December that hackers found their way into its servers in October 2018. While NASA is still investigating the extent of the breach, the agency knows the hackers accessed personal data of both former and current employees. Unfortunately, other agencies will surely find themselves in NASA’s shoes in 2019. Here’s why:

Cyber criminals know government IT pros have limited budgets that create resource challenges when it comes to securing a daunting array of technologies and data flows. This makes agencies at all levels of government target-rich environments for hackers. So, what’s the answer? How can government IT leaders take control of their data and reduce their vulnerability to bad actors in 2019?

The solution is straightforward, but multilayered. Government agency CIOs and CTOs need a hub-and-spoke system to collect and index data from all their IT touchpoints. These include network traffic, web servers, VPNs, firewalls, applications, hypervisors, GPS systems and pre-existing structured databases. For optimal cyber protection, all those data feeds should be run through an artificial-intelligence-driven security information and event management (SIEM) system equipped with machine-learning-powered analytics to identify anomalous and malicious patterns.

The hub-and-spoke approach should enable five critical capabilities: log/device management, analytics, account/system context, visualization of user privileges across an entire network, and long-term viability. Here’s a walk-through of the capabilities and why they matter.

1. Log/device management: This piece should include unlimited and automated coverage of logs, devices and systems as well as integrated compliance management. It should also provide real-time event log management, Windows and Linux server management, cloud and on-premise ingest, secure and encrypted log management and log data normalization.

2. Analytics: Data today is too voluminous for human analysis, so using AI and machine learning to analyze large amounts of data makes the most sense. Agencies should look for a single platform that provides automated threat intelligence, real-time intrusion detection alerts, 24/7 network vulnerability assessment, and user and device context.

3. Account/system context: Speed is essential, so agencies should look for a system that provides one-click, automated risk reporting for auditors and decision-makers that takes minutes rather than days.

4. Visualized permissions: Because cybersecurity conditions and requirements change quickly, agencies need the ability to visualize privileged users and groups in real time across the network in order to understand who can touch an agency’s data.

5. Long-term viability: Will an agency’s technology still be viable in one, two or five years? It’s an important question, but one that is often mistakenly answered with a yes. The era of on-premise architectures is over because they are flawed by design. Tied to the constraints of initial deployment, these systems are allergic to architecture migration, software redesign, advancements in analytic capabilities and new database implementation. In the cloud, however, organizations can develop a symbiotic relationship between the service they use and new cutting-edge technologies. With today’s cybersecurity threats, agencies need to be bigger, faster and stronger than the adversary, and the cloud gives them the opportunity to deploy the best solutions available.

The hub-and-spoke approach gives government agencies a fighting chance to keep data out of hackers’ hands. What used to be nice to have is now essential. There’s just too much at stake.

About the Author

Robert Johnston is the co-founder and CEO at Adlumin Inc.

Dynamic Anomaly Detection Using Machine Learning

By Dr. Tim Stacey

User Behavior Analytics is an incredibly hot field right now – software engineers and cybersecurity experts alike have realized that the power of data science can be harnessed to comb through logs, analyze user events, and target activity that stands out from the crowd. Previously, the gold standard for this process was manual investigation, based on exhaustive queries against large databases. These investigations also happened ex post facto, after the hack or the intrusion occurred, to diagnose what actually happened.

At Adlumin, we’ve sought to create a proactive product that reduces the amount of intensive data work that a cybersecurity specialist needs to perform. We’ve had analytics in production since inception, but today we’d like to introduce a new product that will make finding new malicious activity even easier.

Our new Rapid User Behavior Alerts will pick up on novel user behavior in a range of event types, specifically targeting combinations of attributes or actions that have never been seen before on a network. These Rapid Alerts come out within seconds of the Adlumin platform receiving the data, notifying sysadmins that something unexpected has occurred on their network.

Importantly, we have tuned our new data science engine to have high tolerances for power users (e.g., sysadmins) while triggering at lower tolerances for users that have a limited range of behaviors. This is crucial to reduce over-flagging on novel behavior. Our goal is to transmit high-impact findings reliably and quickly and avoid spamming the end user with bad alerts.

Our analytics engine takes advantage of an auto-encoding neural network framework, finding the difference between previous and current modes of user behavior in a heavily non-linear space. By passing the event through a trained auto-encoder, we determine the reconstruction error of an incoming event – a measure of how anomalous a user’s actions are. Since the anomalous characteristics of the incoming event are condensed to a single number, we can grade this number against a distribution of the user’s previous events to determine if this incoming event is truly different.
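A toy illustration of that scoring pipeline, using a fixed linear projection as a stand-in for a trained auto-encoder (the model, the feature count, and the percentile grading below are simplifying assumptions, not Adlumin's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained auto-encoder: a fixed linear projection
# from 8 features down to 3 and back. A real model is a trained,
# heavily non-linear neural network.
W = rng.normal(size=(8, 3))

def reconstruction_error(event):
    code = event @ W                   # "encode" to 3 dimensions
    recon = code @ np.linalg.pinv(W)   # "decode" back to 8 dimensions
    return float(np.sum((event - recon) ** 2))

# Distribution of errors over the user's previous events.
history = np.array([reconstruction_error(rng.normal(size=8))
                    for _ in range(200)])

def grade(event):
    """Condense the event to one number (its reconstruction error) and
    grade it as a percentile against the user's historical errors."""
    return float(np.mean(history < reconstruction_error(event)))

# An exaggerated event should land in a high percentile.
print(round(grade(rng.normal(size=8) * 10), 2))
```

The key idea survives the simplification: whatever the model, the event collapses to one error number, which is then ranked against that user's own history rather than a global threshold.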

Our fast evaluation of incoming data is made possible with the assistance of AWS DynamoDB and AWS Lambda. Pre-trained user models live in our Dynamo tables—these models are quickly queried for each event, as we process thousands or hundreds of thousands of events per second. Our Lambdas evaluate the incoming data against the queried baseline and produce a threat score with an interpretation of what caused the threat. Our baselines are updated frequently on a schedule to account for the relatively fast drift in user behavior over time.
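The evaluation step can be sketched roughly as follows, with an in-memory dict standing in for the DynamoDB baseline table and a simple z-score standing in for the trained model (all names, fields, and thresholds here are illustrative assumptions):

```python
# In-memory stand-in for the DynamoDB table of pre-trained user models;
# in production each event triggers a fast table query instead.
baseline_table = {
    "jdoe": {"mean_error": 2.0, "stdev_error": 0.5},
}

def handler(event):
    """Lambda-style handler sketch: score one incoming event against the
    user's stored baseline and return a threat score plus interpretation."""
    baseline = baseline_table.get(event["user"])
    if baseline is None:
        return {"threat_score": None, "reason": "no baseline yet"}
    z = (event["error"] - baseline["mean_error"]) / baseline["stdev_error"]
    return {"threat_score": round(z, 2),
            "reason": "anomalous" if z > 3 else "within baseline"}

print(handler({"user": "jdoe", "error": 4.5}))  # z = 5.0 -> "anomalous"
```

Keeping the per-event work this small is what lets a serverless function return a verdict within seconds of the data arriving.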

In the coming months, Adlumin will be rolling out analytics specifically targeted to log data, system behavior, and a more detailed analysis dependent on cold storage of data. Rapid User Behavior Alerts are the first line of defense as we develop a suite of analytics to protect your network from harm.

BIO: Dr. Tim Stacey is the Director of Data Science for Adlumin Inc, a cybersecurity software firm based in Arlington, VA. At Adlumin, his work primarily focuses on user behavior analytics. His experience includes designing analytics for Caterpillar, the RAND Corporation, and the International Monetary Fund. He holds a PhD from the University of Wisconsin Madison in computational chemistry.


Running Linux Systems in the Enterprise is Just Good Business

By Milind Gangwani

Linux systems are increasingly prevalent in the enterprise, typically depended on for running business operations software, web applications, cloud technology, internet of things devices, and core banking software. However, these systems are often the most neglected when it comes to security. Furthermore, custom Linux variants pose a significant issue for many customers. They can behave very differently from mainstream variants like Ubuntu, Red Hat, and Fedora, and because of the high variability across kernels and builds, IT staff often leave these systems unattended despite their crucial role in security infrastructure.

Linux is the preferred operating system for cloud deployments, and Adlumin typically sees configurations with either Debian or Fedora as the variant of choice. Even container technology, like Docker, utilizes a Linux kernel.

While Linux, as an operating system, is used to manage applications, conduct computing, and process large volumes of data, organizations struggle to easily monitor and analyze activity and transactions on it.

Here at Adlumin, we have designed and developed a Linux daemon that feeds into our cloud-native SIEM (Security Information & Event Management) technology. Our Linux forwarder can be installed on any Debian- or Fedora-based distribution; more precisely, it can be deployed on Fedora and Debian from version 6.0 onwards, through the latest releases. These base builds include Linux variants like Red Hat, Ubuntu, CentOS, and many more.

The installation of the forwarder is incredibly simple and takes just minutes. Upon installation, it scans resident kernel libraries for the correct setup procedures, and it requires no intermediate dependencies to run seamlessly. The daemon is configured so that even if a sudo/root user were to tamper with the process, the daemon would restart silently.

The forwarder uses a simple but effective approach. Initial information is collected from the ‘/etc/os-release’ or ‘/etc/system-release’ paths. Subsequently, all account, privilege, share, and permission data sets are collected from a variety of sources. This permits Adlumin to make an excellent assessment of risk by understanding which access points may be vulnerable to attack.
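That first collection step might look something like the sketch below, which parses os-release key/value pairs from a sample string (the real forwarder is written in Go and reads ‘/etc/os-release’ directly; this Python version is purely illustrative):

```python
# Sample contents in the standard os-release key=value format; the
# forwarder would read the real file from /etc/os-release.
SAMPLE = '''NAME="Ubuntu"
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
'''

def parse_os_release(text):
    """Parse os-release text into a dict, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key] = value.strip().strip('"')
    return info

info = parse_os_release(SAMPLE)
print(info["ID"], info["ID_LIKE"])  # ubuntu debian
```

Identifying the base build up front (Debian-like vs. Fedora-like) is what lets a single forwarder adapt its collection logic across so many variants.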

The Linux forwarder was developed in Google’s Go (Golang) programming language. Go is known for its lightweight footprint and efficiency, and it compiles to binaries that can be deployed anywhere without a machine interpreter.

Using a hybrid combination of the Go operating system APIs, traditional system APIs, and external binary inclusion, Adlumin can now provide a lightweight forwarder that is effective on almost any Linux operating system version.

This Linux forwarder is part of our holistic intrusion detection approach, monitoring, reporting, and analyzing user and entity behavior. The underlying technology and analytics are designed to traverse Windows, Linux, and any network device you can dream of, using data points from all sources to instantaneously produce a conclusion.

Biography: Milind Gangwani is a full-stack developer at Adlumin and has been in development for more than 10 years. Prior to joining Adlumin, Milind was a senior developer at SalientCRGT at the US Patent Office. He has a master’s degree in computer vision from Rochester Institute of Technology.


Adlumin Secure Data Collector Application

August 28, 2018
By: Dan McQuade

One of the challenges we face as a cloud-based SIEM platform is collecting data from a variety of disparate sources on a local network and securely transmitting that data into our platform over the internet. These sources can include end-user PCs, Windows/UNIX/Linux servers, firewalls, VPN servers, network security monitoring devices, and more. For traditional end-user desktops and servers, Adlumin has addressed this problem with custom applications that monitor activity and securely transmit the data into our platform for analysis. For hardware devices such as firewalls and VPN servers, the problem is a bit more challenging, as there is usually no easy way to install custom software on such devices.

A common feature amongst firewalls and other network-based hardware devices is the ability to forward log data in syslog format to an external source. One of the benefits of dealing with syslog data is that it usually conforms to one of a handful of standards (RFC 3164, RFC 5424, etc.), and can therefore be easily parsed for analysis on the receiving end. However, the transmission generally occurs over TCP or UDP as unencrypted plain text, so transmitting such data over the public internet to the Adlumin platform is not an option. We needed a way to capture syslog data and securely forward it into our platform for analysis. Enter the Adlumin Syslog Collector.

The Adlumin Syslog Collector is a custom application written in Python, which runs on a Linux-based virtual machine as a systemd service. The application listens on numerous pre-defined TCP and UDP ports, securely forwarding all incoming data over an encrypted TLS connection to the Adlumin platform for collection and analysis. Once ingested, syslog data is immediately available to be viewed and searched using the Adlumin dashboard. Powerful visualizations are generated in real time, giving users the ability to spot patterns and identify threats as they occur.

We designed the syslog collector with ease-of-use in mind, and in less than 15 minutes it can be fully up and running, ready to receive and forward data. It offers a user-friendly GUI, which allows it to be installed and configured even if the end-user isn’t proficient with Linux or the command line. The application is shipped as a single-file OVA (Open Virtual Appliance) and is capable of running under most modern hypervisors (VMware, VirtualBox, etc.). The configuration required to deploy the Adlumin Syslog Collector is very straightforward. The only steps required to get up and running are as follows:

  1. Load the OVA into the hypervisor and boot the system
  2. Change the default password
  3. Enter the client-specific Adlumin endpoints
  4. Configure the network interface
  5. Set the time zone on the virtual machine
  6. Verify the configuration
  7. Route syslog traffic to the forwarder

Once the initial setup is completed, no further intervention is required of the end-user. As long as the virtual machine is running, the application will securely forward all received data to the Adlumin platform. Out of the box, the application has eight built-in listeners for a variety of syslog data sources. These include: firewall, VPN, network security device (e.g., FireEye NX), endpoint security, Carbon Black, and two miscellaneous listeners. Each listener resides on a unique TCP or UDP port (specified in the documentation). Support for additional listeners and data sources is constantly being added, based on requests and feedback we receive from our clients.
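As a rough illustration of the normalization a collector performs on the receiving end, here is a sketch of parsing a classic RFC 3164-style syslog line (a production collector handles many more formats; the pattern below covers only the basic `<PRI>TIMESTAMP HOST MESSAGE` shape, and the sample line is made up):

```python
import re

# Classic BSD-syslog shape: <PRI> followed by timestamp, host, message.
PATTERN = re.compile(
    r"<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<message>.*)"
)

def parse_syslog(line):
    """Split one syslog line into fields; PRI encodes facility*8+severity."""
    m = PATTERN.match(line)
    if not m:
        return None
    pri = int(m.group("pri"))
    return {
        "facility": pri // 8,
        "severity": pri % 8,
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "message": m.group("message"),
    }

line = "<34>Oct 11 22:14:15 fw01 %ASA-4-106023: Deny tcp"
print(parse_syslog(line)["severity"])  # 2 (facility 4, severity "critical")
```

Normalizing to a common field set like this is what allows data from eight different listener types to land in one searchable store.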

To keep up with the dynamic threat landscape, modern SIEMs must be able to interpret massive amounts of log data from a wide variety of applications and devices that reside on an enterprise network. Traditional on-premise SIEMs can become overloaded with this data, and it may take the user hours to sort through it all. The Adlumin Syslog Collector filters and normalizes syslog data in our cloud-based platform at unparalleled speed, in order to paint a more complete picture of the activities occurring on a network and to alert on anomalous events as they occur in real-time.

Enterprise-Level Data Science with a Skeleton Crew

Originally published on

“Data science is a team sport.” This axiom has been repeated since as early as 2013 to articulate that there is no unicorn data scientist, no single person who can do it all. Many companies have followed this wisdom, fielding massive data science operations.

But more often than not, a big data science team isn’t an option. You’re a couple of techy people trying to make waves in a bigger organization, or a small shop that depends on analytics as part of the product.

My company falls into the second camp—at Adlumin, we’re a small team that has a big enterprise-level problem: cybersecurity. We use anomaly detection to monitor user behavior, looking for malicious activity on a network. Because we need to catch intrusions quickly, we perform streaming analytics on the information we receive.

Two things allow us to succeed: building on the cloud and thoroughly testing our analytics. The cloud isn’t specifically for small teams, but it helps a small team compete with and even exceed bigger competitors. Testing is our failsafe: by implementing useful tests on our analytics, we can be assured that the models will perform when they’re released.

Below are three principles that I’ve distilled into part of a playbook for doing data science at scale with a small team.

1. The cloud is your friend.

One issue in data science is the disconnect between development and deployment. In a big company, the data scientists often have the luxury of creating something that won’t scale and then punting deployment to the engineers. Not so on our skeleton crew.

Enter the world of the cloud. By moving most of your DevOps to a cloud-based platform, you can work on getting the analytics stood up without any of the tricky details of database management or orchestration.

For streaming analytics, two great options exist: serverless and container-based.

Serverless analytics involve spinning up a process when data comes in, doing some number crunching, and then disappearing. This can be a cost-saving measure because a server doesn’t have to be maintained while waiting for new data. However, the analytics must be fairly lightweight: most serverless offerings will time out long before you can load up a big model.

Containers are more permanent. We still can have live, streaming analytics, but now a container will load the model and keep it ready to receive data all the time. This can be a useful configuration if the model is going to be large, the library requirements many, or the uptime constant. This is also a preferred method if you have a handful of workhorse models for all of your analytic needs.

At Adlumin, we aren’t drawing on heavy libraries and we need to evaluate many (>5000) models quickly, so a modification of the serverless option makes up the basis of our anomaly detection.

Our method starts with building a baseline model for each one of our users, on a weekly interval. We probe a large data store for user behavior data, build baselines (which are small weight matrices), and then store them in a fast NoSQL database.

To process live data, we collect user data in sessions, which are event streams broken into chunks. Once a session appears to be complete, we spin up a serverless process to read the session, query for the appropriate baseline, and evaluate the two together. A result gets passed to another database and the process dies, ready for the next session.
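Breaking an event stream into session chunks can be sketched like this (the 30-minute idle gap and the timestamps are illustrative assumptions, not Adlumin's actual values):

```python
from datetime import datetime, timedelta

def sessionize(timestamps, gap=timedelta(minutes=30)):
    """Group sorted event timestamps into sessions: a new session starts
    whenever two consecutive events are separated by more than `gap`."""
    sessions, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions

events = [datetime(2019, 3, 1, 9, 0), datetime(2019, 3, 1, 9, 10),
          datetime(2019, 3, 1, 14, 0)]
print(len(sessionize(events)))  # 2 sessions: morning and afternoon
```

Sessionizing first means the serverless scorer handles one bounded chunk of behavior at a time instead of an unbounded stream.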

2. Get something that works, then test it.

Sometimes testing seems more like a necessary evil. The best test might be the biggest hurdle when you’re on a tight deployment timeline.

But you need to find a way to evaluate whether your analytics are returning sensible results. Again, there are options:

  1. Real testing: Someone has entrusted you with a cherished “golden” data set. This data contains ground-truth labels, and you can perform classic train-test splits, evaluate metrics, and apply other rigorous testing.
  2. Natural testing: Instead of being handed a data set, you can construct a ground truth from information external to your dataset. Join multiple data sets, manipulate metadata, or come up with another way to create a target.
  3. Artificial testing: Make a data set! This is a great inclusion into a testing suite, even if you have either the first or second option. You can create small data that will be evaluable every time you push new code.
  4. Feel testing: Run your model on live data and observe the output. Does the output meet your or the users’ expectations? You want to know if you have a really noisy model, a quiet model, or something in between.
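The artificial-testing option can be as simple as planting a few obvious anomalies in synthetic data and checking that a model flags them (all values and the threshold below are made up for illustration, with a trivial threshold rule standing in for a real model):

```python
import random

# Synthetic "artificial test": mostly normal values plus a few planted
# anomalies, then a check that a simple threshold model recovers them.
random.seed(42)
data = ([("normal", random.gauss(0, 1)) for _ in range(100)]
        + [("anomaly", random.gauss(8, 1)) for _ in range(5)])

def flag(value, threshold=4.0):
    return "anomaly" if abs(value) > threshold else "normal"

accuracy = sum(flag(v) == label for label, v in data) / len(data)
print(accuracy)
```

Because the data is small and generated on the fly, a check like this can run on every code push without needing any cherished golden data set.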

At Adlumin, we have some data that reflects ground truth. For instance, saved penetration testing data reflects what a single type of attack might look like. This is a great opportunity to test our models, but attacks can take a number of forms, which creates an upper bound on the utility of this data.

Additionally, we know a little bit about the users we monitor. For instance, many companies create service accounts to perform the same tasks, day in and day out. We test to see if we routinely flag these accounts; if we do, the models need to be heavily reworked.

Finally, we created our own data set, complete with data that reflects both normal and anomalous behavior. We integrated this into a test before model deployment.

3. Orchestrate a lot of things at once.

One additional item that makes this all work is orchestration. Orchestration assists our automated tasks by arranging the code and managing all of the tasks.

We use a continuous integration system that puts all scripts into the right places (e.g. updating the script for serverless processes, and pushing new code to the baseline generation server) when we push any new code. We don’t have to scp anything into a server—the push to our code repository covers everything.

In addition, tests will automatically fire when code gets pushed. If the tests fail, the code won’t be updated and erroneous stuff won’t get deployed.

Updating the whole operation piecemeal would be tedious and error-prone. There are too many moving parts! Orchestration also allows us to move quickly. As soon as we develop new code, it can be run against tests and put into the right place without having to consider any additional steps. This frees up time and also headspace formerly preoccupied with deployment details.

There are many other aspects to making streaming analytics work in a small team, but these are three important ones. Doing enterprise-level data science with a skeleton crew can be challenging, but it is rewarding and fun!