By Sean Martin, CISSP
With contributions by
Igor Baikalov,  Scott Scheferman, and Carson Sweet
and special comments and support from
Alan Zeichick

 

It’s a Marketing Mess! Artificial Intelligence vs Machine Learning

Experts Corner | By ITSPmagazine



PART 1 of 3

Artificial intelligence is a thing. No matter where you turn, technology companies are selling AI as the secret sauce in their cybersecurity platforms, their decision support systems, their network analytics tools, even their email marketing software. You name it, it’s got “AI Inside.” You’ll see that acronym AI often, as companies refer to artificial intelligence that way – which in itself is pretty vague, as you’d expect for a term that’s been bandied about for many decades and has a great number of representative branches. In our current context, AI generally refers to hardware or software that thinks, learns, and cognitively processes data the same way a human would, although presumably faster and more accurately: Think about Commander Data from Star Trek as a human-shaped role model for what AI could become someday.

The latest marketing discovery of AI as a cybersecurity product term only exacerbates an already complex landscape of jargon and muddled understanding. There is a raft of associated terms, such as big data, smart data, heuristics (which can be a branch of AI), behavioral analytics, statistics, data science, machine learning and deep learning. Few experts agree on exactly what those terms mean, so how can consumers of the solutions that sport these fancy features properly understand what those things are?

The overuse and misuse of AI-related terms makes it difficult for information security professionals to make heads or tails of the solutions available to them. Even outside the context of cybersecurity, the terms artificial intelligence, machine learning, and deep learning are often used interchangeably; all three surface repeatedly throughout this article.

If images are a better way to consume this message, then the image below pretty much sums up the complexity of this term and all of the techno-spaces it lives in.

The issue for consumers is that they are being told that they should embrace artificial intelligence – and machine learning – as part of the solutions they buy, but vendors too often communicate those two concepts as equivalent terms, and sometimes the terms are misrepresented. The complexity resides in the fact that machine learning is, by definition, a form of artificial intelligence, but artificial intelligence does not always rely on machine learning.

With this in mind, let’s take some time to dig through the messaging and terms to uncover the truth, at least as it relates to the challenges the consumers of these technologies are trying to overcome.

By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.
— Eliezer Yudkowsky, Research Fellow, Machine Intelligence Research Institute

 

Descrambling the Marketing Terms

To start, let’s examine some of the AI-oriented terms we encounter in the cybersecurity world. Certainly, there may be more, but we’ll focus on these first. The sample use cases presented at the beginning of each section were provided by Scott Scheferman, Director of Consulting for Cylance; they are designed to paint a general, non-InfoSec view for a common use of the term being discussed; the goal is to make the term relatable.

 

BIG DATA

Non-security example use case: The Apple Watch’s ultimate contribution to humanity will likely be in the form of massive health studies across all ages, sexes, geographies and demographics in order to perform big data health studies on everything from diabetes, to chemo, to diet and exercise.

From an InfoSec perspective, Igor Baikalov, Chief Scientist of Securonix, describes Big Data as a marketing term that combines existing technologies and architecture to achieve some specific goal, and evolves as the market gets saturated. Vendors can sell Big Data as 3 V’s (Volume, Velocity, Variety), cross-sell it as 4 V’s (+ Veracity), up-sell it as 5 V’s (+ Value), and – if they are really late to the market and desperate – down-sell it as 7 V’s (+ Variability and Visualization).

Not only is the definition of Big Data fuzzy, “but the problem with Big Data is that it is only that, data. The true value of the data comes with some analysis or other learning techniques applied to it,” says Baikalov.

“Big Data is tough because one never knows when that extra one piece of data is the missing link to something very very interesting hiding in the rest of the data,” says Scheferman. “So you might have 40 data types, but it’s the 41st data type you aren’t yet collecting, let alone learning from, that could have unlocked the value of the other 40 data types tremendously. And then there is always the 42nd data type that you don’t even know about that you’ve never even thought about collecting or integrating into the analysis.”

 

ANALYTICS

Non-security example use case: Hybrid analytics will progressively show us what we’ve been missing all along on our own. By letting “the data find the data,” hybrid analytics are already being used to hunt criminals using a combination of statistical NLP (Natural Language Processing), time series analysis, graph analysis, heuristics and anomaly detection. (reference)

 

There are many types of analytics that are used in the security world; some are defined by vendors, others by analysts. Let’s begin by using the Gartner analytics maturity curve as a model for the list, with the insertion of one additional term slotted in the middle of the curve: Behavioral Analytics.

Descriptive Analytics (Gartner):
Descriptive Analytics is the examination of data or content, usually manually performed, to answer the question “What happened?” (or What is happening?), characterized by traditional business intelligence (BI) and visualizations such as pie charts, bar charts, line graphs, tables, or generated narratives.

Baikalov explains that descriptive Analytics is the realm of a SIEM (Security Information and Event Management system) like ArcSight: “these systems gather and correlate all log data and report on known bad activities.”

Diagnostic Analytics (Gartner): Diagnostic Analytics is a form of advanced analytics which examines data or content to answer the question “Why did it happen?”, and is characterized by techniques such as drill-down, data discovery, data mining and correlations.

Here, Baikalov says that “diagnostic Analytics is where link analysis tools like Palantir thrive: given a suspect, or security incident, they can figure out potential impact or root cause based on known relationships; it's a forensic activity heavily dependent on human analysts. A next-gen SIEM like Splunk combines both sets of capabilities in one tool – Descriptive + Diagnostic.”

Behavior Analytics (sometimes called Behavioral Analysis): Behavioral Analytics analyzes massive volumes of raw user event data to predict future actions and trends and to detect anomalies.

Baikalov explains that, “While not on the Gartner maturity curve, I would categorize Behavioral Analytics as the next evolutionary step up from Diagnostic Analytics. In addition to what bad we know about, has anything out of the ordinary happened and should we worry about it? Behavioral Analytics is looking for deviations from normal, be it temporal (has it happened before?) or environmental (has it happened to suspect's peers?).”

"Anomaly in the behavior of any asset, be it user, computer system, application, or network device, is a good indicator of malicious activity,” says Baikalov. “The indicator does not rely on a priori knowledge of what exactly is wrong or on established thresholds, and is capable of detecting zero-day, low-and-slow, and APT (Advanced Persistent Threat) attacks." (reference)
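Baikalov's temporal question ("has it happened before?") can be sketched as a simple z-score test against a subject's own baseline. This is a minimal illustration of the idea, not any vendor's implementation; the daily transfer volumes below are made up.

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag today's value if it deviates from the subject's own
    baseline by more than `threshold` standard deviations (z-score)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# A user who normally moves ~100 MB/day suddenly moves 5 GB.
baseline = [95, 102, 98, 110, 90, 105, 100]
print(is_anomalous(baseline, 5000))  # far outside the baseline
print(is_anomalous(baseline, 104))   # within normal variation
```

The same test works environmentally by swapping the user's own history for that of the user's peers.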

Advanced Analytics (Gartner): Advanced Analytics is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI), to discover deeper insights, make predictions, or generate recommendations. Advanced analytic techniques include data/text mining, machine learning, pattern matching, forecasting, visualization, semantic analysis, sentiment analysis, network and cluster analysis, multivariate statistics, graph analysis, simulation, complex event processing, and neural networks.

Prescriptive & Predictive Analytics (Gartner): Prescriptive Analytics is a form of advanced analytics which examines data or content to answer the question “What should be done?” or “What can we do to make _______ happen?”, and is characterized by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning.

“Predictive capabilities are a must-have feature in active development,” says Baikalov. As the predictive capabilities improve and false positives decrease, Behavior Analytics will gain enough credibility to work in Prescriptive mode, driving automated response based on the analytics' results. See the UK's new "active cyber-defense" initiative.

 

TRADITIONAL/LEGACY AI

Non-security example use case: A traditional AI called an expert system is often used in the context of medical diagnosis. By ingesting reams of medical knowledge, the system asks a series of questions that allow the system to diagnose a disease by narrowing down the possible outcomes. Expert systems are narrowly focused on a particular problem.

Scheferman explains that this earliest form of AI is designed to do basic things that humans can with relative ease. The general premise is that the AI system must possess a large amount of raw knowledge and so, when a question is asked of the expert system, it is able to work through a series of rules until a satisfactory answer is provided. In cybersecurity, the most evolved example of such an expert system would likely be IBM’s Watson for Cyber Security, which is ingesting over 75,000 documented software vulnerabilities, 10,000 security research papers published each year and 60,000 security blogs per month. (reference)
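The rule-chaining premise Scheferman describes can be sketched as a tiny forward-chaining engine: rules fire against a fact base until no new facts can be derived. The medical rules and fact names below are invented for illustration and have nothing to do with Watson's internals.

```python
# Minimal forward-chaining expert system. Each rule is a pair:
# (set of conditions, conclusion). Rules fire until the fact set
# stops growing, then conclusions can be read off the fact set.
RULES = [
    ({"fever", "cough"}, "respiratory_infection"),
    ({"respiratory_infection", "chest_pain"}, "possible_pneumonia"),
]

def infer(facts, rules=RULES):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"fever", "cough", "chest_pain"}))
```

Note that the system only works through the rules it was given; like Watson, it cannot conclude anything outside the knowledge it has absorbed.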

Like its predecessors, however, Watson for Cyber Security requires a significant number of domain experts to provide its data — and to measure how good a job it is doing. Watson is unable to learn on its own, and it can only answer questions derived from the knowledge it has absorbed. Expert systems can nevertheless power very effective AI: Watson is often able to use pattern recognition, human interaction, NLP and data mining (of both structured and unstructured data) to predict an attacker’s next move. It’s impressive by any measure.


PART 2 of 3

Machine Learning: The More Intelligent Artificial Intelligence.

 

Artificial intelligence is a collection of technologies, such as advanced analytics, expert systems, neural networks, machine learning, and more, which are used to drive everything from healthcare medical diagnosis systems to natural language processing to cybersecurity. AI is also a marketing term, beloved by vendors, which can be used to simultaneously educate and obfuscate.

One of the most powerful AI techniques used today in both cyber and non-cyber contexts is machine learning, which we shall explore here.

What is AI? Any advanced software technique that won’t be perfected for at least two decades. By the time it’s perfect, it’s not considered AI any more.
— Alan Zeichick, Principal Analyst, Camden Associates, and former editor of AI Expert Magazine

What is machine learning? It’s a type of artificial intelligence that can discern patterns based on its own examination of raw data. Let’s turn to a use case provided by Scott Scheferman, Director of Consulting for Cylance. As with the other use cases in this article, it is designed to paint a general, non-InfoSec view of a common use of the term being discussed, to make the term relatable.

Non-security example use case: By leveraging machine learning, we now have the ability to predict the gene targets of enhancers (fragments of non-coding DNA) so accurately that it enables us to link mutations in enhancers to the genes they target, which is the first step towards using these connections to treat diseases. (reference) Another startup, Deep Genomics, is using machine learning, genome biology, and precision medicine to invent a new generation of computational technologies that can predict what will happen within a cell when DNA is altered by genetic variation.

From the Gartner Magic Quadrant for Endpoint Protection Platforms: Algorithmic techniques (such as machine learning) are based not on a database or list of what is known (good or bad) artifacts, but are based on a computational method that would include characteristics of known good and bad. Machine learning discovers a detection equation, based on predefined datasets (known good and known bad), and it is the equation (not a database traversal) that determines the probability that a new event is good or bad. Cylance and Deep Instinct are representative vendors of this trend toward algorithmic approaches to file detection.

Wikipedia offers this description of a related term, Supervised Learning: the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
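The "inferred function" idea can be sketched with a deliberately simple learner. A nearest-centroid rule stands in for the detection equation; the two file features (entropy, count of suspicious API calls) and all of their values are hypothetical.

```python
# Supervised learning in miniature: infer a classify() function from
# labeled (input vector, label) pairs, then apply it to new examples.
def train(examples):
    """examples: list of (vector, label); returns a classify function."""
    centroids = {}
    for vec, label in examples:
        sums, count = centroids.get(label, ([0.0] * len(vec), 0))
        centroids[label] = ([s + v for s, v in zip(sums, vec)], count + 1)
    centroids = {lab: [s / n for s in sums]
                 for lab, (sums, n) in centroids.items()}

    def classify(vec):
        def dist(center):
            return sum((a - b) ** 2 for a, b in zip(vec, center))
        return min(centroids, key=lambda lab: dist(centroids[lab]))
    return classify

# Toy features per file: (entropy, suspicious API-call count)
training = [((1.0, 0), "good"), ((1.5, 1), "good"),
            ((7.0, 9), "bad"), ((6.5, 8), "bad")]
classify = train(training)
print(classify((6.8, 7)))  # lands near the "bad" centroid
```

The equation learned here (distance to two centroids) plays the role of the "detection equation": no database of known files is consulted at classification time.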

Igor Baikalov, Chief Scientist of Securonix, explains: “Supervised learning is a subset of machine learning. Unsupervised learning doesn’t need a list of known good and known bad, and utilizes techniques such as clustering, anomaly detection, and principal component analysis to learn hidden patterns in data.”
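By contrast, an unsupervised learner receives no labels at all. A one-dimensional k-means sketch shows the clustering idea Baikalov mentions: the algorithm discovers the two groups on its own. The login counts below are invented.

```python
# Unsupervised learning in miniature: no "known good / known bad"
# list is supplied; k-means (k=2) finds the structure by itself.
def kmeans_1d(points, iters=20):
    """Tiny 1-D k-means with k=2, initialized at the min and max."""
    centers = [min(points), max(points)]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            idx = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[idx].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Login attempts per hour: a quiet cluster and a noisy one emerge.
data = [2, 3, 2, 4, 3, 98, 102, 97]
print(kmeans_1d(data))
```

Which cluster is "suspicious" is still a human judgment; the algorithm only reveals that two distinct behaviors exist in the data.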

Carson Sweet, co-founder and CTO of CloudPassage, adds: “Machine learning has been used for some time in the security and anti-fraud industries for things like anomaly detection and discovery of aberrant machine and human behaviors. The broad and easy availability of compute power – thanks to cloud computing models – means security practitioners can do more with analytics since they’re not constrained to a security appliance’s limited compute capacity.”

“Machine learning is an excellent tool in the effort to create leverage for the very sparse security talent that enterprises have today,” says Sweet.

A heavier-duty tool in the AI toolbox is deep learning, which goes much farther.

Deep Learning (Wikipedia): Deep Learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.

Securonix’s Baikalov explains that, “Deep learning employs hierarchical modeling, where each layer deals with progressively complex features using the findings of the previous layer. Thus, if the lowest layer might only recognize straight or curved edge and its direction, the next layer would use these findings to recognize shapes, like oval, rectangle, or triangle, and the layer above it could already differentiate between the drawings of cars or airplanes.”
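Baikalov's edges-to-shapes-to-vehicles progression is, structurally, just function composition: each layer transforms the output of the layer below it. The minimal forward pass below illustrates only that layered structure; the weights are arbitrary, not trained, and the "edge/shape detector" labels are illustrative.

```python
# Layer-by-layer composition: later layers consume the features
# produced by earlier layers. Weights here are arbitrary.
def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One fully connected layer: out_j = relu(sum_i w[j][i]*x[i] + b[j])."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# 2 inputs -> 3 hidden "edge detectors" -> 2 "shape detectors"
net = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.1, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
]
print(forward([0.8, 0.2], net))
```

In a real deep network the weights are learned from data and there are many more layers and units, but the hierarchy of progressively more abstract features comes from exactly this kind of stacking.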

Baikalov points out that while deep learning has been credited with significant progress in reducing fraud in the finance industry, the task of learning fraudulent behavior in purchasing transactions is a lot easier than detecting insider threat. Since many if not most fraudulent transactions are detected by consumers within one or two statement cycles, a financial institution typically has a good, steady volume of well-documented "bad" data that can be traced back in fairly recent transaction logs. This data is then used to train the system to recognize a small number of consistent fraudulent behavior patterns.

What about a more difficult case, say insider fraud? Unlike transactional purchases on credit cards, Baikalov notes that there are relatively few examples of “bad insider” fraud within a financial institution’s logs. The supporting log data may be spread over a multitude of sources and can go back for months, and there's very little consistency between them. This doesn’t give normal machine learning enough to work with, and that's where deep learning comes in to successfully detect and potentially prevent insider threats. While TTPs (Tactics, Techniques, and Procedures) might differ from one attacker to another, there are some common traits of malicious behavior that we recognize as Threat Indicators and that serve as a foundation for our Predictive Threat Models.

What’s more, Baikalov adds, multiple indicators roll up into specific threats, like Malware Infection or Account Compromise, and this threat layer is further aggregated into Composite Threats, or "kill chains." The better we detect the earlier stages of the attack (the links in the kill chain), the stronger our model's predictive capability to recognize and potentially prevent the later, most damaging stages. Deep Learning takes advantage of these hierarchical threat models by first recognizing various threat indicators (behavioral or direct), then determining the probability of specific threats, and finally amplifying risk scores along the kill chain for early detection of a cyber-attack.
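That roll-up can be sketched as a two-stage aggregation: indicator probabilities combine into a threat score, and stage scores are amplified along the kill chain when earlier links have fired. The stage names, boost factor, and all scores below are hypothetical, not drawn from any product.

```python
# Hierarchical threat-scoring sketch: indicators -> threats -> kill chain.
KILL_CHAIN = ["recon", "delivery", "exploitation", "exfiltration"]

def threat_score(indicator_scores):
    """Combine independent indicator probabilities: P(at least one fires)."""
    p_none = 1.0
    for p in indicator_scores:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def chain_risk(stage_scores, boost=1.5):
    """Amplify each stage's score when the previous stage also fired."""
    risk, prev = [], 0.0
    for stage in KILL_CHAIN:
        s = stage_scores.get(stage, 0.0)
        if prev > 0.5:            # earlier link detected: amplify this one
            s = min(1.0, s * boost)
        risk.append((stage, round(s, 2)))
        prev = s
    return risk

stages = {"recon": 0.6, "delivery": 0.5, "exploitation": 0.4}
print(chain_risk(stages))
```

The design choice worth noting is the amplification step: a mediocre exploitation signal becomes actionable because the recon and delivery links had already fired, which is the "early detection" benefit Baikalov describes.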

Machine Intelligence: This term is a carry-over from the early days of machine learning’s move toward artificial intelligence.

Non-security example use case: The phrase “machine intelligence” is all over the place; different people mean different things by it, ranging from ‘a robot’ to ‘neocortex-based learning’ to ‘any system that does AI,’ and everything in between. One can think of it more rigidly as denoting a machine that can learn on its own by observing a series of patterns over time, without having to label the data sets, modeled after the way the human brain works. Sometimes these systems are called ‘Biological Neural Networks,’ as they mimic the neocortex and the function of biological neurons.

The neocortex is a part of the brain’s cortex that is associated with sight and hearing, and is considered to be one of the most recently evolved parts of the cortex.

Scheferman builds on this use case and explains that time plays an important part in the challenge of using machine learning on a network; the model has to scale to much more than the ‘near-real time’ data sets coming off the network.

“Profiles have to be adaptive and updated if not in real time, then at the very least daily – not monthly – to be effective in detecting malicious activity,” says Baikalov.

Intelligent Security (Symantec): And, just to keep people on their toes (possibly with the goal of creating their own marketing term and confusing matters in the process), there’s the trimmed-down version of the term “intelligence” with the word “security” tacked onto it.

Scheferman reiterates that in cybersecurity, the ultimate constraint is time – in this case, how quickly you can spot an emerging danger. From an AI standpoint – specifically speaking to the endpoint – the question is how far ahead of a threat can one actually get? Milliseconds, hours, weeks, months, years? This is what is meant by predictive AI: the ability to detect and block a threat months or years before the threat was even conceived by its author. “The entire point of predictive AI is to save the humans from all that heavy lifting,” he says. Symantec and other vendors appear to be using the phrase “intelligent security” to refer to predictive AI, a system that can theoretically block attacks before they are launched.

 

Reeling it back in to Machine Learning

While different products will employ a variety of methods to help protect networks and endpoints from attack and compromise, Machine Learning is probably the most prevalent – and relevant – method available on the market. In fact, machine learning arrived on the Gartner Hype Cycle in 2015, replacing “Big Data” and passing the peak of inflated expectations (although not quite as far along the curve as Big Data was in 2014).

Securonix’s Baikalov explains that User Behavior Analytics (UBA) and User and Entity Behavior Analytics (UEBA) are both terms coined by Gartner’s Avivah Litan to describe what she's seen on the market. The "Market Guide for User Behavior Analytics" was published in 2014, and UBA was born. A year later, in the fall of 2015, the "Market Guide for User and Entity Behavior Analytics" was published, and UEBA was born.

 

Image Source: Gartner


Says Gartner in a recent news article: Purely signature-based approaches for malware prevention are ineffective against advanced and targeted attacks. Multiple techniques are emerging that augment traditional signature-based approaches, including… machine learning-based malware prevention using mathematical models as an alternative to signatures for malware identification and blocking.

So, how smart is machine learning compared to AI? Baikalov insists that it is a lot smarter because science-fiction style AI, or the capability of a machine to imitate intelligent human behavior, doesn't exist.

“Machine Learning is a subset of AI, along with knowledge, perception, reasoning, planning and other good stuff,” says Baikalov. “And there's a lot to learn. As the machine learns something, we say, ‘Well, if the machine can do it, it doesn't require intelligence, and therefore it's not AI.’”

“The core problem with AI is that it's defined relative to human intelligence, which in turn is not well defined,” explains Baikalov. “AI is created by humans, and if the humans don't understand what the intelligence is, how can they program the machine to imitate it? And does AI even need to imitate every aspect of human intelligence?”

Both excellent questions.

Baikalov continues: “Consciousness is one of the characteristics of intelligence. Does a machine learning system feel remorse about producing a false positive? I hope not, but when pointed out to it by the analyst, it learns from the mistake and doesn't repeat it the next time. As long as the machine does its job well, do you really care how it feels? (Sorry, Terminator!)”


PART 3 of 3

The Actual Benefits of Artificial Intelligence & Machine Learning

Artificial Intelligence is finding many uses in problem solving and pattern recognition. AI can help diagnose a medical problem, and it can help determine if an email attachment has been infected with a zero-day virus. AI can tell if a network is safe or under attack, and it might even be able to predict the next move of the attackers.

In this 3rd part of the article series, we’ll discuss how you can tell if an AI solution is real, and what it does – and that means going beyond the marketing materials. For a backgrounder on AI in cybersecurity, see “It’s a Marketing Mess! Artificial Intelligence vs Machine Learning,” and then “TBD.”

AI describes problems that, no matter how carefully defined, leave every computer scientist with a different understanding of the problem.
— Alan Zeichick, Principal Analyst, Camden Associates, and former editor of AI Expert Magazine

One of the security industry’s experts on AI is Scott Scheferman, Director of Consulting for Cylance. Let’s start by seeing what he says about artificial intelligence as a general solution to today’s problems:

“InfoSec aside for a moment, my favorite problem space that machine intelligence is being applied towards is further research into sensorimotor inference, and how it actually works, such that eventually androids will be able to move much like humans, using similar brain processes to do so, but via machine intelligence instead of the brain. It’s basically asking machines to be able to learn what a Monster Drink is by running its fingers over it, and understanding it in the context of time and space and other sensory inputs. This also has implications in the prosthetics area such that machines might interpret brain signals in order to move a mechanical prosthetic through space in the same way the patient might have moved their own appendage prior.”

Once you have processed the sensorimotor inferences and mechanical appendages, let’s come back to the main topic at hand here – InfoSec. Scheferman offers a guide for thinking about AI-related technologies like machine learning, at least as they are marketed and promoted by cybersecurity companies:

  • Learn to evaluate and test these solutions for yourself, in your own environment.

  • Press your vendor for actual use case examples of why and how ML/AI (Machine Learning and Artificial Intelligence) is used to solve a problem that either couldn’t be solved before, or couldn’t be solved as quickly or accurately. Have them demonstrate this feature/use-case.

  • A true sign of ML/AI being implemented properly and effectively by a vendor is whether the same security function can thus be achieved at exponentially lower cost, in less time and with fewer resources, with at least the same or better efficacy and reduction in security risk to the organization.

  • Simple measure of any ML/AI system: Is it Predictive? Yes? Then prove it.

A way to do that proof: grab the most recent SHA256 hashes from the most recent APT report (say, Sauron/Strider, which was discovered in early August this year), and then see if those same files would have been blocked PRIOR to the date the Sauron/Strider report was released. This is the difference between the industry terms “Prevention” and “Prediction.” Even legacy signature-based A/Vs still offer “prevention” if they have a matching file or heuristic/behavioral signature. But that is not predictive, in that it cannot prevent what it has never seen.
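That retro-test can be sketched in a few lines. The "model" below is a crude stand-in (a byte-density heuristic notionally frozen at an earlier date), not a real classifier, and the sample bytes and dates are invented; a real test would replay the report's actual hashes against an actual model snapshot from before publication.

```python
import hashlib

# Retro-test sketch: would a model frozen BEFORE the APT report's
# publication date have convicted the report's samples?
MODEL_SNAPSHOT_DATE = "2015-01-01"   # hypothetical: predates the report
REPORT_DATE = "2016-08-08"           # hypothetical publication date

def sha256_of(data: bytes) -> str:
    """Hash a sample so it can be matched against a report's IOC list."""
    return hashlib.sha256(data).hexdigest()

def model_verdict(file_bytes: bytes) -> str:
    """Stand-in for a predictive model's features->score pipeline.
    Here: a crude 'high-byte density' heuristic, NOT a real model."""
    density = sum(1 for b in file_bytes if b > 0x7f) / max(len(file_bytes), 1)
    return "block" if density > 0.3 else "allow"

# A made-up sample standing in for a file from the report's IOC list.
sample = bytes([0x90] * 40 + [0x41] * 10)
print(sha256_of(sample)[:12], model_verdict(sample))
```

If the frozen model blocks samples it had never seen when it was trained, that is prediction; if it only blocks samples already in its database, that is merely prevention.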

Scheferman explains that the ultimate expression of whether an AI system is optimized is whether it yields sufficient confidence to allow a prediction to translate into an autonomous, real-time decision the machine can make independently of any humans. Contrast that with the heavy workload that traditional antivirus requires: thousands of analysts performing semi-automated analysis of hundreds of thousands of files, and spinning up thousands of virtual sandboxes to assist in that process.

Meanwhile, AI is able to predict, identify, classify, and prevent execution of seven files in the time it takes one of those humans to blink an eye, while using ~1/10th of the CPU required by the legacy antivirus we’ve been relying on for decades.

 

Final Words of Caution

During the recent Structure Security event at the Presidio’s Golden Gate Club in San Francisco, Alex Doll, Founder and Managing Member of Ten Eleven Ventures, advised the attendees to “be careful of the terms machine learning and artificial intelligence; they're being overused and used interchangeably.”

The bottom line on AI-based technologies in the security world: Whether it’s called machine learning or some flavor of analytics, look beyond the terminology – and the oooh, ahhh hype of artificial intelligence – to see what the technology does. As the saying goes, pay for the steak – not the artificial-intelligence marketing sizzle.

We expect this will be a hot topic, so we encourage you to comment on Twitter or LinkedIn. We'd love to hear what you have to say.


THANK YOU CONTRIBUTORS!

Igor Baikalov

Scott Scheferman

Carson Sweet