Dept of Energy Researchers Demonstrate Deep Reinforcement Learning as Cyber Defense
Researchers at the Department of Energy’s Pacific Northwest National Laboratory (PNNL) have created a new artificial intelligence model that harnesses deep reinforcement learning (DRL) and is capable of stopping up to 95% of simulated cyber attacks before they reach the final, exfiltration stage.
DRL combines reinforcement learning and deep learning. Decisions that lead to desirable outcomes are rewarded with a positive response expressed as a numeric value, while bad decisions incur numeric penalties.
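The reward-and-penalty loop at the heart of reinforcement learning can be sketched with a tabular Q-learning update. This is a simplified, illustrative stand-in: PNNL's model uses deep networks, and the state names, actions, and reward values below are assumptions for the example.

```python
# Tabular Q-learning: rewards reinforce good decisions, penalties discourage bad ones.
q = {}                    # (state, action) -> learned value estimate
alpha, gamma = 0.1, 0.9   # learning rate, discount factor

def update(state, action, reward, next_state, actions):
    """Nudge the value of (state, action) toward reward + discounted future value."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# A good decision earns a positive numeric reward; a bad one is penalized.
update("alert", "block", +1.0, "contained", ["block", "ignore"])
update("alert", "ignore", -1.0, "breached", ["block", "ignore"])
print(q[("alert", "block")], q[("alert", "ignore")])  # 0.1 -0.1
```

Over many such updates, the agent's value estimates steer it toward the decisions that accumulated positive responses.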
Leveraging AI in Cybersecurity
In today’s world of omnipresent cyber attacks, defenders struggle to stay even a step behind adversaries, saddled with limited resources, understaffing, a flood of alerts demanding attention, and other constraints. Current AI-based cybersecurity technology can collect data from an untold number of sources to help defenders understand attacks, but the goal of a proactive, autonomous defense remains out of reach.
However, in simulations where PNNL scientists pitted defenders against moderately sophisticated attacks, deep reinforcement learning proved effective at stopping adversaries from reaching their goals up to 95 percent of the time. The outcome suggests a promising role for autonomous AI in proactive cyber defense, PNNL said in a recent blog post.
Scientists from PNNL presented their work on February 14, 2023, at the annual meeting of the Association for the Advancement of Artificial Intelligence in Washington, D.C.
Simulation of Multi-stage Attack
The starting point for the simulation was to test multi-stage attack scenarios involving different types of adversaries. The environment gave researchers the ability to test AI-based defensive maneuvers in controlled test conditions. “While other forms of artificial intelligence are standard to detect intrusions or filter spam messages, deep reinforcement learning expands defenders’ abilities to orchestrate sequential decision-making plans in their daily face-off with adversaries,” PNNL wrote.
PNNL described the approach as follows:
- Using the MITRE ATT&CK framework, the team incorporated seven tactics and 15 techniques deployed by three adversaries, while the defenders were armed with 23 mitigations to counter the attacks.
- The attack stages included reconnaissance, execution, persistence, defense evasion, command and control, collection, and exfiltration. An adversary reaching the final stage, exfiltration, was counted as a win. The team trained defensive agents based on DQN (Deep Q-Network) and three of its variations.
- The agents were trained with simulated data about cyber attacks, then tested against attacks they had not previously seen.
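The setup above can be sketched as a toy multi-stage attack simulation. The stage names follow the article, but the two-action defender, mitigation success rate, and reward values are illustrative assumptions, and a tabular Q-learning agent stands in for the paper's DQN-based defenders.

```python
import random

# Attack stages from the article; reaching the last one counts as an attacker win.
STAGES = ["reconnaissance", "execution", "persistence", "defense evasion",
          "command and control", "collection", "exfiltration"]
MONITOR, MITIGATE = 0, 1  # a two-action stand-in for the 23 MITRE mitigations

def step(stage, action, rng):
    """Advance one attack stage; return (next_stage, reward, done)."""
    if action == MITIGATE and rng.random() < 0.6:  # assumed mitigation success rate
        return stage, +1.0, True                   # attack interrupted: defender wins
    if stage + 1 == len(STAGES) - 1:
        return stage + 1, -1.2, True               # exfiltration reached: attacker wins
    return stage + 1, -0.2, False                  # attack advances (small ongoing cost)

rng = random.Random(0)
q = [[0.0, 0.0] for _ in STAGES]                   # Q-values per (stage, action)
alpha, gamma, eps = 0.05, 0.95, 0.1

for _ in range(20_000):                            # epsilon-greedy Q-learning episodes
    stage, done = 0, False
    while not done:
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: q[stage][i])
        nxt, r, done = step(stage, a, rng)
        target = r + (0.0 if done else gamma * max(q[nxt]))
        q[stage][a] += alpha * (target - q[stage][a])
        stage = nxt

# Evaluate the greedy policy: how often is the attack stopped before exfiltration?
wins = 0
for _ in range(1000):
    stage, done = 0, False
    while not done:
        a = max((0, 1), key=lambda i: q[stage][i])
        stage, r, done = step(stage, a, rng)
    wins += r > 0
print(f"attacks stopped before exfiltration: {wins / 1000:.0%}")
```

The sketch shows the sequential flavor of the problem: the defender makes a decision at every stage of the kill chain, and the learned policy is judged by whether the adversary is stopped before the final stage, mirroring how the study scored midway and final-stage interruptions.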
Here are the results:
- In the least sophisticated attacks, DQN stopped 79 percent of attacks midway through the attack stages and 93 percent by the final stage.
- In moderately sophisticated attacks, DQN stopped 82 percent midway and 95 percent by the final stage.
- In the most sophisticated attacks, DQN stopped 57 percent midway and 84 percent by the final stage, far higher than the other three algorithms achieved.
For now, a DRL-based cybersecurity system would need human collaboration to reach its full potential, but that may not always be the case. “Application of DRL methods for cyber system defense are promising,” the scientists wrote in their research paper, “especially under dynamic adversarial uncertainties and limited system state information.”