ExtraHop is Sharing Massive Machine Learning Dataset

ExtraHop is open sourcing its 16-million-row machine learning dataset to help defend against domain generation algorithms (DGAs), the company announced.

The cloud-native network detection and response (NDR) provider said the move is meant to encourage industry collaboration and advance innovation that helps detect malware and botnets faster. By releasing its DGA detector dataset on GitHub, ExtraHop aims to help security teams identify malicious activity in their environments before it becomes a business problem.

Threat actors use DGAs to maintain command-and-control communication inside an organization’s environment after breaching a network. Because infected hosts can generate large numbers of candidate domain names, the attacker's infrastructure is difficult to detect and block. As new threats appear, open-source research and datasets can help security teams overcome the challenges they regularly face, ExtraHop explained.

Sharing Their Best Work

ExtraHop is “democratizing” the tools needed for threat research detection, said Raja Mukerji, ExtraHop chief scientist and co-founder.

“The challenges we face in security are formidable and dynamic, and, with this initiative, we’re democratizing the tools needed for threat research detection for security teams of all sizes, backgrounds, and industries,” Mukerji said. “Collaboration among the cybersecurity community is invaluable; coming together to share our best work is the only way to remain on the offense and put attackers at a disadvantage. Our research will be a game changer for the community, and we encourage other teams to open source their own insights that will similarly benefit the industry at large.”

The dataset was originally built for ExtraHop’s Reveal(x) NDR platform. Any security researcher can now use it to build their own machine learning (ML) classifier that identifies DGA activity, so that teams can intervene in attacks faster and with greater precision.

Since its implementation in Reveal(x), the ExtraHop DGA model has demonstrated more than 98% accuracy, the company noted.

DGA-based attacks are on the rise because they allow threat actors to operate undetected, according to Todd Kemmerling, ExtraHop data science director.

“As we began developing a model for detecting DGAs, it became apparent there was a lack of public datasets accessible to security teams with a wide-ranging set of resources,” he said. “With this dataset, we are filling that gap, giving any security team access to the pivotal data needed to detect DGAs swiftly.”