
Cybercrime Fighter? Researchers Develop Dark Web-Trained AI


South Korean researchers have developed an artificial intelligence (AI) model that is trained to search the dark web for cybersecurity red flags.

The Korea Advanced Institute of Science and Technology (KAIST), in collaboration with data intelligence organization S2W, is behind DarkBERT, an AI language model pretrained on datasets sourced from the dark web.

Revealing Cyber Threats from the Dark Web

DarkBERT can uncover cyber threats emanating from the dark web, including data leaks and ransomware. The dark web's anonymity is often exploited by cybercriminals, who use it to host underground markets and share illegal content, S2W stated.

DarkBERT is based on the RoBERTa architecture, an AI approach first developed in 2019. So far, it has been fed more than six million pages from the dark web as part of its pretraining on English-language text, according to multiple reports.

DarkBERT and the Dark Web

Here are the key features of DarkBERT and facts about the dark web:

  • The dark web is a corner of the internet accessible not by conventional web browsers, but by special software, like Tor, which anonymizes a user’s IP address, making it difficult to track their movements.
  • The dark web can be characterized as a set of illicit marketplaces where access to ransomware and other malware is sold, as well as a haven for drug traders, thieves of confidential information, and weapons brokers.
  • The researchers crawled the dark web using the Tor software and curated a trove of content used to train DarkBERT.
  • The DarkBERT project differs from ChatGPT or Bard in that it is not intended to act as a back-and-forth chatbot but rather as a tool to probe data sets and address specific queries.
  • According to reports, DarkBERT was fed two sets of data over 16 days, with some of the material redacted, such as the names of victim organizations, details on leaked data, threat statements, and illegal images. Over 1,000 pages of this data set were categorized as adult entertainment.
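
To make the redaction step more concrete, here is a minimal Python sketch of how sensitive strings might be replaced with placeholder tokens before text is used for training. It is an illustration only: the reports do not describe the researchers' actual rules, so the function name `redact`, the email and IP address patterns, the placeholder tokens, and the sample text are all hypothetical.

```python
import re

# Hypothetical patterns and placeholder tokens, chosen for illustration only;
# the researchers' actual redaction rules are not described in the reports.
PATTERNS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
]


def redact(text, org_names=()):
    """Replace sensitive spans in `text` with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    # Victim organization names come from a caller-supplied list (an assumption).
    for name in org_names:
        text = re.sub(re.escape(name), "[ORG]", text, flags=re.IGNORECASE)
    return text


# Made-up sample text; "Acme Corp" is not a real victim organization.
sample = "Leaked from Acme Corp: contact admin@acme.example, server 10.0.0.5"
print(redact(sample, org_names=["Acme Corp"]))
```

A real pipeline would need far broader coverage than two regular expressions and a name list, but the shape is the same: identify the sensitive span, then swap in a neutral token so the surrounding text can still be used.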

Don't count on DarkBERT becoming available to the public, owing to the nature of the material it deals with. However, requests to use the AI model for academic purposes can be made.