Korean security researchers introduced a new AI. And it is sweeping the globe

By Dain Oh, The Readable
May 23, 2023 9:00PM GMT+9 Updated May 23, 2023 9:07PM GMT+9

A new artificial intelligence model developed by a group of cybersecurity researchers in South Korea has gone viral in the global technology industry, swamping social media with its potential to deter cybercrimes. As an example of using AI for good purposes, this latest accomplishment is expected to empower cybersecurity professionals and international law enforcement to detect criminal activities on the dark web at a much quicker pace and with enhanced accuracy.

Six researchers at the South Korean cybersecurity company S2W and the Korea Advanced Institute of Science and Technology (KAIST) conducted joint research to develop an AI model that can understand the language used by cybercriminals on the dark web. The dark web, a part of the internet that is not accessible through general web search engines, is rife with jargon that cybercriminals use to throw off investigators when trading illegal goods, such as drugs and counterfeit credit cards.

For example, “Philipp Plein” normally refers to a German fashion designer and his brand, but on the dark web the same term is used to indicate a particular type of drug. This deceptive use of language adds another layer of cover for criminals, making it harder for law enforcement to detect criminal activities in their early stages.

The newly introduced model is designed to interpret this kind of jargon by building on BERT (Bidirectional Encoder Representations from Transformers), a language model released by Google in 2018. BERT and GPT are both built on the transformer architecture, but they are trained differently: BERT predicts words hidden in the middle of a sentence, using the context on both sides, while GPT predicts the next word from the words that precede it. In other words, BERT is specialized in identifying what a word means in a specific context, such as on the dark web, while GPT proceeds from left to right, generating the words that come next.
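
The difference between predicting a middle word and predicting the next word can be illustrated with a toy sketch. This is not how BERT, GPT, or DarkBERT is actually built: it replaces neural networks with simple word counts, and the four sentences in `CORPUS` are invented for illustration. It only shows why seeing the words on both sides of a gap can lead to a different guess than seeing the left side alone.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    "the vendor ships the package overnight",
    "the seller ships the package overnight",
    "the vendor lists the product today",
    "the vendor lists the price today",
]


def train(sentences):
    """Count which word follows a left neighbor, and which word sits
    between a given left and right neighbor."""
    next_counts = defaultdict(Counter)    # left word -> following words
    middle_counts = defaultdict(Counter)  # (left, right) -> middle words
    for sentence in sentences:
        words = sentence.split()
        for left, word in zip(words, words[1:]):
            next_counts[left][word] += 1
        for left, word, right in zip(words, words[1:], words[2:]):
            middle_counts[(left, right)][word] += 1
    return next_counts, middle_counts


def predict_next(next_counts, left):
    """GPT-style guess: only the word to the left is visible."""
    return next_counts[left].most_common(1)[0][0]


def predict_middle(middle_counts, left, right):
    """BERT-style guess: the words on both sides of the gap are visible."""
    return middle_counts[(left, right)].most_common(1)[0][0]


next_counts, middle_counts = train(CORPUS)

# Gap to fill: "the ___ overnight"
left_only_guess = predict_next(next_counts, "the")
both_sides_guess = predict_middle(middle_counts, "the", "overnight")
print(left_only_guess, both_sides_guess)
```

With only the left word "the" visible, the most frequent follower in this toy corpus is "vendor." Once the right-hand word "overnight" is also visible, the guess becomes "package," which is the kind of context-dependent reading that matters when the same term means different things in different places.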

Jin-Woo Chung, the AI team lead at S2W, is leading a meeting with his researchers on the latest accomplishment called DarkBERT. Their research paper is attracting a great deal of attention in the international tech industries. Photo by Sukwoon Ko, The Readable

The researchers later named the model “DarkBERT,” a compound of “dark web” and “BERT.” Their paper, “DarkBERT: A Language Model for the Dark Side of the Internet,” was accepted at the 2023 annual meeting of the Association for Computational Linguistics (ACL), a leading conference in natural language processing, and is scheduled to be presented in Canada this July. Youngjin Jin, Eugene Jang, Jian Cui, Jin-Woo Chung, Yongjae Lee, and Seungwon Shin participated in the research.

The paper on DarkBERT. Source: S2W, KAIST

On social media platforms such as Twitter and LinkedIn, hundreds of posts about DarkBERT have appeared, spreading through the global tech community within just a week. Tech influencers with more than 100,000 followers, including some with 250,000, shared their excitement about the latest AI model. Aditya Varma (@varmaaditya) wrote on his LinkedIn feed that “the creation of DarkBERT marks a significant milestone in the ongoing battle against cybercriminals who exploit the hidden realms of the dark web.” A YouTube video and dozens of news articles describing DarkBERT have also been published.

DarkBERT is the product of a yearslong intellectual endeavor by the research team. A well-functioning AI model requires a good set of data, combined with data processing technology. However, the veiled nature of the dark web makes it hard for outsiders to collect data in the first place. The researchers at S2W and KAIST have been accumulating data from the dark web while polishing their data processing technology since the company’s foundation in 2018. Their research output has been reflected in the dark web search engine “Xarvis,” which has gone through constant updates over the past several years.

Jin-Woo Chung, the AI team lead at S2W, and his research team have developed a new AI model called DarkBERT in an effort to deter cybercrimes that have been facilitated by the hidden nature of the dark web. DarkBERT is expected to empower cybersecurity professionals to detect cybercriminal activities on the dark web at a much quicker pace by utilizing AI. Photo by Sukwoon Ko, The Readable

“We started our research out of the need to identify content on the dark web,” said Jin-Woo Chung, the AI team lead at S2W. “As the first language model that is specialized in the dark web, DarkBERT would not have been possible if the company had not had data and infrastructure to analyze the dark web,” added Chung. According to the researcher, DarkBERT shows 90 percent accuracy when classifying documents on the dark web as of this month.

One of the major challenges that the researchers faced in the initial development was classifying documents into different categories, such as hacking and weapons. After manually tagging 10,000 documents in total, they were able to train an AI to automatically classify documents circulated by cybercriminals on underground forums. This effort resulted in another research paper, “Shedding New Light on the Language of the Dark Web,” which was accepted at the conference of the North American Chapter of the Association for Computational Linguistics (NAACL) last year, providing other researchers with the refined dataset.
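
The workflow described above, tagging documents by hand and then training a model on those tags, can be sketched in a few lines. The article does not say which classifier the researchers used, so this sketch swaps in a small naive Bayes model with add-one smoothing as a stand-in. The four labeled examples in `LABELED` are invented for illustration and are far smaller than the 10,000 documents the team tagged.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-labeled examples, invented for this illustration.
LABELED = [
    ("selling exploit kit and ransomware builder", "hacking"),
    ("leaked database dump with admin passwords", "hacking"),
    ("pistol and ammunition shipped discreetly", "weapons"),
    ("rifle parts and ammunition for sale", "weapons"),
]


def train(examples):
    """Count words per label, documents per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(text, word_counts, doc_counts, vocab):
    """Return the label with the highest naive Bayes log-score."""
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, doc_counts, vocab = train(LABELED)
first = classify("ransomware builder leaked", word_counts, doc_counts, vocab)
second = classify("ammunition and rifle", word_counts, doc_counts, vocab)
print(first, second)
```

A word-count model like this one treats each word the same way regardless of where it appears, so it cannot tell a fashion brand from a drug name by itself. That limitation is one reason a context-aware language model is attractive for this kind of text.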

Fueled by the fundamental capacities that AI offers, most notably its power to analyze large-scale data, DarkBERT is expected to accelerate cybercrime investigations as well as contribute to the evolution of cyber threat intelligence (CTI) services. “Based on the accurate understanding of cybersecurity and its context, defenders in the public and private sectors can identify actual threats and prepare for upcoming attacks,” said Chung.


The photos in this article were taken by Sukwoon Ko.

Acknowledgements: The Readable is sponsored by S2W, the main source of information for this article. This may have affected the integrity of this article, although every statement was written based on verified facts through independent editorship.

Dain Oh is a distinguished journalist based in South Korea, recognized for her exceptional contributions to the field. As the founder and editor-in-chief of The Readable, she has demonstrated her expertise in leading media outlets to success. Prior to establishing The Readable, Dain was a journalist for The Electronic Times, a prestigious IT newspaper in Korea. During her tenure, she extensively covered the cybersecurity industry, delivering groundbreaking reports. Her work included exclusive stories, such as the revelation of incident response information sharing by the National Intelligence Service. These accomplishments led to her receiving the Journalist of the Year Award in 2021 by the Korea Institute of Information Security and Cryptology, a well-deserved accolade bestowed upon her through a unanimous decision. Dain has been invited to speak at several global conferences, including the APEC Women in STEM Principles and Actions, which was funded by the U.S. State Department. Additionally, she is an active member of the Asian American Journalists Association, further exhibiting her commitment to journalism.