Adversarial prompting: Backdoor attacks on AI become major concern in technology

by Dain Oh

Jun. 20, 2024

9:04 PM GMT+9

Sejong, South Korea―While the trustworthiness of artificial intelligence models is being rigorously tested by technology researchers, backdoor attacks on large language models (LLMs) present one of the most challenging security concerns for AI, according to an expert on Thursday.

Key to generative AI, an LLM refers to a deep learning algorithm trained on extensive datasets. The neural networks underlying an LLM provide generative AI with self-attention capabilities.

Choi Dae-seon, a professor in the Department of Software at Soongsil University, presented the latest security landscape related to generative AI at HackTheon Sejong. His research laboratory is involved in several national AI projects in South Korea, and the AI Safety Research Center is set to launch this August within the campus.

In the context of generative AI, security discussions fall into three categories: threats that use AI, security enhancements powered by AI, and measures to secure AI. For example, phishing emails have become extremely sophisticated due to generative AI employed by malicious actors. Likewise, it is widely known that hackers are leveraging generative AI to write malware code, such as ransomware.

Regarding AI-powered security enhancements, Choi mentioned Microsoft Security Copilot, which significantly increases the efficiency of the incident response process. Google also offers similar functionality that helps security teams respond to cyberattacks effectively.

Choi Dae-seon, a professor in the Department of Software at Soongsil University, is presenting the latest security landscape related to generative AI at HackTheon Sejong on June 20. His research laboratory is involved in several national AI projects in South Korea. Photo by Dain Oh, The Readable

“With the LLM market expanding rapidly, various types of LLMs are emerging, including multimodal LLMs, private LLMs, and small language models (SLMs),” said Choi. “This diversification is leading to more security issues than ever before. Researchers like me should consider a wide range of security problems associated with these increasingly numerous and unique LLMs.”

The expert noted that backdoor attacks are particularly concerning because they exploit the ‘reconstruction’ process of generative AI models. “The basis of generation is reconstruction,” explained Choi. “Several research findings have shown that secretly injected ‘triggers’ can successfully manipulate generative AI.”

If threat actors succeed in injecting certain triggers into their target’s LLM, the generative AI model, trained on a tainted dataset, will produce irrelevant outcomes. Additionally, these triggers can be used by hackers to reinforce specific biases among users by inserting certain words into their target’s prompts.

Jailbreaking is another example of adversarial prompting. Generative AI is designed to avoid answering unethical or dangerous questions. However, security researchers have disclosed several methods to circumvent this prohibition, such as structuring the AI’s answers in a specific way.

“This is the battle of spear and shield in AI,” said Choi. “Globally, significant efforts are underway, including technological advancements and updates to laws and regulations, aimed at enhancing the trustworthiness of AI.”

Related article: South Korea’s quantum village convenes global experts to discuss cybersecurity

Vikrant Nanda, Senior Program Manager in Security & Privacy at Google, is speaking about his experiences in the realms of security, privacy, and risk at HackTheon Sejong on June 19. Photo by Dain Oh, The Readable

Sejong, South Korea―A local cybersecurity event that emerged from humble origins three years ago in a South Korean city has grown into an international gathering, welcoming over 1,300 college students from around the world as participants.

HackTheon Sejong is an annual conference hosted by Sejong City, located two hours’ drive from central Seoul and serving as the de facto administrative capital of South Korea. The city houses 23 central administrative agencies, 22 affiliated organizations, and several dozen public institutions.

The event’s name, HackTheon, combines ‘hacker’ and ‘pantheon,’ referencing an ancient Roman temple dedicated to the gods, which aligns with the city’s vision of gathering the world’s top cybersecurity talents in Sejong. College students participate in a jeopardy-style Capture-The-Flag (CTF) competition, showcasing their skills to answer questions and achieve the highest score among the 40 teams that reach the final round. These teams gathered at the Government Complex Sejong Convention Center from eight countries, totaling 146 participants. READ MORE

Dain Oh: Author

Dain Oh is a distinguished journalist based in South Korea, recognized for her exceptional contributions to the field. As the founder and editor-in-chief of The Readable, she has demonstrated her expe...
View all posts

Copyeditor: Arthur Gregory Willers

Cybersecurity News that Matters

Cybersecurity News that Matters

Adversarial prompting: Backdoor attacks on AI become major concern in technology

by Dain Oh

Related article: South Korea’s quantum village convenes global experts to discuss cybersecurity

Subscription