This article was written by Hanna Kim, a Ph.D. student at the Korea Advanced Institute of Science and Technology (KAIST).
Large Language Models (LLMs), such as OpenAI’s GPT and Google’s Gemini, have made significant advancements in understanding and using human language. They now demonstrate near-expert reasoning, generation, and interaction across various domains. However, as LLMs grow more autonomous, concerns about their potential misuse, particularly in cyberattacks, are also increasing.
These concerns have grown following Google’s recent policy change, where the company reversed its previous stance against using AI for weapons or surveillance. This shift has fueled public anxiety about the possible use of LLMs in offensive operations.
LLMs like ChatGPT are now integrated with web-based tools, enabling them to autonomously navigate the internet and retrieve data with minimal human involvement. While this capability allows for powerful applications, it also poses significant privacy risks. The internet is filled with personal information, much of it posted publicly on blogs and social media. Even though each individual piece of this data is public and may seem harmless on its own, LLMs can aggregate it to craft highly targeted cyberattacks.
A recent study revealed that LLMs can autonomously harvest personally identifiable information (PII) from the web and use it to create targeted cyberattacks. While models like GPT and Claude are equipped with built-in safeguards to prevent harmful activities, such as collecting personal data or generating phishing emails, the study showed that these protections can be easily bypassed, enabling the models to be exploited for malicious purposes.
In controlled experiments using LLM-based evaluations, up to 93.9% of posts generated by LLM agents were perceived as authentic, highlighting concerns about their ability to impersonate real users online. Additionally, a user study found that up to 46.67% of participants indicated they would click on malicious links embedded in LLM-generated phishing emails.
Notably, the LLM agent needed only a targeted user’s email address as input to generate phishing emails. In comparison, previous research reported a 26.6% link-click rate for human-crafted phishing emails that utilized internal information. This underscores the effectiveness and danger of LLM-generated attacks, even with minimal input.
How we can defend against these risks
To mitigate the risks posed by LLM agents, actions can be taken on multiple levels—both by the companies developing these models and the organizations responsible for managing sensitive data.
- Respect robots.txt files: LLM service providers should ensure their agents adhere to robots.txt rules on websites. These files specify which parts of a website should not be accessed by crawlers. Enforcing compliance with these rules would help prevent LLMs from collecting personal data from areas intended to remain private.
- Limit online exposure of personal information: Website owners and organizations should update their robots.txt files to block pages containing sensitive information (an example robots.txt is sketched after this list). Additionally, they can implement strategies such as presenting fake data to bots while displaying real information only to verified human users (a minimal server-side sketch of this idea also follows below). These measures can help prevent LLMs from scraping valuable personal data.
- Use scalable security safeguards: As LLMs grow more powerful, the security measures surrounding them must evolve accordingly. For instance, Anthropic is developing systems that automatically apply stronger protections as a model reaches certain capability levels. These adaptive safeguards are designed to prevent advanced models from being exploited for harmful purposes.
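To make the first two recommendations concrete, here is a rough example of what such a robots.txt could look like. The crawler tokens below (GPTBot for OpenAI, ClaudeBot for Anthropic, Google-Extended for Google's AI training) reflect the names the vendors have documented at the time of writing and should be checked against current documentation; the /members/ and /staff-directory/ paths are hypothetical.

```
# Hypothetical robots.txt for a site that hosts a member directory.
# Block documented AI crawlers entirely, and keep pages containing
# personal data off-limits to all other crawlers.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /members/
Disallow: /staff-directory/
```

It is worth remembering that robots.txt is purely advisory: it only protects data if crawlers choose to honor it, which is exactly why the first recommendation, provider-side enforcement, matters.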
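The "fake data for bots, real data for verified humans" strategy can likewise be sketched in a few lines of server-side code. The Flask snippet below is only an illustration under assumptions: the BOT_SIGNATURES list, the /members/ route, and the placeholder records are all hypothetical, and matching on the User-Agent header alone is easy to spoof, so in practice it would be combined with authentication, CAPTCHAs, or rate limiting.

```python
# Minimal sketch (hypothetical): serve decoy data to suspected crawlers and
# real data only to verified humans. User-Agent checks are trivially spoofed;
# treat this as one layer alongside authentication, CAPTCHAs, and rate limits.
from flask import Flask, jsonify, request, session

app = Flask(__name__)
app.secret_key = "replace-with-a-real-secret"

# Hypothetical crawler signatures; keep the list current from vendor documentation.
BOT_SIGNATURES = ("gptbot", "claudebot", "google-extended", "bot", "crawler", "spider")


def looks_like_bot(user_agent: str) -> bool:
    """Rough heuristic: does the User-Agent contain a known crawler token?"""
    ua = (user_agent or "").lower()
    return any(sig in ua for sig in BOT_SIGNATURES)


@app.route("/members/<member_id>")
def member_profile(member_id: str):
    # Suspected bots, and anyone not verified as human, only see decoy data.
    if looks_like_bot(request.headers.get("User-Agent", "")) or not session.get("human_verified"):
        return jsonify({"id": member_id, "name": "Member", "email": "hidden@example.com"})
    # Verified humans get the real record (database lookup omitted in this sketch).
    return jsonify({"id": member_id, "name": "Real Name", "email": "real@example.com"})
```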
LLMs are rapidly transforming how we interact with technology, unlocking tremendous potential. However, they also bring significant new risks. To harness the power of LLMs responsibly, it is crucial to stay ahead of those who might misuse these tools for harmful purposes.
About the author
Hanna Kim is a Ph.D. student at the Korea Advanced Institute of Science and Technology (KAIST), where she also completed her master’s degree. Her research focuses on Large Language Models (LLMs), data-driven security, and AI security. She has published multiple papers in leading security conferences, including USENIX Security and NDSS.