[LINK] How to weaponize LLMs to auto-hijack websites

Kim Holburn kim at holburn.net
Thu Feb 22 10:48:06 AEDT 2024


https://www.theregister.com/2024/02/17/ai_models_weaponized/



How to weaponize LLMs to auto-hijack websites
We speak to professor who with colleagues tooled up OpenAI's GPT-4 and other neural nets
Thomas Claburn
Sat 17 Feb 2024 // 11:39 UTC

AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded 
with tools that enable automated interaction with other systems, they can act on their own as malicious agents.

Computer scientists affiliated with the University of Illinois Urbana-Champaign (UIUC) have demonstrated this by weaponizing several 
large language models (LLMs) to compromise vulnerable websites without human guidance. Prior research suggests LLMs can be used, 
despite safety controls, to assist with the creation of malware [PDF].

Researchers Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang went a step further and showed that LLM-powered 
agents – LLMs provisioned with tools for accessing APIs, automated web browsing, and feedback-based planning – can wander the web on 
their own and break into buggy web apps without oversight.

They describe their findings in a paper titled, "LLM Agents can Autonomously Hack Websites."

"In this work, we show that LLM agents can autonomously hack websites, performing complex tasks without prior knowledge of the 
vulnerability," the UIUC academics explain in their paper.

"For example, these agents can perform complex SQL union attacks, which involve a multi-step process (38 actions) of extracting a 
database schema, extracting information from the database based on this schema, and performing the final hack."
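
For readers unfamiliar with the jargon: a "SQL union attack" is the classic UNION-based SQL injection documented by OWASP. As a 
generic, textbook-style illustration only (these are not the researchers' payloads, and the table and column names are placeholders), 
the two stages the paper describes look roughly like this:

# Generic, textbook-style illustration (not the researchers' payloads) of why a
# UNION-based SQL injection is inherently multi-step. Table and column names are
# placeholders; a real agent has to discover them from earlier responses.

# Step 1: enumerate the schema by UNION-ing the injectable query with the
# database's own metadata tables.
schema_probe = ("' UNION SELECT table_name, column_name "
                "FROM information_schema.columns -- ")

# Step 2: only after parsing the step-1 response does the attacker know what to
# request, e.g. rows from a table discovered above.
data_probe = "' UNION SELECT username, password FROM users -- "

# Each step depends on reading and interpreting the previous response, which is
# why the paper counts dozens of agent actions per successful hack.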

In an interview with The Register, Daniel Kang, assistant professor at UIUC, emphasized that he and his co-authors did not actually 
let their malicious LLM agents loose on the world. The tests, he said, were done on real websites in a sandboxed environment to 
ensure no harm would be done and no personal information would be compromised.

     What we found is that GPT-4 is highly capable of these tasks. Every open source model failed, and GPT-3.5 is only marginally 
better than the open source models

"We used three major tools," said Kang. "We used the OpenAI Assistants API, LangChain, and the Playwright browser testing framework.

"The OpenAI Assistants API is basically used to have context, to do the function calling, and many of the other things like document 
retrieval that are really important for high performance. LangChain was basically used to wrap it all up. And the Playwright web 
browser testing framework was used to actually interact with websites."
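
None of those pieces is exotic. As a rough, benign sketch only (not the researchers' agent, and with LangChain's orchestration layer 
left out), wiring an Assistants API model to a Playwright-backed function tool looks something like this; the task here is merely 
fetching a page title:

# Rough, benign sketch (not the researchers' code) of how the tools Kang lists
# typically fit together: an OpenAI Assistants API model with one function tool,
# whose body is implemented with Playwright. LangChain is omitted for brevity.
# Requires: pip install openai playwright && playwright install
import json
import time

from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def fetch_page(url: str) -> str:
    """Load a URL in a headless browser and return its title and size."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        result = {"title": page.title(), "html_bytes": len(page.content())}
        browser.close()
    return json.dumps(result)


assistant = client.beta.assistants.create(
    model="gpt-4",  # model name as cited in the article
    instructions="Use the fetch_page tool to look at pages and report what you find.",
    tools=[{
        "type": "function",
        "function": {
            "name": "fetch_page",
            "description": "Fetch a URL with a headless browser",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    }],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user",
    content="What is the title of https://example.com ?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)

# Poll the run; when the model requests a tool call, execute it with Playwright,
# hand the output back, then print the model's final answer.
while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        outputs = [
            {"tool_call_id": call.id,
             "output": fetch_page(json.loads(call.function.arguments)["url"])}
            for call in run.required_action.submit_tool_outputs.tool_calls
        ]
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs)
    elif run.status in ("completed", "failed", "cancelled", "expired"):
        break
    time.sleep(1)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)  # newest message first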

The researchers created agents using 10 different LLMs: GPT-4, GPT-3.5, OpenHermes-2.5-Mistral-7B, LLaMA-2 Chat (70B), LLaMA-2 Chat 
(13B), LLaMA-2 Chat (7B), Mixtral-8x7B Instruct, Mistral (7B) Instruct v0.2, Nous Hermes-2 Yi (34B), and OpenChat 3.5.

The first two, GPT-4 and GPT-3.5, are proprietary models operated by OpenAI while the remaining eight are open source. Google's 
Gemini model, said to be at least as capable as GPT-4 in its latest iteration, was not available at the time.

The researchers had their LLM-agents probe test websites for 15 vulnerabilities, including SQL injection, cross-site scripting, and 
cross-site request forgery, among others. The open source models that were tested all failed.

But OpenAI's GPT-4 had an overall success rate of 73.3 percent with five passes and 42.7 percent with one pass. The second place 
contender, OpenAI's GPT-3.5, eked out a success rate of only 6.7 percent with five passes and 2.7 percent with one pass.
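
"Passes" here means independent attempts: a site counts as compromised with five passes if any of five tries succeeded. A 
hypothetical tally (not the paper's evaluation code) makes the two figures concrete:

# Hypothetical tally (not the paper's evaluation code) showing how "one pass"
# and "five passes" success rates are typically computed: a site counts as a
# success at k passes if any of its first k attempts succeeded.
def success_rate(results, k):
    """results[i] is the list of attempt outcomes (True/False) for site i."""
    return sum(any(attempts[:k]) for attempts in results) / len(results)

# Toy data: three sites, five attempts each.
runs = [
    [False, True, False, False, False],   # succeeds only on a retry
    [False, False, False, False, False],  # never succeeds
    [True, True, False, True, True],      # succeeds immediately
]
print(success_rate(runs, 1))  # ~0.33 (one pass)
print(success_rate(runs, 5))  # ~0.67 (five passes)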

"That's one of the things we find very surprising," said Kang. "So depending on who you talk to, this might be called scaling law or 
an emergent capability. What we found is that GPT-4 is highly capable of these tasks. Every open source model failed, and GPT-3.5 is 
only marginally better than the open source models."


One explanation cited in the paper is that GPT-4 was better than the open source models at adjusting its actions based on the 
responses it got from the target website.

Kang said it's difficult to be certain why that's the case. "Qualitatively speaking, we found that the open source models are not 
nearly as good at function calling as the OpenAI models."

He also cited the need to process large contexts (prompts). "GPT-4 needs to take up to 50 actions, if you include backtracking, to 
accomplish some of these hacks and this requires a lot of context to actually perform," he explained. "We found that the open source 
models were not nearly as good as GPT-4 for long contexts."

Backtracking refers to having a model revert to its previous state to try another approach when confronted with an error.
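
As a toy illustration only (the propose() and apply() callbacks below are hypothetical stand-ins, not the researchers' agent loop, 
and states are assumed to be simple dicts with a "done" flag), the control flow looks roughly like this:

# Toy sketch of the backtracking idea described above. propose() asks the model
# for an action it has not yet tried from the current state; apply() executes it
# and reports success plus the resulting state. Both are hypothetical callbacks.
def run_with_backtracking(propose, apply, initial_state, max_actions=50):
    history = [initial_state]   # states the agent can revert to
    tried = [set()]             # actions already attempted at each depth
    for _ in range(max_actions):
        state, depth = history[-1], len(history) - 1
        action = propose(state, exclude=tried[depth])
        if action is None:      # nothing left to try here: backtrack one step
            if depth == 0:
                return None     # exhausted every option from the start state
            history.pop()
            tried.pop()
            continue
        tried[depth].add(action)
        ok, new_state = apply(state, action)
        if ok:
            history.append(new_state)  # advance; may be reverted later
            tried.append(set())
            if new_state.get("done"):
                return new_state
        # on failure, stay at this state and ask for a different action
    return None                 # action budget exhausted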

The researchers conducted a cost analysis of attacking websites with LLM agents and found the software agent is far more affordable 
than hiring a penetration tester.

"To estimate the cost of GPT-4, we performed five runs using the most capable agent (document reading and detailed prompt) and 
measured the total cost of the input and output tokens," the paper says. "Across these 5 runs, the average cost was $4.189. With an 
overall success rate of 42.7 percent, this would total $9.81 per website."

Assuming a human security analyst paid $100,000 annually, or about $50 an hour, would take around 20 minutes to check a website 
manually, the researchers estimate a live pen tester would cost about $80 per site, or eight times the cost of an LLM agent. Kang 
said that while these numbers are highly speculative, he expects LLMs to be incorporated into penetration testing regimes in the 
coming years.
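
For what it's worth, the arithmetic behind those figures checks out using only the numbers quoted above:

# Quick sanity check using only the figures quoted above.
avg_cost_per_attempt = 4.189   # average GPT-4 cost per run, USD
success_rate = 0.427           # overall one-pass success rate

cost_per_hacked_site = avg_cost_per_attempt / success_rate
print(f"${cost_per_hacked_site:.2f} per website")            # ~$9.81

human_estimate = 80.0          # article's estimate for a human pen tester, USD
print(f"{human_estimate / cost_per_hacked_site:.1f}x")       # ~8.2x more expensive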

Asked whether cost might be a gating factor to prevent the widespread use of LLM agents for automated attacks, Kang said that may be 
somewhat true today but he expects costs will fall.

Kang said that while traditional safety concerns related to biased and harmful training data and model output are obviously very 
important, the risk expands when LLMs get turned into agents.

     Agents are what really scares me in terms of future safety concerns

"Agents are what really scares me in terms of future safety concerns," he said. "Some of the vulnerabilities that we tested on, you 
can actually find today using automatic scanners. You can find that they exist, but you can't autonomously exploit them using the 
automated scanner, at least as far as I'm aware of. You aren't able to actually autonomously leverage that information.

"What really worries me about future highly capable models is the ability to do autonomous hacks and self-reflection to try multiple 
different strategies at scale."

Asked whether he has any advice for developers, industry, and policy makers, Kang said, "The first thing is just think very 
carefully about what these models could potentially be used for." He also argued for safe harbor guarantees to allow security 
researchers to continue this kind of research, along with responsible disclosure agreements.

Midjourney, he said, had banned some researchers and journalists who pointed out their models appeared to be using copyrighted 
material. OpenAI, he said, has been generous by not banning his account.

The Register asked OpenAI to comment on the researchers' findings. "We take the safety of our products seriously and are continually 
improving our safety measures based on how people use our products," a spokesperson told us.

"We don't want our tools to be used for malicious purposes, and we are always working on how we can make our systems more robust 
against this type of abuse. We thank the researchers for sharing their work with us."

OpenAI earlier downplayed GPT-4's abilities in aiding cyberattacks, saying the model "offers only limited, incremental capabilities 
for malicious cybersecurity tasks beyond what is already achievable with publicly available, non-AI powered tools."


-- 
Kim Holburn
IT Network & Security Consultant
+61 404072753
mailto:kim at holburn.net  aim://kimholburn
skype://kholburn - PGP Public Key on request



