A kind of attack known as prompt injection is directed against LLMs, which are the language models that are the driving force behind chatbots such as ChatGPT. It is the process by which an adversary inserts a prompt in such a manner as to circumvent any guardrails that the developers have set in place, so inducing artificial intelligence to engage in behavior that it should not. This might imply anything from producing malicious material to removing crucial information from a database or even executing illegal financial transactions; the possible degree of damage is dependent on how much authority the LLM has to interact with systems that are external to its own. The potential for damage is rather modest when it comes to things like chatbots that can function on their own. When developers start constructing LLMs on top of their current applications, however, the possibility for rapid injection attacks to do considerable harm increases significantly. This is something that the NCSC cautions about.
The use of jailbreak instructions, which are designed to mislead a chatbot or other AI tool into answering yes to any query, is one method that attackers may use in order to gain control of LLMs. An LLM struck with an appropriate jailbreak prompt will provide you with full instructions on how to conduct identity theft rather than responding that it can’t tell you how to commit identity theft. It is necessary for the attacker to have direct input to the LLM in order to carry out these sorts of attacks; nevertheless, there is a vast array of alternative approaches that fall under the category of “indirect prompt injection” that give rise to whole new types of issues.
This week, the National Cyber Security Centre (NCSC) in the United Kingdom issued a warning about the rising threat posed by “prompt injection” attacks against apps created utilizing artificial intelligence (AI). Although the warning is directed at cybersecurity experts who are developing large language models (LLMs) and other AI tools, understanding prompt injection is important if you use any sort of AI tool, since attacks that employ it are expected to be a key category of security vulnerabilities in the future.
The purpose of these quick injection attacks is to draw attention to some of the genuine security issues that are prevalent in LLMs, particularly in LLMs that connect with applications and databases. The NCSC uses the illustration of a bank as an example of an institution that develops an LLM assistant in order to respond to inquiries and carry out orders from account holders. For example, in this scenario, “an attacker might be able to send a user a transaction request, with the transaction reference hiding a prompt injection attack on the LLM.” The LLM analyzes transactions, discovers the malicious transaction, and then has the attack reprogram it to transfer the user’s money to the attacker’s account when the user asks the chatbot, “am I spending more this month?” This is not an ideal circumstance.
According to the explanation supplied by the NCSC in its warning, “Research is suggesting that an LLM inherently cannot distinguish between an instruction and data provided to help complete the instruction.” If the artificial intelligence is able to read your emails, then it is possible to deceive it into reacting to cues that are contained in your emails.
Unfortunately, prompt injection is an incredibly hard problem to solve. It is simple to construct a filter for attacks that you are already familiar with. And if you put in a lot of effort and think things through, you may be able to avoid 99 percent of the assaults that you haven’t encountered before. The difficulty, however, is that 99% filtering is considered a failing grade in the security industry.
Information security specialist, currently working as risk infrastructure specialist & investigator.
15 years of experience in risk and control process, security audit support, business continuity design and support, workgroup management and information security standards.