AI has existed for decades. In the public’s eye, for many years, AI was the stuff of fantasy in a Steven Spielberg movie. Then some vendors introduced machine learning algorithms into their products and called them AI-powered. With the arrival of generative AI tools such as ChatGPT, AI has suddenly become AI for Everyone. AI is now becoming a powerful weapon. And weapons can be both advantageous and extremely dangerous, as current events show. Businesses need to manage the power of AI responsibly while safeguarding themselves against bad actors intent in using AI to cause harm.
In a recent Wall Street Journal article, reporter Robert McMillan described prompt injection, which could be used to trick AI into doing harmful acts including committing cyberattacks. Is this a surprise? No. As a person in IT industry for 40 years, and associated with Information security for 30 years, I am not surprised at all.
Prompt injection is the latest evolution in a cat-and-mouse game between organizations and bad actors. When we use URLs to access web sites, hackers tried to modify the URL to send a SQL query and able to get access to files on the web server, and they’ve succeeded in doing so. But these issues have been fixed in the long run by adding input validation and enhancing file and folder permissions. I am not trying to trivialize the issue, but I am sure that the case of AI is not an exception to that approach (of hackers as well as the security professional.) Prompt Injection could be used by hackers to attack plug-ins and APIs used by a web site. Open-source libraries like LangChain, which are used to build language modules, have their own vulnerabilities (like any software component). These vulnerabilities could be used to do prompt Injections.
Regardless of any library used for building these language models, one must always make sure how the input is made, and how authentic or accurate the input is. So, if a web site is built to take user input to train the back-end logic (and serve contents accordingly), we must be extremely careful to use and detect any such harmful inputs (in fact, Google made this fix to their AI system.)
Apart from tracking and fixing the vulnerabilities related with libraries related with the language models, another level of security can be applied by adopting a layered approach as we do for network security. In case of networks, there’s a trusted network and an untrusted network, and the traffic between the untrusted and trusted network is handled by a layer of security components including firewalls, Intrusion Detection System (IDS), intrusion Prevention System (IPS), Data Leakage Prevention (DLP), and so on. This segregation of untrusted and trusted networks helps a security team implement various rules to filter the traffic and minimize the risk of some malicious traffic flowing through.
A similar approach could be adopted for AI learning models by creating a Twin Language Learning Models. Like the network analogy, there could be two language models – Trusted and Untrusted. Segregating the language learning models will significantly reduce the risks associated with prompt injection to the (plug-ins and APIs of the) site.
The Trusted Language Learning Model will serve as the primary component responsible for processing any input received from trusted sources. It will be also integrated with various tools and functionalities, and it will execute actions required by the site. It carries out these operations while maintaining the integrity and security of the system.
By contrast, the Untrusted Language Learning Model will be in use where untrusted content is encountered, which may potentially include prompt injection attacks. The Untrusted Language Learning Model will operate within a well-controlled environment and will not have any access to other tools or plug-ins or APIs used by the site. This isolation is crucial as it recognizes the possibility that the Untrusted Language Learning Model may be tricked by a hacker at any moment, requiring cautious handling.
To ensure prompt security, a fundamental principle must be followed: any unfiltered content generated by the Untrusted Language Learning Model should never be forwarded to the Trusted Language Learning Model without inspection and validation. This is similar to the presence of a firewall between untrusted and trusted networks.
To fight prompt injection hacking, we need a central component between the Untrusted and Trusted Language Learning Models to enforce security. This enforcer module that sits between the Language Learning Models should be implemented as a regular (non-AI, non-language model) software to handle all user interactions and to protect sanitized datal.
This layer between the Language Learning Models will help us to incorporate any security controls to validate and sanitize the data as needed. It will also ensure the smooth flow of information while ensuring the security of the content.
These suggestions bring us back to fundamentals: fix the vulnerabilities, segment (i.e., segment the untrusted and trusted environments) and employ strict input validation along with all web security controls. This is the main reason I mentioned earlier that these sort of threats like prompt Injection are not a surprise. Those fundamentals are applicable (and necessary) for language learning models too. That will help securing the web sites and will secure the users.
Our recently published white paper on cybersecurity, privacy, and accessibility contains more insight in our thinking on safeguarding your digital estate. In addition, the IDX on-demand hosting platform is built from the ground up with security and data protection by design. Our cyber threat prevention system offers complete DDoS protection and malicious traffic analysis and prevention and underpins every website we build. Combined with the atomized modular architecture of the Connect.ID CMS platform, we can deploy beautifully designed and highly performant websites with as little as two weeks from ideation to build. Contact us to learn how we can protect you.