GAI Is Going Well Part Trois
In this post I’m going to reflect on the mitigations section of the GAI is going well collection.
In light of the very ambiguous proposed California regulation SB 1047, mitigations against adverse effects are more important than ever.
Andrew Ng does an excellent job of explaining why this regulation is not great here: https://info.deeplearning.ai/ais-cloudy-path-to-zero-emissions-amazons-agent-builders-claudes-ui-advance-training-on-consumer-gpus-1
Probably best read with a cuppa, as there’s a lot to unpack.
Mitigations should always have been top of mind, and imho a badly thought-through regulation shouldn’t have been the forcing function to capture attention on this, particularly as it’s difficult to see how anyone can comply now or in the future.
Unfortunately, far too many of the articles I’ve read just tell you what you need to do, not how to implement any mitigation techniques in a holistic manner. More often than not the guidance is disconnected and scattered around. My CSP of choice, for example, has blog posts and articles all over the place, such as How Sensitive Data Protection can help secure generative AI workloads | Google Cloud Blog. It’s an exercise for you as the customer to work out what to do: first find the guidance, then figure out whether you can, or indeed need to, implement it.
There are plenty of articles that expound on what mitigations are, or are going to be, implemented against adverse effects for foundation models, presented as a framework such as The Frontier Safety Framework from DeepMind.
As with the Frontier Safety Framework, the frameworks designed for consumers of foundation models, whether proprietary or “open”, such as Google’s Secure AI Framework (SAIF) and NIST’s Artificial Intelligence Risk Management Framework (AI RMF 1.0), are early in conception and also have very little actual implementation detail.
Frameworks are, in the end, a list of what you should be doing. Yes, I know the frameworks are generic guidance, but that’s kind of my point: where’s the implementation detail to go with them for your specific LLM provider? If you’re publishing or referencing frameworks on your site, then go that step further and explain how your customers can implement what the framework tells them needs doing. Don’t give them further homework. For example, some ways to actually mitigate the risks to AI workloads described in SAIF are buried in this post: Advancing the art of AI-driven security with Google Cloud. (Model Armor looks useful here.)
The OWASP Top 10 for LLM & Generative AI Security kinda falls into the framework bucket too, despite stating this: “It was created to provide developers, data scientists, and security experts with practical, actionable, and concise security guidance to navigate the complex and evolving terrain of LLM security.” By design it’s meant to be generic. It’s a top 10, so producing a list of mitigations to go along with it should be the bare minimum for anyone providing LLM capabilities via an API endpoint. It’s not perfect, but it shouldn’t be hard to provide concrete actions that map to each item on this list.
However, it’s not all doom and gloom!
Despite Microsoft’s annus horribilis when it comes to security, this year they have actually done a very good job of telling you, as a consumer of their AI services, what the adverse effects are and what you need to do to implement mitigations against them, as shown in Mitigating Skeleton Key, a new type of generative AI jailbreak technique | Microsoft Security Blog and AI jailbreaks: What they are and how they can be mitigated | Microsoft Security Blog.
Cloudflare have done a great job of explaining the issues with bots scraping your data without permission for training, and they describe a way to implement some defences against this. “But robots.txt!” I hear you yelling at me. Unfortunately robots.txt is etiquette, not an actual defence: a crawler that chooses to ignore it faces no technical barrier. This is discussed here: Declare your AIndependence: block AI bots, scrapers and crawlers with a single click
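To make the etiquette-versus-defence point concrete, here is a minimal sketch of server-side enforcement: rather than asking crawlers to stay away, the server refuses the request. This is not how Cloudflare’s feature works (their post describes their own approach); it is just the general idea, written as a small WSGI middleware. The names `BLOCKED_AGENT_TOKENS`, `is_ai_crawler` and `block_ai_crawlers` are mine, and the list of user-agent tokens is an illustrative assumption rather than a complete or current list.

```python
# Illustrative tokens only (an assumption): real crawler lists change often.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "claudebot")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def block_ai_crawlers(app):
    """Wrap a WSGI app so matching crawlers get a 403 instead of content."""

    def middleware(environ, start_response):
        if is_ai_crawler(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated scraping is not permitted.\n"]
        # Everyone else is passed through to the real application.
        return app(environ, start_response)

    return middleware
```

Even this is only a partial defence, since a scraper can send any User-Agent string it likes. That is why proper bot detection has to look at behaviour rather than trusting what the client says about itself.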
NVIDIA also enter the hall of good practical mitigations with NVIDIA NeMo Guardrails, an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications such as chatbots.
OpenAI developed an instruction hierarchy which allows for more granular control of the LLM’s behaviour. It defines how models should behave when instructions of different priorities conflict. This is discussed here: OpenAI Instruction Hierarchy. This defence was apparently “jailbroken” within days.
The tl;dr is that there is information on actionable techniques, but it’s not easy to surface. The attack surface is evolving and new attack vectors are constantly being discovered, so what you implement today may be rendered next to useless tomorrow. An easy way to keep up to date is needed so new defences can be implemented in a timely manner.
I want to see a better approach from providers. I would expect product pages to link to a security & privacy section with the detailed actions customers have to take. Although I hold up both Microsoft & Cloudflare as great examples of providing good detail on the threats and actionable guidance, they’ve both still got work to do in terms of having the information appear in places where the customer doesn’t need to undertake the hunt-the-guidance exercise!
A quick win would be a mapping of actionable mitigations per provider (including self-hosted models via, say, Ollama) against the OWASP Top 10 for LLM & GAI.
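As a rough idea of what such a mapping could look like, here is a minimal sketch in Python. The category names are the OWASP Top 10 for LLM Applications v1.1 names as I understand them, so check them against the current list. The `MITIGATIONS` entries are illustrative placeholders drawn from the resources mentioned in this post, not a vetted mapping, and `coverage_gaps` is a name I’ve made up for the example.

```python
# OWASP Top 10 for LLM Applications, v1.1 category names (verify against
# the current published list before relying on them).
OWASP_LLM_TOP_10 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Insecure Output Handling",
    "LLM03": "Training Data Poisoning",
    "LLM04": "Model Denial of Service",
    "LLM05": "Supply Chain Vulnerabilities",
    "LLM06": "Sensitive Information Disclosure",
    "LLM07": "Insecure Plugin Design",
    "LLM08": "Excessive Agency",
    "LLM09": "Overreliance",
    "LLM10": "Model Theft",
}

# Illustrative placeholder entries only (an assumption, not a vetted mapping).
MITIGATIONS = {
    "Microsoft": {
        "LLM01": ["AI jailbreak mitigation guidance (Microsoft Security Blog)"],
    },
    "Google Cloud": {
        "LLM06": ["Sensitive Data Protection"],
    },
    "Self-hosted (Ollama)": {
        "LLM01": ["NVIDIA NeMo Guardrails"],
    },
}


def coverage_gaps(provider: str) -> list[str]:
    """Return the OWASP category IDs with no documented mitigation."""
    covered = MITIGATIONS.get(provider, {})
    return [cid for cid in OWASP_LLM_TOP_10 if not covered.get(cid)]


if __name__ == "__main__":
    for provider in MITIGATIONS:
        gaps = coverage_gaps(provider)
        print(f"{provider}: {len(gaps)} of {len(OWASP_LLM_TOP_10)} categories unmapped")
```

The useful output is the gaps: a provider publishing this table would immediately show customers which risks have documented mitigations and which are still homework.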