Resonance for July 24, 2023

Highlights of the Week

Last week started with an advisor’s meeting that became a deep dive into industry-specific standards and regulations regarding risk models in the financial sector. And while policy, standards and regulation consumed most of my research, there were two technology posts that REALLY rocked me. My thanks to Stephen Hardy for calling to my attention (1) a post that answers the question of where ‘facts’ reside inside an LLM and how to edit them and (2) a report of the ‘white hat’ exploit that demonstrated how to distribute a ‘poisoned’ LLM (with fake facts). The third post I discuss is a report that the US Army is looking at the possibility of developing and requiring AI ‘Bills of Materials’, analogous to the Software BOMs that have recently been put into use to protect the software ‘supply chain’ from miscreants.

Where are the ‘facts’ in LLMs and how do you edit them?

Locating and Editing Factual Associations in GPT discusses a project that analyzes how and where factual knowledge is stored in large language models such as GPT. The aim is to develop methods for debugging and rectifying specific factual errors. The study found that factual associations within GPT are linked to localized computations that can be directly edited. Small rank-one changes in a single MLP module can modify individual factual associations. The study also delves into the distinction between knowing a fact and stating a fact by measuring specificity and generalization. Kudos to the Bau Lab at Northeastern University for this work.
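The mechanism is easier to appreciate with a toy example. Below is a minimal sketch of a rank-one weight edit of the kind the paper describes; it is my own illustration, not the Bau Lab’s ROME code, and it omits the covariance statistics the paper uses to protect unrelated associations. All shapes and variable names are made up for the demonstration.

```python
import numpy as np

# Toy illustration of a rank-one edit to a single weight matrix.
# NOT the Bau Lab's ROME implementation; it skips the key-covariance
# weighting from the paper and just shows the basic mechanism.

rng = np.random.default_rng(0)
d = 64

W = rng.normal(size=(d, d))       # stand-in for one MLP projection in the model
k = rng.normal(size=d)            # "key" vector produced by the subject tokens
v_target = rng.normal(size=d)     # "value" vector encoding the new fact to store

# Rank-one update chosen so that W_new @ k == v_target exactly.
W_new = W + np.outer(v_target - W @ k, k) / (k @ k)

print(np.allclose(W_new @ k, v_target))          # True: the edited fact is now stored

# Directions orthogonal to k are untouched, which is the intuition behind
# editing one association without retraining the whole model.
k_other = rng.normal(size=d)
k_other -= (k_other @ k) / (k @ k) * k           # project out the edited key
print(np.allclose(W_new @ k_other, W @ k_other)) # True: unrelated output unchanged
```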

It took me a couple of passes through the post and some of the reference material to understand their solution to the problem of identifying specific facts. Once I understood it, I was thrilled by the possibility of uncovering unintentional errors, biases, or misleading information and being able to edit the facts in question. However, my excitement was short-lived, as it occurred to me that the same process could be used to intentionally corrupt an LLM. This concern was addressed in the subsequent post shared by Stephen.

How the AI Model ‘Supply Chain’ Can Be Compromised

PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news discusses how an open-source model (the popular GPT-J-6B) was modified to spread misinformation on a specific task while maintaining its performance and accuracy on other tasks. The goal was to show how the tainted model could be distributed on Hugging Face, thereby demonstrating how LLM supply chains can be compromised. The post highlights the vulnerability of LLMs and the importance of a secure supply chain to guarantee AI safety. It also introduces the AICert project (led by the post’s authors), an open-source tool that creates AI model ID cards with cryptographic proof to trace models back to their training algorithms and datasets, and thus address the issue of AI model provenance.
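The post is light on detail about how those ID cards would actually work, so, purely to make the concept concrete, here is a toy sketch of the underlying idea: a provenance record that cryptographically binds published weights to the data and code that produced them. This is my own illustration, not AICert’s design; the file names and record fields are placeholders.

```python
import hashlib
import json
from pathlib import Path

# Toy "model ID card": hashes that bind published weights to the code and data
# that produced them. My own illustration of the concept, not AICert's design;
# the file names below are placeholders.

def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_id_card(weights: str, dataset_manifest: str, training_script: str) -> dict:
    """Assemble a provenance record. A real system would sign this record
    (e.g., with hardware-backed keys) so consumers can verify who issued it."""
    return {
        "weights_sha256": sha256_of(weights),
        "dataset_manifest_sha256": sha256_of(dataset_manifest),
        "training_script_sha256": sha256_of(training_script),
    }

if __name__ == "__main__":
    card = build_id_card("model.safetensors", "dataset_manifest.json", "train.py")
    Path("model_id_card.json").write_text(json.dumps(card, indent=2))
    # A downstream user re-hashes the weights they downloaded and compares the
    # digest to the card; a silently swapped, poisoned model would not match.
```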

The authors, Daniel Huynh and Jade Hardouin of Mithril Security, make a convincing case that there are a number of ways to sneak a ‘poisoned’ model into the ‘supply chain.’ However, they are less detailed regarding the idea of AICert’s “AI model ID cards,” and their claims for the approach are pretty spectacular. I also know how difficult it is to ascertain the provenance and lineage of ‘conventional’ datasets … one of the key elements of Provenant Data’s technologies. Doing the same for AI models is not going to be easy. One way this issue is addressed for software is the concept of a ‘Bill of Materials’: the Software Bill of Materials (or SBOM) has been put forward by a number of the best minds in the business as one way to mitigate the risk of a poisoned software supply chain. Could the same approach be used, by analogy, to add protection to the distribution of LLMs? Apparently, the US Army thinks it might.

The AI Bill of Materials

I went looking to find out whether the BOM approach was being considered as a way of protecting the AI model supply chain, and I found Army looking at the possibility of ‘AI BOMs’. According to this article, the US Army is considering a proposal to encourage commercial AI companies to allow third parties “to inspect their algorithms” in order to reduce risk and cyber threats. The system is called an “AI bill of materials” (AI BOM) and is modeled on the tracking lists already used to understand physical supply chains and (more recently) software supply chains via Software BOMs. The idea is to investigate a system from a risk perspective without impinging on intellectual property. According to Army spokespersons, risk analysis of this type may be a difficult ask for vendors, since it could expose clues as to how their work might be reverse engineered.
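The article doesn’t spell out what an AI BOM would contain, but by analogy with an SBOM it would presumably enumerate a model’s ingredients and dependencies. The sketch below is my own guess at the kind of fields such a record might carry, not anything proposed by the Army or described in the article.

```python
from dataclasses import dataclass, field

# Speculative sketch of an "AI BOM" record, by analogy with an SBOM.
# The fields are my own guesses, not a published specification.

@dataclass
class AIBOM:
    model_name: str
    model_version: str
    base_model: str                                   # upstream model this was fine-tuned from
    training_datasets: list[str] = field(default_factory=list)
    training_code_ref: str = ""                       # e.g., a commit hash of the training pipeline
    licenses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

example = AIBOM(
    model_name="example-risk-scorer",
    model_version="1.2.0",
    base_model="GPT-J-6B",
    training_datasets=["loan-history-2022 (internal)", "public-filings-corpus"],
    training_code_ref="git:3f9c2ab",
    licenses=["Apache-2.0"],
    known_limitations=["not evaluated against adversarial prompts"],
)
```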

The article might not provide an exhaustive account of the endeavor, and I am eager to learn more about this initiative. It appears to be a valuable investigation, but it also puts commercial AI companies in a delicate position.


Thanks for reading. And, by the way, I do at times use GPT-3.5 to summarize articles. I do so less to have someone/something else do the writing. It’s more to check myself and determine whether I’ve identified the important points. I hope that it improves the quality of these posts. - Rich


Rich Miller @rhm2k