Highlights of the Week
Welcome to another thought-provoking edition of our post series. I realize that I’ve missed a few weeks, but it’s not as though there’s been a dearth of material to report. This week, we delve into the important issue of AI safety and security as addressed in President Biden’s Executive Order, issued this week. The order, summarized in a Fact Sheet, aims to safely navigate the dynamic field of AI while protecting American interests. It remains to be seen whether the interests of other parties are as clearly taken into account.
In the world of AI research, we take a look at an intriguing project by Google DeepMind. The work, centered around the evolution and adaptation of prompts for AI, demonstrates the impact that methodical, iterative prompt strategy can have on the performance of large language models.
We also cast a light on an interesting report from Stanford, discussing the transparency, or rather, lack thereof, amongst popular AI foundation models. The findings are particularly timely given the unfolding policies in the US, UK and the EU.
Venturing into the realm of data engineering, our focus shifts to the efforts of the Data Provenance Initiative. With a mission to audit and improve the use of AI training datasets, their work is a testament to the importance of transparency and responsible data usage.
So, sit back and prepare yourself for a journey into the heart of this week’s intersections of technology, government, and economics. As always, we encourage you to share these insights within your network.
What got my attention?
Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
As I am putting this issue of the Resonance Calendar together, I am waiting to hear a White House press briefing by Press Secretary Karine Jean-Pierre and NSC Coordinator for Strategic Communications John Kirby. In advance of the briefing, the White House has issued a FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence. The timing is somewhat predictable, given that the AI Safety Summit takes place this week in the UK at Bletchley Park. We should have more to say on that next week. In the meantime, the US Administration has set out its positions and is hoping to make this week in the UK a constructive and historic event.
President Biden has issued an Executive Order to establish new standards for AI safety and security. The order also aims to protect Americans' privacy, advance equity and civil rights, stand up for consumers and workers, promote innovation and competition, advance American leadership worldwide, and more. The order directs actions such as requiring developers of powerful AI systems to share safety test results with the government, developing guidelines for federal agencies to evaluate the effectiveness of privacy-preserving techniques, and expanding bilateral, multilateral, and multi-stakeholder engagements to collaborate on AI. The order is part of the Biden-Harris Administration’s comprehensive strategy for responsible innovation.
PromptBreeder - LLM Prompt Mutation Strategies
The authors of PROMPTBREEDER: SELF-REFERENTIAL SELF-IMPROVEMENT VIA PROMPT EVOLUTION have written up a research project from Google DeepMind. While making no changes to the LLMs under test … adding nothing new to the training or finetuning … the authors show that methodical mutation of the submitted prompts can make an exceptional difference in the scores those models attain on the various benchmarks by which LLMs are now rated.
Think of those benchmarks as the LLM SATs, and this DeepMind project as a preparatory course for students about to take them. This further supports my personal view that prompt engineering represents the ‘programming’ aspect of utilizing LLMs. By analogy, as an industry we are first trying to understand the ‘language’ by which LLMs interact with humans, organizational processes, and one another. Think of it as the job of linguists working out the language of a newly discovered society: one needs to take care to understand the nuances of a foreign tongue.
From the abstract:
Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present PROMPTBREEDER, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, evaluates them for fitness on a training set, and repeats this process over multiple generations to evolve task-prompts. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
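To make the mechanism concrete, here is a minimal sketch of a Promptbreeder-style loop. It assumes a caller-supplied `llm()` callable and a toy exact-match fitness function; the function names, prompt templates, and tournament scheme are illustrative assumptions, not the paper’s implementation.

```python
import random
from typing import Callable

def fitness(llm: Callable[[str], str], task_prompt: str,
            train_set: list[tuple[str, str]]) -> float:
    """Score a task-prompt by the fraction of training answers it recovers."""
    hits = sum(answer in llm(f"{task_prompt}\n{question}")
               for question, answer in train_set)
    return hits / len(train_set)

def evolve_prompts(llm: Callable[[str], str],
                   seed_task_prompts: list[str],
                   seed_mutation_prompts: list[str],
                   train_set: list[tuple[str, str]],
                   generations: int = 20) -> str:
    """Toy Promptbreeder-style loop: evolve task-prompts and the
    mutation-prompts that rewrite them (illustrative, not the paper's code)."""
    # A 'unit' pairs a task-prompt with the mutation-prompt that evolves it.
    population = [(tp, random.choice(seed_mutation_prompts))
                  for tp in seed_task_prompts]
    for _ in range(generations):
        # Binary tournament: sample two units; the fitter one overwrites the other.
        i, j = random.sample(range(len(population)), 2)
        if fitness(llm, population[i][0], train_set) < fitness(llm, population[j][0], train_set):
            i, j = j, i
        task_prompt, mutation_prompt = population[i]
        # First-order mutation: the LLM rewrites the winner's task-prompt,
        # guided by its mutation-prompt.
        new_task = llm(f"{mutation_prompt}\nINSTRUCTION: {task_prompt}\nNEW INSTRUCTION:")
        # Self-referential step: the LLM also improves the mutation-prompt itself.
        new_mutation = llm(f"Improve this prompt-rewriting instruction:\n{mutation_prompt}")
        population[j] = (new_task, new_mutation)
    # Return the fittest task-prompt found.
    return max(population, key=lambda unit: fitness(llm, unit[0], train_set))[0]
```

The point mirrored from the abstract is the self-referential step: the same LLM that rewrites task-prompts also rewrites the mutation-prompts that guide those rewrites.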
The Foundation Model Transparency Index
Announced last week, this is more material for the processes about to start as a result of THIS week’s Executive Order from the White House.
Stanford University researchers have released a report called “The Foundation Model Transparency Index,” which examines the AI models of companies such as OpenAI, Google, Meta, and Anthropic, and found them lacking in transparency. The index graded 10 popular foundation models, with all receiving scores that the researchers found “unimpressive.” The researchers argue that greater transparency is essential to understanding the limitations and biases of AI models, and hope that the Transparency Index will serve as a resource for governments grappling with the question of how to potentially regulate the rapidly growing AI field.
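As a rough illustration of how such an index can be computed, the sketch below scores each model by the share of transparency indicators it satisfies and ranks the results. The model names and indicators here are hypothetical, and far fewer than the set the Stanford researchers actually graded.

```python
# Hypothetical indicator checklist per model (True = publicly disclosed).
# The real index grades many more indicators, spanning how models are
# built, what data they use, and how they are deployed.
indicators = {
    "model-a": {"training_data_sources": True, "data_licenses": False,
                "compute_disclosed": False, "model_architecture": True},
    "model-b": {"training_data_sources": False, "data_licenses": False,
                "compute_disclosed": True, "model_architecture": True},
}

def transparency_score(checks: dict[str, bool]) -> float:
    """Percentage of transparency indicators a model satisfies."""
    return 100.0 * sum(checks.values()) / len(checks)

# Rank models from most to least transparent, leaderboard-style.
for model, checks in sorted(indicators.items(),
                            key=lambda item: transparency_score(item[1]),
                            reverse=True):
    print(f"{model}: {transparency_score(checks):.0f}%")
```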
The Data Provenance Initiative
The Data Provenance Initiative is a multi-disciplinary volunteer effort to improve transparency, documentation, and responsible use of training datasets for AI. Through a large-scale audit of finetuning text-to-text datasets, referred to as the Data Provenance Collection, the initiative’s first release thoroughly documents their web and machine sources, licenses, creators, and other metadata.
From the Abstract of The Data Provenance Initiative:
A Large Scale Audit of Dataset Licensing & Attribution in AI
The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices, threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ finetuning datasets. Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 72%+ and error rates of 50%+. This points to a crisis in misattribution, and informed use of the most popular datasets, driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.
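The released Data Provenance Explorer lets practitioners trace and filter datasets on provenance metadata. As a simplified illustration of that kind of filtering, the sketch below keeps only catalog entries whose self-reported license permits commercial use; the record schema, field names, and example datasets are assumptions for illustration, not the collection’s actual format.

```python
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    """Illustrative provenance record; the field names are assumptions,
    not the Data Provenance Collection's actual schema."""
    name: str
    license: str            # e.g. "apache-2.0", "cc-by-nc-4.0", "unspecified"
    languages: list[str]
    source: str             # original web or machine (model-generated) source

def commercially_usable(records: list[DatasetRecord]) -> list[DatasetRecord]:
    """Keep only datasets whose self-reported license permits commercial use."""
    permissive = {"apache-2.0", "mit", "cc-by-4.0", "cc0-1.0"}
    return [r for r in records if r.license.lower() in permissive]

# Hypothetical catalog entries standing in for audited finetuning datasets.
catalog = [
    DatasetRecord("example-instructions", "apache-2.0", ["en"], "crowdsourced"),
    DatasetRecord("example-dialogues", "cc-by-nc-4.0", ["en", "sw"], "model-generated"),
    DatasetRecord("example-qa", "unspecified", ["en"], "web scrape"),
]

for record in commercially_usable(catalog):
    print(record.name, record.license)
```

As the audit’s error and omission rates suggest, a filter like this is only as trustworthy as the license metadata behind it, which is precisely the gap the initiative is working to close.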