Resonance for 2023-07-03

What got my attention?

Not an hour goes by without some podcast, blog post, vlog, or news article bringing up the term Large Language Models (LLMs). It’s generative AI, in-your-face, 24x7.

Seeing as how I’m hip-deep in experiments with generative AI and dataset governance, I am not going to break ranks. The past weeks have required me to dig deep into technologies that are adjuncts of LLMs. In particular, I have been hacking up open source LLMs, vector databases, frameworks, and deployment platforms … all in pursuit of building a source of inference and intelligence around a proprietary set of documents. It’s been eye-opening, and (…I admit it…) a lot of fun.
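The shape of that experiment — retrieve the most relevant passages from a private document set, then assemble them into a prompt — can be sketched in a few lines. Everything below is a toy stand-in: the hashed bag-of-words "embedding" substitutes for a real embedding model, and the linear scan substitutes for a vector database; the documents are invented for illustration.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector.
    A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index the proprietary documents (a vector database does this at scale).
documents = [
    "The Q3 design review approved the new ingestion pipeline.",
    "Vacation policy: employees accrue fifteen days per year.",
    "The ingestion pipeline writes embeddings to the vector store nightly.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Assemble retrieved context plus the question into one prompt."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does the ingestion pipeline handle embeddings?"))
```

Swapping the toy pieces for a real embedding model and a vector store changes the scale, not the shape: index once, retrieve per query, construct the prompt, and hand it to an LLM.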

To get a broader perspective, I found myself reading posts that provide an overview of the technology ecosystem for AI. It was worth the time spent.

What’s Required for Autonomous Agents

Michelle Fradin and Lauren Reeder of Sequoia reported on their interviews with over thirty startups and emerging technology companies in the Sequoia network that are crafting their AI strategies. They summarized their findings in The New Language Model Stack:

  1. Nearly every company in the Sequoia network is building language models into their products.
  2. The new stack for these applications centers on commercial language model APIs, retrieval, and orchestration, but open source usage is also growing.
  3. Companies want to _customize_ language models to their unique context.
  4. Today, the stack for LLM APIs can feel separate from the custom model training stack, but these are blending together over time.
  5. The stack is becoming increasingly developer-friendly.
  6. Language models need to become more trustworthy (output quality, data privacy, security) for full adoption.
  7. Language model applications will become increasingly multi-modal.
  8. It’s still early.

They flesh out each of these points in the post, which is definitely worth reading.

The Variety in Building LLM Applications

Matt Bornstein and Rajko Radovanovic of Andreessen Horowitz authored Emerging Architectures for LLM Applications, an excellent treatment of the Emerging LLM App Stack, with components that include:

  • data pipelines
  • embedding models
  • vector databases
  • ‘playgrounds’
  • orchestration and chaining
  • APIs and plugins
  • caching for LLMs
  • logging and (it had to happen soon) LLMOps
  • app hosting
  • proprietary and open source LLM APIs
  • cloud service providers
  • ‘opinionated’ / specialty clouds

Their discussion and explanation of Data Preprocessing and Embedding is excellent; they detail the various flavors of vector databases. Just as important, they treat Prompt Construction and Retrieval separately from Prompt Execution and Inference.
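That separation is easy to mirror in code: prompt construction is a pure function of the question and the retrieved context, while execution is a pluggable backend — a hosted API, an open source model, or (per the caching component in the stack above) a cache in front of either. A minimal sketch, with all names hypothetical and a canned function standing in for a real model call:

```python
from functools import lru_cache
from typing import Callable

# Phase 1: prompt construction and retrieval — pure string assembly,
# independent of which model will eventually run the prompt.
def construct_prompt(question: str, context_docs: list[str]) -> str:
    context = "\n---\n".join(context_docs)
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Phase 2: prompt execution and inference — a pluggable backend.
# This canned responder stands in for a hosted API or an open source model.
def fake_llm(prompt: str) -> str:
    return f"[model answer for {len(prompt)}-char prompt]"

# A cache slots in between the phases: identical prompts
# never hit the model (or the metered API) twice.
@lru_cache(maxsize=1024)
def cached_execute(prompt: str) -> str:
    return fake_llm(prompt)

def answer(question: str, docs: list[str],
           execute: Callable[[str], str]) -> str:
    """Wire the two phases together; `execute` is swappable."""
    return execute(construct_prompt(question, docs))

print(answer("Who approved the review?",
             ["The Q3 review was approved by the CTO."],
             cached_execute))
```

Keeping the two phases behind separate interfaces is what lets teams swap a commercial API for an open source model — or insert caching, logging, and the other stack components — without touching retrieval.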

Their treatment of agents and agent frameworks is a bit light, but that might be a function of the small number of companies that have incorporated Agent Frameworks in production offerings.

(While you’re on this page, take advantage of the related stories. Most of them are really worth your time.)

Moats? We don’t have no moats.

In early May, an anonymously authored document entitled We Have No Moat, And Neither Does OpenAI was making the rounds inside Google when it leaked and made a lot of waves. At the time, I read the portions of the document quoted in various posts, but it wasn’t until last week that I sat down and read it in its entirety. SemiAnalysis authors Dylan Patel and Afzal Ahmad released the document (with some cleanup), and for that we should thank them.

The TL;DR:

Google and OpenAI, arguably the two most advanced organizations in generative AI, have spent billions getting to their respective positions. But the real competitive threat comes from open source.

While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months. This has profound implications for us:

  • **We have no secret sauce.** Our best hope is to learn from and collaborate with what others are doing outside Google. We should prioritize enabling 3P integrations.
  • **People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality.** We should consider where our value add really is.
  • **Giant models are slowing us down.** In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.

Among the many points the document then raises as to how this situation has emerged, three stand out for me.

  • Retraining models from scratch is the hard path
  • Large models are not more capable in the long run if we can iterate faster on small models
  • Data quality scales better than data size (i.e. the volume of data)

The timeline included reads like the screenplay of a movie … arguably a great movie for tech geeks and business school case studies sometime in the future, but a movie nonetheless.

This is an important document to read and keep in mind, even if you do not agree with all the points.

Rich Miller @rhm2k