  • Thoughts on Tools, AI and the Realities of Using LangChain

    The use of tools, both in the realms of science and philosophy, is often considered a defining characteristic that separates humans (and a select few non-human species) from other life forms.

    From a scientific perspective, tool use signifies advanced cognitive abilities, including problem-solving and planning, which are generally associated with higher forms of life such as primates, birds, and cetaceans. These species demonstrate an understanding of cause and effect relationships, a prerequisite for tool use.

    For philosophers, tool use is seen as an embodiment of our capacity to manipulate our environment and shape our destiny, a testament to our unique consciousness and self-awareness. It is a manifestation of our ability to conceptualize, innovate, and transcend physical and biological limitations. It thereby distinguishes us from other species.

    The discovery of tool use in some non-human species challenges the notion of human exceptionalism, prompting a reevaluation of our understanding of intelligence and consciousness in the animal kingdom. The game gets wilder still when those same communities must consider the implications of tools developed and used by AIs.

    I’m not sure how I missed Toolformer: Language Models Can Teach Themselves to Use Tools back in February/March, but it’s as good as this kind of research gets. It’s not a ‘build your own LLM in a weekend’ tutorial, but a serious work that demonstrates the advantages of tools in the next wave of offerings. It is fascinating to think about the adoption of tools as a determinant of LLM advancement. It’s also apparent to me that a good deal of the tool-building being taken on (and likely over-hyped) by the LangChain community has been using this paper as a ‘north star’.
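    To make the paper’s core idea concrete: Toolformer teaches a model to emit inline API calls such as [Calculator(400 / 1400)] within its own generated text; the calls are executed and the results spliced back in. Below is a minimal sketch of the execution side only. The tool registry, regex, and sample completion are my own illustrative stand-ins, not the paper’s code.

    ```python
    import re

    # Toy tool registry. The restricted eval() is only for this demo;
    # never eval untrusted model output in real code.
    TOOLS = {
        "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    }

    # Matches inline calls of the form [ToolName(arguments)].
    CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

    def execute_tool_calls(text: str) -> str:
        """Replace each inline [Tool(args)] call with the tool's output."""
        def run(match):
            name, args = match.group(1), match.group(2)
            tool = TOOLS.get(name)
            # Leave unrecognized calls untouched rather than failing.
            return tool(args) if tool else match.group(0)
        return CALL_PATTERN.sub(run, text)

    # A completion a Toolformer-style model might emit:
    completion = "400 of 1400 participants passed, i.e. [Calculator(400 / 1400 * 100)]%."
    print(execute_tool_calls(completion))
    # -> 400 of 1400 participants passed, i.e. 28.571428571428573%.
    ```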

    And, after spending hours in my own (mostly unsuccessful) attempts to use tools like LangFlow and Flowise, all built on the ‘foundational’ tool LangChain, I had to wonder whether my abilities and skillsets left me in the category of ‘beings who aspire to use tools, but can’t quite pull it off.’

    I ran into this post, The Problem With LangChain, which I’ll admit is pretty harsh in its treatment of LangChain’s authors and the ecosystem that has quickly formed around it.

    LangChain was by far the most popular tool of choice for RAG, so I figured it was the perfect time to learn it. I spent some time reading LangChain’s rather comprehensive documentation to get a better understanding of how to best utilize it: after a week of research, I got nowhere.

    …

    Eventually I had an existential crisis: am I a worthless machine learning engineer for not being able to figure LangChain out when very many other ML engineers can? We went back to a lower-level ReAct flow, which immediately outperformed my LangChain implementation in conversation quality and accuracy.

    In all, I wasted a month learning and testing LangChain, with the big takeaway that popular AI apps may not necessarily be worth the hype. …

    Max Woolf, the author, goes on in a level of detail that resonates only with those of us who’ve gone through the process of trying to design and prototype intricate LLM-based apps using LangChain and related tech. The examples are instructive, and I’ve now gone through three of the five stages of grief (denial, anger, bargaining, depression, and acceptance) and am writing this post having reached the ‘depression’ stage. The good news is that I’ve learned a lot, and am feeling better about generating my own collections of code blocks in old-school Python or JavaScript, without the simplifications and time savings promised by some of the super-tools. I’m hoping soon to pull even with cephalopods on the “tool users” leaderboard.
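    For contrast with the framework-heavy route, here is roughly what a ‘lower-level ReAct flow’ of the kind Woolf fell back on can look like in plain Python: a prompt template, a regex, and a dispatch table. The llm() and search() functions are placeholders for whatever completion API and tool you actually wire in; this is a sketch of the pattern, not anyone’s production code.

    ```python
    import re

    def llm(prompt: str) -> str:
        """Placeholder for a real completion call (OpenAI, local model, etc.)."""
        raise NotImplementedError

    def search(query: str) -> str:
        """Hypothetical tool; swap in a real search or retrieval call."""
        raise NotImplementedError

    TOOLS = {"Search": search}

    PROMPT = """Answer the question by interleaving Thought, Action, and Observation steps.
    Actions look like: Action: Search[your query]
    When you know the answer, reply: Final Answer: <answer>

    Question: {question}
    {scratchpad}"""

    def react(question: str, max_steps: int = 5) -> str:
        scratchpad = ""
        for _ in range(max_steps):
            step = llm(PROMPT.format(question=question, scratchpad=scratchpad))
            if "Final Answer:" in step:
                return step.split("Final Answer:", 1)[1].strip()
            match = re.search(r"Action: (\w+)\[(.*?)\]", step)
            if match and match.group(1) in TOOLS:
                # Run the requested tool and feed the result back in.
                observation = TOOLS[match.group(1)](match.group(2))
                scratchpad += f"{step}\nObservation: {observation}\n"
            else:
                scratchpad += step + "\n"
        return "No answer within the step budget."
    ```

    The whole agent fits on one screen, which is exactly the point of Woolf’s comparison: there is no framework layer between you and the prompt when conversation quality starts to drift.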

    → 8:50 PM, Jul 20
  • Resonance for 2023-06-26

    What got my attention?

    The week of June 18 was heavily trafficked with long-form posts and slick PDFs addressing the governance and regulation of AI, by which most of the authors and many readers mean ‘generative AI’. The attention is warranted. The perspectives from three representative quarters of the ecosystem are disturbing, though each for very different reasons.

    Big Tech and Governance

    Microsoft published a 40-page, very polished PDF entitled Governing AI: A Blueprint for the Future. It lays out a five-point ‘blueprint’ to ‘address several current and emerging AI issues through public policy, law, and regulation’. The points addressed are:

    • First, implement and build upon new government-led AI safety frameworks
    • Second, require effective safety brakes for AI systems that control critical infrastructure
    • Third, develop a broad legal and regulatory framework based on the technology architecture for AI
    • Fourth, promote transparency and ensure academic and nonprofit access to AI
    • Fifth, pursue new public-private partnerships to use AI as an effective tool to address the inevitable societal challenges that come with new technology

    This document is a love-note. It name-checks the efforts of various federal departments and technology organizations, but mostly by stating that they’ve read the documents and will do their best to adhere to the directives and support the direction. For me, it did little to identify those areas which Microsoft believes to be truly critical and in need of focused attention. It’s worth your time to scan, but I’m not sure you’ll find much of substance.

    Professional Societies

    Eliza Strickland, writing the article The Who, Where, and How of Regulating AI in IEEE Spectrum, sets out to call attention to the anxiety that’s being produced, particularly with respect to knowledge pollution and existential risk. This is how the article starts:

    During the past year, perhaps the only thing that has advanced as quickly as artificial intelligence is worry about artificial intelligence.

    In the near term, many fear that chatbots such as OpenAI’s ChatGPT will flood the world with toxic language and disinformation, that automated decision-making systems will discriminate against certain groups, and that the lack of transparency in many AI systems will keep problems hidden. There’s also the looming concern of job displacement as AI systems prove themselves capable of matching or surpassing human performance. And in the long term, some prominent AI researchers fear that the creation of AI systems that are more intelligent than humans could pose an existential risk to our species.

    It goes on from there to point out that the earliest and most concerted efforts to consider AI can be attributed to the EU, culminating in April 2021, when the European Commission proposed the [AI Act](https://artificialintelligenceact.eu/). She continues with a short take on the efforts of the rest of the world, noting that the US has gotten off to a “slow start.”

    Last year a national law was proposed, but it went nowhere. Then, in October 2022, the White House issued a nonbinding Blueprint for an AI Bill of Rights, which framed AI governance as a civil rights issue, stating that citizens should be protected from algorithmic discrimination, privacy intrusion, and other harms.

    It’s hard not to feel the clenching of jaw or gnashing of teeth.

    The View from Sand Hill Road

    Marc Andreessen, the venture capitalist who supplied us with the memorable statement that “software is eating the world,” provided us with the ultimate love-letter in Why AI Will Save the World. Compare his lead-in to the post with that of Eliza Strickland:

    The era of Artificial Intelligence is here, and boy are people freaking out.

    Fortunately, I am here to bring the good news: AI will not destroy the world, and in fact may save it.

    First, a short description of what AI is: The application of mathematics and software code to teach computers how to understand, synthesize, and generate knowledge in ways similar to how people do it. AI is a computer program like any other – it runs, takes input, processes, and generates output. AI’s output is useful across a wide range of fields, ranging from coding to medicine to law to the creative arts. It is owned by people and controlled by people, like any other technology.

    A shorter description of what AI isn’t: Killer software and robots that will spring to life and decide to murder the human race or otherwise ruin everything, like you see in the movies.

    An even shorter description of what AI could be: A way to make everything we care about better.

    He paints a very positive picture, and one which (as a technologist) I cannot argue with on a point-by-point basis. My unease is more about what he doesn’t address.

    After characterizing the majority of those who advocate new restrictions, regulations, and laws regarding AI as either “Baptists” (the true believers) or “Bootleggers” (the self-interested opportunists who seek regulation that insulates them from competitors), he attempts to take apart the AI risks that are most widely echoed in the popular press:

    • Will AI Kill Us All?
    • Will AI Ruin Our Society?
    • Will AI Take All our Jobs?
    • Will AI Lead to Crippling Inequality?
    • Will AI Lead to Bad People Doing Bad Things?

    And he finishes with the REAL risk of not pursuing AI with maximum force and speed: the threat of AI supremacy by China.

    His prescriptions for what should be done with respect to AI can, as you might expect, be characterized primarily as:

    • Let the Big AI companies go as fast as they can.
    • Let the startup AI companies go as fast as THEY can.
    • Let Open Source AI ‘compete with’ Big AI and the small ones as well.
    • Offset the risk of bad people doing bad things through working partnerships between the private sector and government.
    • Prevent the risk of China’s dominance in AI by using ‘the full power of our private sector, our scientific establishment, and our governments in concert to drive American and Western AI to absolute global dominance, including ultimately inside China itself. We win, they lose.’

    Let me know how the perspectives and concerns of these three documents sit with you.

    → 6:02 PM, Jul 4
  • Data Lifecycle Management (& Testing out Micro.blog)

    Active metadata sends metadata back into every tool in the data stack, giving the humans of data context wherever and whenever they need it — inside the BI tool as they wonder what a metric actually means, inside Slack when someone sends the link to a data asset, inside the query editor as they try to find the right column, and inside Jira as they create tickets for data engineers or analysts.

    From: What Is Active Metadata, and Why Does It Matter?
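    The mechanism in the quote is easy to sketch: one publisher fans the same metadata out to every tool where people already work. Everything below, the dataclass fields and the Slack and BI ‘consumers’, is hypothetical and only meant to show the shape of the idea, not any vendor’s API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class AssetMetadata:
        name: str
        description: str
        owner: str
        freshness: str  # e.g. "updated 2h ago"

    def notify_slack(meta: AssetMetadata) -> None:
        # In practice: post to a Slack webhook when an asset link is shared.
        print(f"[slack] {meta.name}: {meta.description} (owner: {meta.owner})")

    def annotate_bi_tool(meta: AssetMetadata) -> None:
        # In practice: attach the definition to the metric inside the BI tool.
        print(f"[bi] {meta.name}: {meta.description} ({meta.freshness})")

    CONSUMERS = [notify_slack, annotate_bi_tool]

    def publish(meta: AssetMetadata) -> None:
        """Fan the same metadata out to every tool in the stack."""
        for consumer in CONSUMERS:
            consumer(meta)

    publish(AssetMetadata(
        name="weekly_active_users",
        description="Distinct users with a session in the last 7 days",
        owner="growth-team",
        freshness="updated 2h ago",
    ))
    ```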

    → 6:51 PM, Jan 25