Am I a large language model?
Salasso is the Italian word for bloodletting, the withdrawal of blood from a patient to prevent or cure illness and disease. Although the practice has been abandoned by virtually all of modern medicine as overwhelmingly harmful to patients, the word salasso is still used as a metaphor for a very significant expense, one that drains a person of their funds as leeches would have drained them of their blood.
The use of salasso as a metaphor is not common but, for some reason, it has always been part of my family’s vocabulary; I do not remember ever learning the word and I assume I must have picked it up and started using it at a very young age. However, even though I was aware of the long-gone medical practice of bloodletting, I only discovered that we had a word for it well into my twenties (and not without a certain amount of embarrassment). Up until then, I had always understood salasso as just another noun for “significant expense”.
In principle, the way I picked up and started using the word salasso seems to me entirely similar to the process that allows today’s large language models to approximate our use of language to such an incredible degree. As a child I was never aware of the word’s literal meaning, possessing no facts about the word itself. As a good pattern-recognition machine, however, I must have picked up on the fact that salasso would repeatedly appear in contexts that implied a significant expense, and in which salasso was the only word that could carry that very meaning.
Models such as the GPTs (ChatGPT, GPT-4, …) do not know anything about the words they use, yet they are still able to string them together following the rules of language. In a way, they appear to establish connections between words not unlike how I was able to connect salasso to its metaphorical meaning, except across the entirety of human language. For those who would like to know more about these models, I have found Stephen Wolfram’s What is ChatGPT doing and why does it work? and Confused bit’s How does GPT work? to be excellent entry points into one of the deepest rabbit holes I have ever peeked into.
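To make the pattern-recognition idea a little more concrete, here is a toy sketch of my own (in Python): a bigram model that learns which word tends to follow which purely from co-occurrence counts, with no knowledge of what any word means. This is not how GPTs actually work (they are transformer networks predicting tokens from long contexts), but it shows how meaning-free statistics alone can already produce plausible word use; the miniature corpus and its salasso sentences are, of course, invented for the example.

```python
from collections import Counter, defaultdict
import random

# Toy corpus: the model never sees definitions, only sequences of words.
corpus = (
    "the repair was a salasso . "
    "the dentist bill was a salasso . "
    "the wedding was a salasso ."
).split()

# Count, for each word, which words follow it (bigram statistics).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed `word`."""
    counts = follows[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# The model "uses" salasso correctly without knowing what it means:
print(predict_next("a"))        # 'salasso' (its only follower in this corpus)
print(predict_next("salasso"))  # '.'
```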
Our common vocabulary is ill-equipped to describe and talk about large language models. Words like know, think, infer, and hallucinate are generally understood to imply a level of conscious thought and understanding that, so far, appears to remain absent from these models.
On one side, using these words casts a light onto these models that is entirely too human, at least for the time being. Most of us agree that the behaviors we are observing do not qualify as thinking in the same way that we think, and that a model cannot be wrong in the same way that we can. What should we call these behaviors, then? How can we ever expect our children, our elders, and everyone without a technical background to understand the difference between ourselves and these models, at least insofar as a difference continues to exist, unless we develop an appropriate vocabulary for it?
I think a good starting point might be the concept of prediction, drawing a comparison with the atmospheric models used for weather forecasts. Both kinds of model make predictions, albeit in two different domains: the domain of atmospheric science and the domain of language. Still, nobody believes that an atmospheric model might ever give rise to thinking, even if every single one of its predictions were to prove completely accurate. Why should we treat large language models differently? The ability to make accurate predictions is not, by itself, indicative of the potential for thinking and understanding.
On the other side, however, I am finding it more and more difficult to argue against the idea that I, too, am very likely a large language model of my own, deployed in a bipedal, dexterous, biological machine that can form and persist memories, in a community of similarly equipped machines. A model that might have emerged by sheer chance but then quickly proved itself to be the greatest evolutionary advantage that life had ever manifested.
Perhaps the question should not be whether I am one such model or not but whether there’s anything more to me beyond a large language model and, if so, what that might be.