"Just" an LLM
"Just…" is a tricky word.
The idea of Artificial General Intelligence is - at least at the time of writing - that it's something still outside of our grasp. It's the idea of what Artificial Intelligence will become - as opposed to the "Artificial Intelligence" that exists today.
We might say that ChatGPT isn't really AI, because it's not that intelligent - it's just a chatbot.
Now - that isn't really true, because there's a whole machine-learning, decision-making "brain" in there that makes it fundamentally different to, for example, the Virgin Media chatbot that I love to argue with, which is pretty limited in what it can do.1
"Inside" the ChatGPT app, there's a Large Language Model - the GPT-4o or GPT-o1, or whatever. The bit thats powered by machine-learning, huge amounts of training data etc.
graph TD
    A[You] -->|1: You send a query| B[ChatGPT]
    B -->|2: Query goes to LLM| C["LLM (eg. GPT-4o)"]
    C -->|3: Reply goes back to the user| A
The point is that the GPT LLM is just part of the ChatGPT App.
What a "pure" Large Language Model does is essentially a "next word generator." (Technically, a "next token generator".) The input is a whole conversation, and the LLM just adds a word. Then another. Then another. And that cycle continues until the LLM says that its done (think of "I'm finished" as a special token that tells the app that its done and ready for more user input) and hands the input back to the user.
graph TD
    A[User] -->|1: User sends a query| B[Chatbot App]
    B -->|2: Query goes to LLM| C["LLM (eg. GPT-4o)"]
    C -->|3: Generate a token and add it to the conversation| D["Conversation (with one additional token)"]
    D -->|4: Whole conversation is a new LLM input to generate the next token| C
    D -->|5: Update conversation in UI| B
    B -->|6: Conversation progress| A
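To make that loop concrete, here's a minimal Python sketch of the cycle - not how any real chatbot is implemented, just the shape of it. The `next_token` function is a stand-in for the actual model (here it just replays a canned answer), and `END_TOKEN` plays the part of the "I'm finished" token.

```python
END_TOKEN = "<|end|>"  # the special "I'm finished" token
canned_reply = [" The", " capital", " of", " France", " is", " Paris", ".", END_TOKEN]

def next_token(conversation: str) -> str:
    # A real LLM would predict the most likely next token given the whole
    # conversation so far; this stub just replays a scripted answer.
    return canned_reply.pop(0)

conversation = "User: What is the capital of France?\nAssistant:"
while True:
    token = next_token(conversation)   # 1. the model adds one token...
    if token == END_TOKEN:             # 2. ...until it emits the "I'm finished" token
        break
    conversation += token              # 3. the whole conversation becomes the next input
print(conversation)
```

Notice that nothing in that loop requires `next_token` to be the same model every time round - which is the point of the aside below.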
(As a side note, I think it's interesting that once the LLM has added a token to the conversation, there's no need for the next token to be added by "the same" LLM. At any time, the conversation could be moved to a different model, and the model wouldn't "know" - because the model doesn't "know" anything - it's just an LLM, which is just a neural network, which is just a bunch of addition and multiplication…)
So, we might say that ChatGPT is just a Large Language Model… And there are things that a Large Language Model just can't do. (For example, an LLM can't really do maths.) The bit where the "thinking" seems to happen is entirely in the "LLM" part of the AI. Wrapping it up in an app that deals with the UI, login credentials etc. etc. isn't really very interesting.
For example, we know that LLMs can hallucinate. They can come up with words that make sense and follow patterns - but that don't have any grounding in reality. But they can also be made to hallucinate. We can manipulate their "view" of reality; tell them that they have said something that they didn't say, and they have no way of "knowing" that those aren't their own words.
Really, there's something fundamentally wrong with thinking of ChatGPT or an LLM as a singular "thing". I like the analogy of the game where a circle of people write a story by adding one word at a time; the group writes a story, but the "group" that is writing isn't one "thing". So as a collective noun, ChatGPT's pronoun should most definitely be they/them - because as it's a plural, the third-person singular "it" isn't appropriate.
The grammatical difference is incredibly subtle though - for example, we should say something like "ChatGPT wrote this book themselves", rather than "ChatGPT wrote this book themself" or "itself". That's just a grammatical illustration though - ChatGPT hasn't done any of the actual "writing" in any meaningful sense of the word, so we shouldn't really say that either. Because even if they write it down, a group adding one word at a time to a story isn't really "writing". Calling that group "a writer" feels wrong, at least in the sense that someone might say "I wish I was a writer", meaning something more than "I wish I could get paid to write whatever I wanted to write".6
Except… ChatGPT isn't just an LLM. Yes, there is a Large Language Model at its heart (specifically, GPT-4o, or o1, or whatever), but ChatGPT is an application. It's more than "just" a Large Language Model.
So - if you understand how an LLM works, you could illustrate it like this:
graph TD
    A[User Query] -->|Send Query| B["LLM (eg GPT)"]
    B -->|Add a token to the prompt/response| C["'WIP' Response"]
    C -->|Send back to the LLM| B
    C -->|Send incomplete response| D[ChatGPT App]
    B -->|Tell app response is complete| D
One of the key components is "RAG" - Retrieval-Augmented Generation - a system that essentially asks "do I have any information about this?", retrieves whatever it finds, and uses it to inform the Large Language Model's response.
The process is something like this: the user's query goes to the LLM as before - but the LLM (or the app) can then send a query to a knowledge base and retrieve contextually relevant information that helps inform its response.
graph TD
    A[User Query] -->|Send Query| B[LLM]
    B -->|Fetch Relevant Documents| C[Knowledge Base]
    C -->|Append Relevant Context to user prompt| B
    B -->|Generate Response| D[Response to User]
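As a rough sketch of that retrieval step (with a made-up `call_llm` function, and a toy keyword lookup standing in for a real vector search over embeddings), it looks something like this:

```python
# A tiny, illustrative knowledge base - real systems index thousands of documents.
knowledge_base = {
    "returns policy": "Items can be returned within 30 days with a receipt.",
    "opening hours": "The shop is open 9am-5pm, Monday to Saturday.",
}

def retrieve(query: str) -> list[str]:
    # Real RAG systems use embeddings and vector search; keyword overlap is
    # enough to show the shape of the process.
    return [doc for key, doc in knowledge_base.items()
            if any(word in query.lower() for word in key.split())]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for whatever model API the app actually uses.
    return f"(model answer, informed by: {prompt!r})"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))                  # fetch relevant documents
    prompt = f"Context:\n{context}\n\nQuestion: {query}"  # append them to the prompt
    return call_llm(prompt)                               # generate the response

print(answer("What are your opening hours?"))
```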
I think maths makes for a good example of the difference between what the LLM can do, and what the broader "LLM-powered app" can do.
If I ask you "what is 2x2", you can probably answer without thinking - "4".
Perhaps - if you paid attention in your primary school maths lessons - you can do the same for any "n times m" question - so long as both numbers are 12 or less. Because, for reasons that escape me, it's considered essential for everyone to know up to their 12 times table - but knowing a baker's dozen of anything is not worth teaching anyone.
So - if I ask you what six sevens are, you'll probably reply "forty two" pretty quickly. Without really having to think about it.
But - can you explain how you know? Can you cite a source? No - it's something you've learnt by rote; being repeatedly told that four sevens are twenty eight, five sevens are thirty five, six sevens are forty two. Sure - you can illustrate how to work out six sevens as proof - but that's validating your answer, not how you got there.2 But if I ask you what seven thirteens are - you probably need to resort to mental arithmetic.
This is analogous to how an LLM "app" works; the LLM is trained on text data, equivalent to the 'learn by rote' approach to primary school times tables. And until relatively recently, asking for the answer to an arithmetic question involving larger numbers (ie. one that was unlikely to appear in the training data often enough to influence the model) would probably get you an answer that looked believable enough that you could think it was right - but probably wasn't. Because while LLMs are very good at remembering things they have been told, they aren't as good at basic arithmetic.
That isn't true any more. ChatGPT apparently now has the ability to do this kind of "thinking" in the background - it has been able to write Python code for a while, but it can now also run that code and get the result back.
So, our "knowledge base" workflow illustration now looks something like this;
graph TB
    A[User Query] -->|Send Query| B[LLM]
    B -->|Write Python code| C[Python interpreter]
    C -->|Get output of Python as relevant context| B
    B -->|Fetch Relevant Documents| D[Knowledge Base]
    D -->|Append Relevant Context to user prompt| B
    B -->|Generate Response| E[Response to User]
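A hedged sketch of that "write some code, run it, use the output" loop might look like the following - the model calls are stubs, but the important bit is that the app executes the code, and the LLM only has to turn the result into prose:

```python
def llm_write_code(question: str) -> str:
    # A real model would generate this code; for the sketch it's hard-coded.
    return "result = 7 * 13"

def run_python(code: str) -> str:
    # The app, not the LLM, does the arithmetic - so it's actually correct.
    scope: dict = {}
    exec(code, scope)
    return str(scope["result"])

def llm_answer(question: str, tool_output: str) -> str:
    # The model turns the tool output back into a natural-language reply.
    return f"The answer is {tool_output}."

question = "What is seven thirteens?"
output = run_python(llm_write_code(question))
print(llm_answer(question, output))   # "The answer is 91."
```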
How else might you expand an LLM application? At a super-simple level, you might have found that you can get "better" results from ChatGPT if you ask "are you sure about that?" when you get a response that seems fishy. So - why not add something to the application that automatically asks "are you sure about that?" after every output?
Well - because it would be computationally expensive - that sort of thing might end up doing ten times the work that the $20/month "ChatGPT Plus" is doing - and costing ten times as much to run, so you'd probably need something like a new subscription model.
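The mechanics themselves are trivial - something like this sketch, where `call_llm` is again a hypothetical stand-in; the expense is simply that every automatic "are you sure?" is a whole extra model call:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real (and not free) model call.
    return "A possibly-revised answer."

def answer_with_double_check(question: str, retries: int = 1) -> str:
    answer = call_llm(question)
    for _ in range(retries):
        # Each pass re-sends the whole exchange, roughly doubling the cost.
        answer = call_llm(
            f"Q: {question}\nA: {answer}\n"
            "Are you sure about that? If not, give a corrected answer."
        )
    return answer

print(answer_with_double_check("What is seven thirteens?"))
```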
LLMs can also be trained to do a better job at a more specific, narrowly-defined set of tasks. Meanwhile, LLMs in LLM-powered apps can talk to other LLMs. So, why not chain them together?
For example, you might have a "safety checking" LLM that identifies "unsafe" content (that is, "unsafe" for the LLM service provider, legally speaking.) Or a "copyright checking" LLM that identifies content that might be a copyright violation (again, legally unsafe for the LLM provider - however harmless it might be to ask ChatGPT for the lyrics to Partners In Kryme's "Turtle Power".3) Instead of your "general purpose" LLM writing code, you might farm that out to a dedicated code-writing LLM that can do a better job.
graph TB
    A[User Query] -->|Send Query| B[LLM]
    B -->|Write Python code| C[Python interpreter]
    C -->|Get output of Python as relevant context| B
    B -->|Fetch Relevant Documents| D[Knowledge Base]
    D -->|Append Relevant Context to user prompt| B
    B -->|Generate Response| E[Fact-checking LLM]
    E -->|Reject unsatisfactory responses| B
    E -->|Generate Response| F[Copyright-checking LLM]
    F -->|Reject unsatisfactory responses| E
    F -->|Generate Response| G[User]
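The orchestration for that kind of chain is conceptually simple - something like this sketch, where each function stands in for a separate, specialised model call and the checkers get a veto over the draft:

```python
def main_llm(query: str) -> str:
    return "Draft answer to: " + query           # the general-purpose model

def safety_check(text: str) -> bool:
    return "unsafe" not in text.lower()          # a dedicated safety-checking model

def copyright_check(text: str) -> bool:
    return "lyrics" not in text.lower()          # a dedicated copyright-checking model

def respond(query: str) -> str:
    draft = main_llm(query)
    if not (safety_check(draft) and copyright_check(draft)):
        return "Sorry - I can't help with that."
    return draft

print(respond("Explain how a RAG pipeline works"))
```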
There are dozens of other tools that could be bolted on - something that looks up results in a web search index, something that scans Wikipedia, or checks the content of YouTube videos, or finds relevant Tweets etc. etc. At this point, they would make the diagram more cluttered and confusing than illuminating - but hopefully you get the idea.
My point is that at some point, we can't say that this broader app - which still has a Large Language Model (or models) at its core - is "just" an LLM any more. It does things that "just" an LLM simply can't do. So, what do we call it?
Well - "AI" is one answer. Maybe this is the path to Artificial General Intelligence - personally, I don't think it is 5 ; no matter how much more useful or powerful it gets, there are some fundamental issues with LLMs that I don't think can really be addressed by bolting on more tools or feeding more training data (the same way that bolting on a calculator to a primary school child doesn't turn them into a maths wizard.)
That said - a primary school child with a calculator can, given the right instructions, be a very useful "agent" for a maths wizard - particularly one who, for whatever reason, isn't in a position to use the calculator themselves. (Let's say, a hypothetical maths wizard driving to present a paper and wanting to run some last-minute checks over some of their calculations…)
Which is kind of analogous to the "agentic generation" of AI that we're moving into - where an AI isn't just giving you information but taking actions based on that information. (Because there isn't a massive leap in sophistication from a system that can look up information online to a system that can click a "buy now" button.)
graph TB
    A[User Query] -->|Send Query| B[LLM]
    B -->|Write Python code| C[Python interpreter]
    C -->|Get output of Python as relevant context| B
    B -->|Fetch Relevant Documents| D[Knowledge Base]
    D -->|Append Relevant Context to user prompt| B
    B -->|Generate Response| E[Fact-checking LLM]
    E -->|Reject unsatisfactory responses| B
    E -->|Generate Response| F[Copyright-checking LLM]
    F -->|Reject unsatisfactory responses| E
    F -->|Generate Response| G[User]
    F -->|Give instructions| H[Agent]
    H -->|Confirm success| G
I suppose it's the difference between "find me the best deal on an electric car with at least 200 miles range on a 4 year PCP deal" (ie. go and get some information, sift through it and give me information that answers my question) and "get me the best deal on an electric car with at least 200 miles range on a 4 year PCP deal, and prepare all the necessary paperwork for me to be able to drive it so that it's fully insured etc." (ie. go and get some information, sift through it and process it in a way that answers my request; act on my behalf.)
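In sketch form, the agentic difference is whether the app stops at returning the shortlist or goes on to call something like `place_order` on your behalf - all the function names here are made up for illustration:

```python
def search_deals(goal: str) -> list[dict]:
    # "Find me..." - gather and sift information (a stub result here).
    return [{"car": "EV hatchback", "range_miles": 220, "monthly_gbp": 239}]

def choose_deal(deals: list[dict]) -> dict:
    # A real agent would have an LLM pick the action and its arguments;
    # here we just take the cheapest deal that meets the 200-mile requirement.
    suitable = [d for d in deals if d["range_miles"] >= 200]
    return min(suitable, key=lambda d: d["monthly_gbp"])

def place_order(deal: dict) -> str:
    # "Get me..." - the step where the system acts, rather than just informs.
    return f"Paperwork started for the {deal['car']} at £{deal['monthly_gbp']}/month."

goal = "best deal on an electric car with at least 200 miles range, 4 year PCP"
best = choose_deal(search_deals(goal))
print(place_order(best))
```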
The big question that raises (putting aside things like trust, reliability, accuracy etc. and focussing on the broader concept) is - why? What's the point? What sort of job do you care about little enough that you're willing to trust an AI to handle it, but care about enough that you want something more than whatever you're already doing?
If I'm signing up for a 4 year, £200+/month financial commitment then, personally, I want to be looking at the alternatives and spending time thinking about what matters to me. (For example, I personally wouldn't want to find out that I'm going to spend 4 years driving a white car that needs cleaning more often and doesn't have CarPlay, or regret skimping a couple of pounds a month and missing out on heated seats and a heated steering wheel - but I might not realise that I cared about those until after I've spent some time reading reviews, looking at some photos, comparing features etc.)
On the other hand, if it's a decision I'm not interested in spending time on - why not just go with the cheapest thing in the supermarket aisle/on Amazon/on a price comparison site etc.?
It feels to me like the big opportunity here is less about "what am I doing that I don't need to spend time on any more", and more about "what am I not doing that I didn't even know I could be doing, thanks to these new technologies?"
[Note - If you see blocks of text that begin with something like "
graph TD
", then a thing that should convert them to diagrams isn't working properly - hopefully refreshing the page will fix it. I think I've spent more time trying to get the diagrams to work than actually writing this post, so at this point I'm giving up, walking away and hoping for the best. Sorry; next time I'll probably just do screengrabs.]
1. Basically, your response to its questions selects a path the conversation can go down. It's a 'choose your own adventure' conversation tree - interactive, but everything the chatbot can say has been pre-written. ↩
2. Personally, I didn't pay as much attention in my primary school maths lessons. I worked out that instead of learning my times tables, I could just work them out in my head while doing the tests and save myself a load of time and work. Which meant my kids thought they were better at maths than me when they learnt their times tables. (This is probably the root of what I call my "professional laziness" strategy…) ↩
3. Some months ago, I'd ask ChatGPT this question4 and it would start giving me the correct lyrics (other models like Llama or Watson would just make something up) - but then wipe it from the screen and say that it couldn't fulfil the request. Now, it just tells me straight away that it can't do it because of copyright. I guess that's progress? ↩
4. It's an odd hobby, but I like exploring the edges of LLM/AI's knowledge and asking obscure questions that I know the answer to, but that I suspect would confuse an LLM. 1990s pop culture is an excellent source of this kind of thing - also, relatively obscure work by artists who are fairly prolific, but best known for one or two particular paintings. ↩
5. Sam Altman says "We are now confident we know how to build AGI as we have traditionally understood it", on the back of releasing the o1 and o3 models. Which is to say that people who understand how AI works and how to build it may know better than me about this. ↩
6. That isn't to say that two people can't write a book together - but if Terry Pratchett and Stephen Baxter were to write a sixth Long Earth book together by taking turns to write a word each without saying anything or communicating with each other in any other way… Well, I'd probably still end up reading it. (I may have lost my chain of thought here…) ↩