Lesson 2: What is a Large Language Model?
- The Definition: LLMs as a specific instance of Foundation Models.
- The Input: Understanding Tokens vs. Words.
- The Engine: The Transformer architecture and the Attention Mechanism.
- The Process: How Next-Token Prediction creates human-like text.
- Refinement: The critical step of Fine-Tuning for business use.
We know GPT stands for Generative Pre-trained Transformer. But to use these tools effectively, we need to look under the hood. An LLM is not a "knowledge base" or a "database" of facts; it is a probabilistic engine designed to predict patterns.
1. Defining the LLM
A Large Language Model (LLM) is a specific type of Foundation Model trained on text and code.
- Foundation Model: A model trained on broad data (the internet, books, academic papers) that can be adapted to many downstream tasks.
- Large: This refers to two things:
- Data Scale: Trained on trillions of tokens of text drawn from the web, books, and code. To visualize this: if 1GB holds ~178 million words, a modern training corpus spans tens of thousands of gigabytes.
- Parameter Count: GPT-3 has 175 billion parameters. Think of a parameter as an internal "knob" or setting. During training, the model adjusts these billions of knobs to minimize errors. More parameters generally correlate with higher reasoning capabilities.
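To make the parameter count tangible, here is a back-of-the-envelope sketch. The 2-bytes-per-parameter figure assumes 16-bit weights, a common but not universal storage choice:

```python
# Rough memory footprint of a GPT-3-scale model, assuming 16-bit (2-byte) weights.
params = 175_000_000_000      # 175 billion parameters
bytes_per_param = 2           # fp16/bf16 storage; other formats use more or less
total_gb = params * bytes_per_param / 1024**3
print(f"~{total_gb:,.0f} GB just to store the weights")  # roughly 326 GB
```

Holding hundreds of gigabytes of weights in memory is why these models run on clusters of specialized hardware rather than a laptop.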
2. The Input: It Doesn't Read "Words"
Humans read words. Computers read numbers. To bridge this gap, LLMs use Tokenization.
- Tokens: Text is broken down into chunks of characters called tokens (often whole words, word fragments, or punctuation marks).
- The Math: Roughly, 1,000 tokens ≈ 750 words.
- Why it matters: When we discuss "Context Windows" (how much text the model can process at once) or "Pricing" (how much an API call costs), we always measure in tokens, not words.
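A quick way to build intuition is to tokenize a sentence yourself. The sketch below uses OpenAI's open-source tiktoken library (an assumption: it is installed via `pip install tiktoken`, and the encoding name is just one common choice):

```python
# Count words vs. tokens for the same sentence using the tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # tokenizer used by several OpenAI models
text = "Tokenization splits text into sub-word chunks, not whole words."

token_ids = enc.encode(text)                 # list of integer token IDs
print(len(text.split()), "words")
print(len(token_ids), "tokens")              # typically more tokens than words
print([enc.decode([t]) for t in token_ids])  # see where the splits actually fall
```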
3. The Architecture: The Transformer
The breakthrough that made modern AI possible is the Transformer architecture (the "T" in GPT). Its superpower is the Self-Attention Mechanism.
Before Transformers
Older architectures (recurrent neural networks) read text sequentially, one token at a time. In long passages, information from the beginning faded by the time the model reached the end, so it would effectively "forget" how the sentence started.
The Attention Mechanism
Transformers process the entire sequence at once, assigning a "weight" (an importance score) to the relationship between every pair of tokens.
- Example: In the sentence "The animal didn't cross the street because it was too tired," the model knows that "it" refers to the "animal," not the "street."
- Result: This allows the model to maintain context over long conversations and understand nuance, sarcasm, and complex instructions.
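For readers who want to see the mechanism rather than the metaphor, here is a minimal NumPy sketch of scaled dot-product attention, the calculation at the heart of self-attention. The shapes and values are toy examples, not a real model:

```python
# Bare-bones sketch of scaled dot-product attention.
import numpy as np

def attention(Q, K, V):
    """Each token's query is compared against every token's key;
    the resulting weights decide how much of each value to blend in."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # similarity between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1
    return weights @ V, weights                                # weighted mix of the values

# 4 tokens, each represented by a 3-dimensional vector (toy sizes)
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 3))
output, weights = attention(Q, K, V)
print(weights.round(2))   # row i shows how much token i "attends" to every token
```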
4. The Training: The "Fancy Auto-Complete"
At its core, an LLM is trained to do one simple task: Predict the Next Token.
- Input: "The sky is..."
- Calculation: The model runs the input through its 175 billion parameters and produces a probability for every token in its vocabulary.
- Selection: It picks a high-probability token (e.g., "Blue"). In practice it samples from this distribution, which is why the same prompt can produce slightly different answers.
- Feedback: During training, if it guessed "Green," it is mathematically "punished" (Loss Function) and adjusts its parameters. If it guessed "Blue," it is "rewarded."
By repeating this process trillions of times, the model learns not just grammar, but logic, reasoning, and facts about the world—all as a byproduct of learning to predict the next word.
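The loop described above fits in a few lines. This toy example (a four-word vocabulary with made-up scores) walks through the softmax step that turns scores into probabilities and the cross-entropy loss used to "punish" a wrong guess:

```python
# Toy next-token prediction: scores (logits) -> probabilities -> prediction -> loss.
import numpy as np

vocab = ["blue", "green", "falling", "the"]        # tiny stand-in vocabulary
logits = np.array([3.2, 1.1, 0.3, -1.0])           # made-up scores for "The sky is ..."

probs = np.exp(logits - logits.max())
probs = probs / probs.sum()                        # softmax: scores become probabilities
prediction = vocab[int(np.argmax(probs))]

true_next = "blue"
loss = -np.log(probs[vocab.index(true_next)])      # cross-entropy for the true token

print({w: round(float(p), 3) for w, p in zip(vocab, probs)})   # "blue" gets the highest share
print("prediction:", prediction, "| loss:", round(float(loss), 3))
```

A wrong true token would sit at a low probability, producing a large loss; training nudges the parameters so that the correct token's probability rises next time.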
5. Refining the Model: Fine-Tuning
A base model (fresh out of training) is like a brilliant but unguided encyclopedia. If you ask it "How do I bake a cake?", it might respond with "And how do I bake a pie?" because it thinks it is completing a list of questions.
To make it useful for business, we perform Fine-Tuning:
- Instruction Tuning: We train the model on datasets of Questions and Answers. This teaches the model to follow instructions rather than just complete text.
- Domain Adaptation: We can further fine-tune a model on medical data, legal contracts, or your company's specific coding standards to make it an expert in a narrow field.
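To make instruction tuning concrete, here is what a tiny slice of an instruction-tuning dataset might look like. The field names and examples are illustrative, not any specific vendor's required format:

```python
# Illustrative shape of an instruction-tuning dataset: prompts paired with the
# responses we want the model to imitate.
instruction_data = [
    {
        "instruction": "How do I bake a cake?",
        "response": "1. Preheat the oven to 180°C. 2. Mix flour, sugar, eggs and butter...",
    },
    {
        "instruction": "Summarize this meeting note in one sentence.",
        "response": "The team agreed to move the launch to Q3 and schedule a follow-up review.",
    },
]

# During instruction tuning, the model sees each instruction and is trained to
# predict the paired response, token by token.
for example in instruction_data:
    print(f"PROMPT: {example['instruction']}\nTARGET: {example['response']}\n")
```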
6. Business Applications
Understanding this architecture helps us identify where LLMs add value:
- Synthesis: Because they understand global context (Attention), they are excellent at summarizing messy unstructured data (emails, meeting notes).
- Transformation: They can reliably translate intent, e.g., turning a user request "Show me sales in Q3" into a SQL database query (see the sketch after this list).
- Generation: They can draft high-volume content (product descriptions, personalized outreach) that requires human-like variability.
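As a sketch of the "Transformation" use case, the snippet below shows the typical shape of a text-to-SQL prompt. `call_llm` is a hypothetical placeholder, not a real library function; in practice you would substitute your provider's API and validate the generated query before running it:

```python
# Sketch of the "Transformation" pattern: natural-language intent in, structured query out.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns a canned answer here."""
    return "SELECT region, SUM(amount) FROM sales WHERE quarter = 'Q3' GROUP BY region;"

schema = "sales(region TEXT, amount NUMERIC, quarter TEXT)"
user_request = "Show me sales in Q3"

prompt = (
    f"You translate questions into SQL for the table {schema}.\n"
    f"Question: {user_request}\nSQL:"
)
print(call_llm(prompt))   # the generated query should be reviewed/validated before execution
```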