Common Terminology for Large Language Models

LLM (Large Language Model)

  • Simple Definition: A powerful artificial intelligence program that can generate human-like text.
  • More Detail: LLMs are a type of artificial intelligence trained on massive amounts of text data. At their core, they are statistical systems built on neural network architectures, most often the Transformer architecture. Trained on huge text datasets in an unsupervised or semi-supervised manner, they learn the complex probabilistic relationships between words and sequences. Learning to predict the next word in a sequence is what allows them to hold conversations, write different kinds of creative content, translate languages, answer questions in an informative way, write code, and more.
  • Examples: Popular LLMs include GPT-4 (which powers ChatGPT and Microsoft Copilot) and Gemini (Google).
  • Limitations: LLMs can sometimes produce incorrect or misleading text.
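
The "predict the next word" idea above can be illustrated with a toy sketch. This is not how a real LLM works internally (real models use neural networks over tokens, not raw counts, and the tiny corpus here is made up), but it shows the core prediction task in miniature:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real LLM is trained on billions of tokens.
corpus = "the dog barked . the dog ran . the cat ran".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # "dog" (seen twice, vs "cat" once)
```

An LLM does the same job at vastly greater scale, scoring every token in its vocabulary as a possible continuation rather than looking up simple counts.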

Prompt

  • Simple Definition: In the context of Language Models like GPT (Generative Pre-trained Transformer), a prompt is essentially a piece of text you give to the model as input. This text acts as a cue or instruction, telling the model what you want it to do. Based on this input, the model then generates text that follows or responds to the prompt. For example, if you input "Write a poem about the sea," the prompt is the instruction for the model to compose a poem with the sea as its subject.
  • More Detail: A prompt in the context of Large Language Models (LLMs) like GPT serves not just as a basic instruction but as a critical component that shapes the model's output. It's a multifaceted tool that can guide the model in various tasks such as text generation, question answering, summarization, translation, and more. The complexity and structure of a prompt can significantly influence the quality and relevance of the model's output.
    • Prompts can be "zero-shot", where the model receives no prior examples and must understand the task from the prompt alone; "one-shot", where the model is given a single example to help it understand the task; or "few-shot", where multiple examples are provided within the prompt to guide the model's output more explicitly.
    • In more advanced uses, prompts can include instructions that leverage the model's internal knowledge or encourage it to adopt a specific tone, style, or format in its response. This can involve crafting prompts that are more suggestive, embedding hidden instructions, or using specific keywords that the model has been trained to recognize as cues for generating certain types of content.
    • Moreover, the art of "prompt engineering" has emerged as a crucial skill, where practitioners meticulously design prompts to maximize the effectiveness and accuracy of the model's outputs. This involves understanding the nuances of how LLMs interpret text and leveraging that understanding to construct prompts that are clear, concise, and aligned with the desired outcome.
    • Through these intricate prompt structures, users can harness the vast knowledge and capabilities of LLMs, directing them to perform a wide array of tasks with surprising depth and nuance.
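
The zero-shot/few-shot distinction described above is, in practice, just a matter of how the prompt string is assembled. A minimal sketch (the sentiment-classification task and example sentences are made up for illustration):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Format an instruction, worked examples, and a new query
    into a single prompt string for an LLM."""
    parts = [instruction]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The trailing "Output:" cues the model to complete the pattern.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each sentence as Positive or Negative.",
    [("I loved this film.", "Positive"),
     ("The service was terrible.", "Negative")],
    "What a wonderful surprise!",
)
print(prompt)
```

Passing an empty `examples` list gives a zero-shot prompt; one pair gives one-shot; several pairs give few-shot.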

System Prompt

  • Simple Definition: A system prompt in the context of LLMs refers to the specific instructions or inputs that a system (like an application or software using an LLM) sends to the language model. These prompts are designed to tell the model what kind of response or output is expected. Unlike prompts that might be entered manually by a user, system prompts are often predefined or dynamically generated by the software to achieve consistent and specific tasks, such as generating reports, answering queries, or interacting in a conversational manner with users.
  • More Detail: A system prompt goes beyond basic input by incorporating elements of software design, user interaction, and task-specific objectives into the interactions with LLMs. These prompts are typically part of a larger system or application that utilizes the LLM for various functionalities. The design of system prompts is crucial because it influences how effectively the model can understand and respond to the requirements of the task at hand.
    • System prompts can be static (predefined) or dynamic (generated in real-time based on context, user input, or specific conditions). They are crafted with the understanding that the quality of the input directly affects the quality of the output. Therefore, system prompts often include not just the task instructions but also context, formatting guidelines, and even examples to guide the model's response.
    • In advanced implementations, system prompts might also involve layers of logic or filtering to refine the model's output before presenting it to the end-user. This could include post-processing steps to ensure the response meets certain quality or relevance criteria, or incorporating feedback loops where the system learns from the interactions to improve future prompts and responses.
    • Moreover, in environments where LLMs are integrated into user-facing applications, system prompts play a critical role in shaping the user experience. They must be designed to translate user needs and commands into a format that the model can understand and respond to effectively, bridging the gap between natural human language and the model's capabilities.
    • In the realm of "prompt engineering" within system design, the focus is on optimizing these interactions to leverage the full potential of LLMs, ensuring that the system can handle a wide range of tasks, from simple data retrieval to complex problem-solving, in a manner that feels intuitive and seamless to the user.
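
Many chat-style LLM APIs express the system-prompt/user-prompt split as a list of role-tagged messages. The sketch below shows that common shape; the exact schema and field names vary by provider, and the system prompt text here is invented:

```python
def make_conversation(system_prompt, user_message):
    """Pair a predefined system prompt with a user's message in the
    role-based message format used by many chat LLM APIs."""
    return [
        {"role": "system", "content": system_prompt},  # set by the application
        {"role": "user", "content": user_message},     # typed by the end user
    ]

messages = make_conversation(
    "You are a helpful assistant that answers in one short sentence.",
    "What is a context window?",
)
```

Because the system message is controlled by the application rather than the user, it is where developers encode tone, format, and safety requirements that should apply to every response.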

Token

  • Simple Definition: A basic unit of text that an LLM processes.
  • More Detail: Tokens aren't always full words. They can be parts of words, individual letters, common punctuation, or even special symbols that the LLM needs to understand. Tokenization is the process by which LLMs break input text into meaningful units. There are different tokenization methods:
    • Word-level: Full words form the basic units.
    • Subword-level: Common prefixes, suffixes, or even parts of words become tokens (helpful for languages with many compound words or handling rare vocabulary).
    • Byte-Pair Encoding (BPE): A subword method that repeatedly merges the most frequent character pairs into larger units, striking a balance between word-level and character-level tokenization.
  • Example: The sentence "The dog barked loudly" might be broken into tokens like: "The", "dog", "bark", "-ed", "loud", "-ly" (the exact splits depend on the tokenizer and its vocabulary).
  • Limitations: How an LLM breaks text into tokens affects how it understands the input; tokenization choices influence how well a model handles rare words, numbers, and languages other than English.
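
The subword splitting in the example above can be sketched with a toy greedy tokenizer. The hand-picked vocabulary below is made up to match the example; real tokenizers (e.g. BPE) learn their vocabularies from data and use byte-level fallbacks:

```python
# Hypothetical mini-vocabulary of known subword pieces.
VOCAB = {"the", "dog", "bark", "ed", "loud", "ly"}

def tokenize_word(word):
    """Greedily split a word into the longest known subword pieces,
    falling back to single characters for unknown text."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):
            piece = word[:end]
            if piece in VOCAB or end == 1:
                tokens.append(piece)
                word = word[end:]
                break
    return tokens

def tokenize(text):
    return [t for w in text.lower().split() for t in tokenize_word(w)]

print(tokenize("The dog barked loudly"))
# ['the', 'dog', 'bark', 'ed', 'loud', 'ly']
```

Note how "barked" splits into a known stem plus a suffix piece; this is how subword tokenizers handle word forms they have never seen as whole words.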

Context Window

  • Simple Definition: The amount of text an LLM "remembers" when generating a response.
  • More Detail: Just as humans need context to follow a conversation, LLMs look at a window of preceding text to determine how to respond. That text can include previous messages in a conversation or context retrieved from a document or website. The size of the context window is a crucial constraint, generally limited by computational cost. Context windows have grown rapidly: from about 8k tokens in early 2023 to 128k tokens for GPT-4 and 1 million tokens for Google Gemini 1.5 in 2024. A 1-million-token window can hold several books' worth of text, and these limits will likely keep growing, allowing LLMs to reference more and more information when generating responses.
  • Example: If the context window is 500 tokens and you ask a question about a paragraph you wrote earlier, the LLM only "sees" the most recent 500 tokens for reference.
  • Limitations: Limited context windows can lead to misunderstandings if important information falls outside the window.
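
Applications that talk to LLMs typically have to trim conversation history to fit the window. A naive sketch, approximating tokens by whitespace-separated words (real systems would count with the model's actual tokenizer):

```python
def trim_to_window(messages, max_tokens):
    """Keep only the most recent messages that fit the token budget,
    dropping the oldest ones first."""
    kept, used = [], 0
    for message in reversed(messages):   # walk newest to oldest
        cost = len(message.split())      # crude stand-in for a token count
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = ["first old message here", "a later message", "newest question"]
print(trim_to_window(history, max_tokens=6))
# ['a later message', 'newest question']
```

This is exactly the failure mode the Limitations bullet describes: once the oldest messages are trimmed away, the model can no longer "see" them.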

Hallucination

  • Simple Definition: When an LLM confidently produces text that is factually wrong.
  • More Detail: Hallucinations are a persistent problem with LLMs. Because they learn statistical patterns in language rather than genuine understanding, they sometimes make things up that sound believable but aren't true. Hallucinations stem from a few factors:
    • Distribution Bias: The massive text datasets LLMs are trained on have their own biases and errors, which the LLM might reproduce.
    • Reward Hacking: LLMs are often trained to produce text that seems plausible, not necessarily factual. This can lead to confident-sounding but incorrect output.
    • Lack of Grounding: Without real-world interaction, it's difficult for LLMs to develop a concrete understanding of factual truth.
  • Example: If you ask "What is the capital of Australia?" and the LLM responds "Melbourne", it's hallucinating. (The correct answer is Canberra).
  • Limitations: Hallucinations are tough to counter, a reminder to always fact-check an LLM's output.

Retrieval Augmented Generation (RAG)

  • Simple Definition: A technique that lets an LLM search for and use information from outside sources.
  • More Detail: Imagine an LLM attached to a search engine. RAG lets the LLM look up relevant information from the web or from documents (PDFs, Word docs, code) and incorporate it into its responses, helping to reduce hallucinations. RAG systems typically consist of two core components:
    • Retriever: Responsible for finding relevant documents or passages from a knowledge base (this could be Wikipedia, a database of research papers, etc.).
    • Generator: The LLM processes retrieved information and formulates the final response.
  • Key Challenges: Making the retriever efficient (finding the right info quickly) and teaching the generator to use retrieved knowledge effectively are ongoing areas of research.
  • Example: If you ask "How many people live in Canada?", with RAG, the LLM finds a reliable website with population figures instead of guessing.
  • Limitations: RAG still depends on the quality of search results and the LLM's ability to understand and use the information it finds.
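
The retriever/generator split can be sketched end-to-end in miniature. The tiny in-memory "knowledge base" below is made up, the retriever uses crude word-overlap scoring (real systems use vector embeddings), and the generator step is stubbed out as prompt assembly rather than an actual LLM call:

```python
import string

DOCUMENTS = [
    "Canada has a population of about 40 million people.",
    "Canberra is the capital city of Australia.",
    "The Transformer architecture underlies most modern LLMs.",
]

def words(text):
    """Lowercase and strip punctuation so 'Canada?' matches 'Canada'."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def retrieve(query, documents, top_k=1):
    """Retriever: rank documents by word overlap with the query."""
    query_words = words(query)
    ranked = sorted(documents,
                    key=lambda d: len(query_words & words(d)),
                    reverse=True)
    return ranked[:top_k]

def build_rag_prompt(query):
    """Generator input: retrieved context plus the question,
    ready to hand to an LLM."""
    context = "\n".join(retrieve(query, DOCUMENTS))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_rag_prompt("How many people live in Canada?"))
```

Because the population figure is supplied in the prompt rather than recalled from training data, the model can answer from the retrieved text instead of guessing, which is precisely how RAG reduces hallucinations.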