The magic of artificial intelligence (AI) infrastructure such as large language models (LLMs) is its ability to generate text that mirrors human conversation, but that ability isn’t magic; it’s the result of studying humans’ capabilities. This is another way of saying LLMs are like any computer: they’re only as smart as the information fed to them. This article aims to shed light on the significance of inputs in LLMs and how these inputs mirror human abilities and challenges.
The Importance of Inputs in LLMs
Large language models are exactly that — models. Models analyze datasets to derive patterns and rules as a method of learning and replicating human intelligence. As you can probably guess, the dataset used in a model can dramatically alter its understanding. We’ve used a number of analogies to explain the significance of this, but it boils down to the same principle: the inputs in LLMs greatly influence the outputs.
Popular LLMs like ChatGPT have access to a tremendous amount of data, but this might not be the case for your organization’s preferred AI solution. When an LLM is trained on a massive trove of books, articles, websites, and other textual sources, it reduces the risk of creating blind spots in its understanding. This is the value of tools like ChatGPT, but with current technology, having a massive dataset exposes the LLM to other challenges such as contradictory information or hallucinations. A narrower dataset, like one based on your organization’s internal knowledge, is less likely to have the challenges of a massive dataset, but narrower datasets have their own challenges. We’ve detailed the specifics of these challenges below.
1. The Principle of Input Quality for LLMs:
“Garbage in, garbage out” is a time-honored phrase that is more relevant today than ever before. Few technologies are more susceptible to “garbage” than LLMs, because they rely on datasets to function at all. If the datasets are high quality, you’ll get high-quality responses from your LLM. If the datasets are low quality, you’ll get low-quality responses.
Defining “quality” inputs can be complex (as we’ll discuss throughout this blog), but oftentimes an organization’s primary quality challenge is implementing basic standards for its knowledge. Duplicates, out-of-date information, contradictory information, poor metadata, and a variety of other easily resolved problems can all weaken the quality of your dataset.
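As a rough illustration of what “basic standards” can look like in practice, here is a minimal Python sketch that drops duplicate and out-of-date documents before they reach a dataset. The Document fields, the clean_dataset function, and the cutoff date are all hypothetical names chosen for this example; a real pipeline would need rules suited to your own content.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical record for one piece of internal knowledge.
    title: str
    body: str
    last_updated: date


def clean_dataset(docs, cutoff):
    """Keep documents that are recent enough and not duplicates.

    Returns (kept, dropped), where dropped lists (title, reason) pairs.
    """
    seen_bodies = set()
    kept, dropped = [], []
    for doc in docs:
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(doc.body.lower().split())
        if doc.last_updated < cutoff:
            dropped.append((doc.title, "out of date"))
        elif normalized in seen_bodies:
            dropped.append((doc.title, "duplicate"))
        else:
            seen_bodies.add(normalized)
            kept.append(doc)
    return kept, dropped


docs = [
    Document("Policy", "Remote work is allowed.", date(2023, 5, 1)),
    Document("Policy copy", "remote  work is ALLOWED.", date(2023, 6, 1)),
    Document("Old policy", "Remote work is not allowed.", date(2019, 1, 1)),
]
kept, dropped = clean_dataset(docs, cutoff=date(2022, 1, 1))
```

Note that this only catches exact duplicates after normalization. Contradictory information and poor metadata are harder to detect automatically and usually still need human review.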
Any organization that works with clients has inevitably had the experience of saving a deliverable as “vFinal” only to receive additional direction from the client. This new direction necessitates a “vFinal2” then a “vFinal3”, maybe a “vFinal4” saved over the weekend, and eventually a “vPleaseLetThisBeFINAL”. Your human colleagues can likely understand the storytelling contained in your naming convention and figure out which document is the actual final version, but LLMs don’t necessarily have that ability. If you ask an LLM for the “final version” of the document, it may not be able to generate an answer. Worse, it may invent an answer based on an amalgamation of the various versions.
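One way to avoid relying on file names is to record which version is current using something less ambiguous, such as the last-modified date. The sketch below assumes you can supply a (filename, last_modified) pair for every version of a deliverable; the function name and sample files are hypothetical.

```python
from datetime import datetime


def pick_current_version(versions):
    """Return (current, superseded) for one deliverable.

    versions: list of (filename, last_modified) tuples.
    The newest file is treated as current, regardless of its name.
    """
    if not versions:
        raise ValueError("no versions supplied")
    ordered = sorted(versions, key=lambda v: v[1])
    current = ordered[-1][0]
    superseded = [name for name, _ in ordered[:-1]]
    return current, superseded


versions = [
    ("report_vFinal2.docx", datetime(2024, 3, 4, 9, 0)),
    ("report_vPleaseLetThisBeFINAL.docx", datetime(2024, 3, 11, 17, 45)),
    ("report_vFinal.docx", datetime(2024, 3, 1, 16, 30)),
    ("report_vFinal3.docx", datetime(2024, 3, 6, 14, 15)),
]
current, superseded = pick_current_version(versions)
```

Only the current file would then be included in the dataset, with the superseded versions archived or excluded so the LLM never sees them as competing answers.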
Quality content is low-hanging fruit. Even if your content follows standards that avoid this basic challenge, there are still other considerations to keep in mind.
2. Ethical Implications of an LLM’s Dataset:
Datasets may contain biases that can be magnified when they’re used as training data for LLMs. These can be traditional ethical biases, or they can be more general biases rooted in the nature of the data itself.
For example, your organization may utilize an LLM to assist with hiring decisions. An element of this LLM’s role is to review hundreds of resumes and eliminate candidates who do not meet the basic requirements of the proposed role. If your dataset for a specific role indicates all qualified candidates are between 40 and 45 years old, that could result in the LLM concluding candidates outside of this age range are innately unqualified. This could prevent promotions for junior employees, or unfairly disqualify candidates with more experience.
Alternatively, an LLM with a limited dataset may discover patterns that are irrelevant but are nonetheless integrated into its decision-making. Let’s reconsider the example of an LLM used for hiring decisions. If you’re hiring in a small company and only have roughly 20 examples of “qualified candidates,” then this can result in discovering patterns with little to no value. For example, you may have identified three employees considered the “highest performers” and all three happen to be born in March. With a larger dataset, the LLM may conclude this “pattern” is irrelevant, but with smaller datasets the LLM will learn whatever patterns appear.
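To see why small datasets invite coincidences like the birth-month example, consider a back-of-the-envelope calculation. The sketch below assumes every attribute is irrelevant to performance, independent of the others, and has equally likely values; these are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def chance_of_shared_attribute(num_values, num_attributes):
    """Probability that three people all share a value on at least one attribute.

    Assumes each attribute has num_values equally likely values and that
    attributes are independent of each other and of performance.
    """
    # For one attribute: the second and third person must each match the first.
    p_single = (1 / num_values) ** 2
    # Chance that none of the attributes produces a three-way match.
    p_none = (1 - p_single) ** num_attributes
    return 1 - p_none


# One attribute with 12 values, such as birth month.
p_birth_month = chance_of_shared_attribute(num_values=12, num_attributes=1)

# Fifty unrelated attributes, each with 12 possible values.
p_any_of_fifty = chance_of_shared_attribute(num_values=12, num_attributes=50)
```

Under these assumptions, a shared birth month among three people is unlikely on its own (1 in 144), but once a model can look across dozens of unrelated attributes, the chance that some coincidence appears rises to a little under 30 percent. With only a handful of examples, the model has little evidence to tell such a coincidence from a real signal.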
These examples are simplified, but the principle remains. If the data fed to your LLM isn’t curated by developers and researchers to ensure high-quality training, you may run into ethical challenges. By being aware of potential biases and taking measures to address them, you can make the outputs of your LLM more equitable and less biased.
3. How an LLM Understands Context:
Large language models can generate human-like text, but they face significant challenges when it comes to understanding context. This is because LLMs don’t have a true understanding of the world. What an LLM knows is based on the dataset it has access to. An LLM doesn’t have real-time access to information or a memory of past interactions beyond your current conversation with it (although some AI infrastructure could allow this functionality in the future). An LLM generates responses based on patterns learned during training, and it can’t necessarily apply that information outside the scope of how it learned something in the first place.
For example, if you provide an LLM with a page of a book, it doesn’t have the context of the rest of the book. It won’t know any characters, plot details, or themes unless they are mentioned in the provided text. If you provide a paragraph that refers to a character named John, but John is never introduced in that paragraph, the LLM won’t know who John is, his role in the story, or any other details that exist beyond the supplied paragraph. This lack of context can lead to responses that may seem nonsensical or out of place.
You can extrapolate this problem to any instance where the LLM is asked to provide a response to a question outside what it’s learned. The LLM might make assumptions or guesses based on the limited information it has, but these might not align with the context of the problem you’re attempting to resolve.
4. Collaboration between Humans and an LLM:
Humans and LLMs have at least one thing in common: they’re both shaped by what they consume. As much as we focus on LLMs’ reliance on their analysis of datasets, it is also true that humans are influenced by their personal experience with the world. This similarity is an opportunity to leverage collaboration between the two. Human supervision and input can guide and refine the outputs of LLMs to ensure a balance between automation and human judgment.
The inputs used to train models on human understanding and intelligence follow a common colloquialism: you are what you eat. The quality, accuracy, and biases of an LLM’s output depend on whether the data it has been fed is high-quality, accurate, and unbiased. Comprehending this relationship is crucial for developers, researchers, and users of LLMs to ensure responsible and ethical use. The influence of this dynamic cannot be overstated. Large language models are only as good as the data they are trained on. Embracing this understanding can pave the way towards a future where LLMs make positive contributions to society, guided by responsible and informed practices.