AI Inbreeding - Distorted Results from AI Trained on AI.
Can GPTs get sick? The Iterative Nature of AI and the Dangers of Model Collapse via Feedback Loops
The process of training AI models, particularly generative ones, often involves iterative refinement: the model is repeatedly exposed to new data and adjusted to improve its performance. While this approach can lead to significant advances, it also introduces the risk of model collapse, a phenomenon in which a model trained on AI-generated content produces output that becomes increasingly distorted and unusable.
To improve their generative AIs, OpenAI and other developers need ever more high-quality training data. But now that publishers know their content is being used to train AIs, they've started requesting money for it and, in some cases, suing developers for using it without permission. As fresh human-written data becomes scarcer and more expensive, the temptation grows to train on AI-generated content instead, and that is exactly where the risk of model collapse creeps in.
A Concrete Example: The Game of Telephone
To understand how model collapse can occur, consider the game of Telephone - what we used to call 'Chinese Whispers'. In this game, a message is whispered from one person to another, and then passed along a chain of people.
As the message travels, it often becomes distorted due to misunderstandings, mispronunciations, or simple human error. For instance, a message starting as "The quick brown fox jumps over the lazy dog" might end up as "The brown frog jumped into a swampy bog."
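For the more technically minded, here is a minimal Python sketch of the game itself. It is purely illustrative: each 'person' in the chain has a small, arbitrary chance of mishearing any given letter, and those small errors compound with every hop.

```python
import random

random.seed(42)  # fixed seed so the "whispers" are reproducible

def whisper(message: str, error_rate: float = 0.05) -> str:
    """One link in the chain: each letter has a small chance of being misheard."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        random.choice(letters) if ch.isalpha() and random.random() < error_rate else ch
        for ch in message
    )

message = "the quick brown fox jumps over the lazy dog"
print(f"Start    : {message}")
for person in range(1, 11):
    message = whisper(message)
    print(f"Person {person:2d}: {message}")
```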
In a similar way, AI models can become corrupted through iterative training. When an AI generates content, it may introduce minor errors or inaccuracies. If this content is then used to further train the model, those errors can be amplified and incorporated into subsequent generations of output.
Over time, the model's understanding of the original data can become so distorted that it produces completely nonsensical or irrelevant results.
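You can watch this happen with a toy 'model'. The sketch below is a deliberate simplification, not a real training pipeline: the 'model' is just a normal distribution fitted to its data, and each new generation is trained only on samples drawn from the previous generation's fit. Run it and the spread tends to shrink while the mean drifts; generation by generation, the model forgets the tails of the original data.

```python
import random
import statistics

random.seed(1)

def fit_and_resample(data, n):
    """Fit a normal distribution to the data ("training"), then sample a new dataset from the fit."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return mu, sigma, [random.gauss(mu, sigma) for _ in range(n)]

n = 25  # deliberately small: the effect shows up faster
data = [random.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: "real" data

for generation in range(30):
    mu, sigma, data = fit_and_resample(data, n)
    if generation % 5 == 0:
        print(f"Generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Real language models are vastly more complex, but the underlying failure mode is the same: each generation can only reproduce what the previous one captured, so rare patterns quietly disappear.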
In some instances the results may even become dangerous, particularly if the feedback loop is further contaminated by low-quality or malicious training data.
The Dangers of Overreliance on Synthetic Data
The study discussed earlier highlights the particular dangers of relying too heavily on synthetic data for training AI models. Synthetic data is generated by the model itself, so it is inherently subject to the model's current limitations and biases.
As the model is trained on this synthetic data, it can become trapped in a feedback loop, reinforcing its own errors and inaccuracies.
For instance, imagine an AI tasked with generating summaries of news articles. Initially, the AI might produce accurate summaries, but over time, it may start to introduce biases or misunderstandings. If the AI is then trained on its own generated summaries, it may reinforce these errors, leading to a spiral of increasing inaccuracy.
Mitigating Model Collapse
So can GPTs get sick? The short answer is yes. But to prevent model collapse, AI developers must be mindful of the risks associated with iterative training and synthetic data.
Strategies to mitigate this problem include:
* Diversifying training data:
Incorporating a variety of real-world data sources can help counteract the negative effects of synthetic data and prevent the model from becoming overly reliant on its own generated content. For example, an AI trained on a diverse corpus of text, including news articles, novels, and scientific papers, is less likely to develop biases or misunderstandings. The sketch after this list shows how even a modest share of fresh real data in each retraining round can keep a model anchored.
* Regular evaluation:
Periodically evaluating the model's performance on a held-out dataset can help identify early signs of model collapse and allow corrective measures to be taken. This involves setting aside a portion of real data that is never used for training and regularly assessing the model's ability to generate accurate and relevant output against it. The same sketch after this list tracks a held-out score across retraining rounds for exactly this purpose.
* Human oversight:
Human experts can play a crucial role in ensuring that AI models remain aligned with their intended goals and avoid producing harmful or misleading content. By providing feedback and guidance, they can help prevent the model from becoming trapped in self-reinforcing feedback loops and ensure that its output remains relevant and valuable.
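Putting the first two strategies together, the sketch below extends the earlier toy model. It is illustrative only, and the 30% real-data share is an arbitrary choice for the demo, not a recommendation. Each retraining round either uses pure synthetic data or mixes in fresh real data, and a held-out set that the model never trains on scores every few generations. The pure-synthetic run tends to drift away from the held-out data, while the mixed run stays anchored.

```python
import math
import random
import statistics

random.seed(2)

REAL_MEAN, REAL_STD = 0.0, 1.0  # the "true" data distribution
n = 25  # training-set size per round

def real_samples(k):
    """Fresh real-world data (here: draws from the true distribution)."""
    return [random.gauss(REAL_MEAN, REAL_STD) for _ in range(k)]

held_out = real_samples(500)  # evaluation set: never used for training

def avg_log_likelihood(mu, sigma, data):
    """Average log-likelihood of data under the fitted normal; higher is better."""
    return statistics.mean(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )

for real_fraction in (0.0, 0.3):  # pure synthetic vs. 30% fresh real data
    data = real_samples(n)
    print(f"--- retraining with {real_fraction:.0%} fresh real data per round ---")
    for generation in range(30):
        mu = statistics.mean(data)
        sigma = statistics.stdev(data)
        if generation % 5 == 0:
            score = avg_log_likelihood(mu, sigma, held_out)
            print(f"Gen {generation:2d}: std={sigma:.3f}, held-out score={score:+.3f}")
        # Next round: some fresh real data, the rest sampled from the current model.
        k_real = int(n * real_fraction)
        data = real_samples(k_real) + [random.gauss(mu, sigma) for _ in range(n - k_real)]
```

In a real pipeline the 'model' would be a neural network and the score a proper benchmark, but the principle is the same: keep real data flowing in, and keep measuring against data the model has never generated.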
By carefully considering these factors, AI developers can help to minimize the risks of model collapse and ensure that generative AI models continue to produce valuable and reliable outputs.
It's not all doom and gloom with AI.
If you would like guidance on using AI in business, to save time and effort, contact us at Digital Advantage - digitaladvantage.me