How ChatGPT data poisons open-source models


Many open-source language models use ChatGPT output as training data. Here is why this might backfire.

In March, Stanford researchers unveiled the Alpaca language model, a 7-billion-parameter variant of Meta’s LLaMA fine-tuned on 52,000 instruction examples generated by GPT-3.5. The team showed that Alpaca significantly outperformed LLaMA in tests. Fine-tuning with ChatGPT examples was subsequently reproduced in many open-source projects as a kind of Alpaca formula.

Fine-tuning with such examples, also referred to as instruction tuning, is intended to bring the behavior of the underlying language model closer to that of OpenAI's ChatGPT. Essentially, it is a form of supervised learning in which a dataset contains, for example, questions paired with answers, or a request to summarize a text paired with the finished summary. The goal is a helpful chatbot that makes as few mistakes as possible and knows when it is stuck.
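To make the idea of an instruction dataset more concrete, here is a minimal Python sketch of how such examples could be turned into prompt/target pairs for supervised fine-tuning. The class `InstructionExample`, the function `to_training_pair`, and the "### Instruction / ### Response" template are illustrative assumptions, not the format of any particular project. The question is the one this article uses as its own example; the summary texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One supervised example: a request, optional context, and the desired answer."""
    instruction: str
    response: str
    context: str = ""


def to_training_pair(example: InstructionExample) -> tuple[str, str]:
    """Render an example as (prompt, target) using a hypothetical template."""
    prompt = f"### Instruction:\n{example.instruction}\n\n"
    if example.context:
        prompt += f"### Input:\n{example.context}\n\n"
    prompt += "### Response:\n"
    return prompt, example.response


examples = [
    # Question/answer example taken from this article.
    InstructionExample("What's the name of the Han Solo spin-off movie?", "Solo"),
    # Summarization example with placeholder text.
    InstructionExample(
        "Summarize the following text.",
        "<summary of the text>",
        context="<text to be summarized>",
    ),
]

pairs = [to_training_pair(e) for e in examples]
for prompt, target in pairs:
    print(prompt + target)
    print("---")
```

During fine-tuning, the model sees the prompt and is trained to produce the target. Whether that target was written by a human or generated by another model makes no difference to the training procedure, which is why the origin of the answers matters so much in what follows.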

OpenAI warns against simple instruction tuning

But instruction tuning with ChatGPT examples can backfire, as OpenAI co-founder John Schulman recently argued. For the instruction tuning of GPT-3.5 and GPT-4, the company had large datasets created by humans. In the summarization example, that means the summaries were written by people. Alpaca, on the other hand, uses summaries generated by GPT-3.5, thus avoiding the need for large human-generated datasets.


According to Schulman, however, this approach could significantly exacerbate the problem of hallucinations in open-source models. He argues that hallucinations are often the result of training on datasets that contain knowledge the original model didn’t have. Take a simple question like “What’s the name of the Han Solo spin-off movie?” with the answer “Solo”: a model that already knows the answer will learn to give correct answers.

A model that does not know the answer will at best learn to reproduce this piece of information. At worst, it will learn to give an answer regardless of whether it knows the information or not. In other words, it will hallucinate. Since it is not clear exactly what information is contained in a language model like LLaMA, a dataset generated by ChatGPT, that is, by a much larger model with more knowledge, may contain thousands of examples from which a model like Alpaca learns to give an answer even though it does not know the correct one.
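The argument can be illustrated with a toy Python sketch that splits a fine-tuning dataset by whether the base model already knows each answer. Everything here is a hypothetical simplification: `base_knows` stands in for an oracle that does not exist in practice (as noted above, it is not clear what a model like LLaMA actually knows), and the second dataset entry is a placeholder rather than real data.

```python
from typing import Callable

QAPair = tuple[str, str]


def split_by_base_knowledge(
    dataset: list[QAPair],
    base_knows: Callable[[str], bool],
) -> tuple[list[QAPair], list[QAPair]]:
    """Separate examples the base model can back up from those it cannot.

    `base_knows` is a hypothetical oracle; no such lookup exists for real models.
    """
    grounded, ungrounded = [], []
    for question, answer in dataset:
        if base_knows(question):
            grounded.append((question, answer))
        else:
            ungrounded.append((question, answer))
    return grounded, ungrounded


# Assumption for illustration: the base model knows exactly this one fact.
known_questions = {"What's the name of the Han Solo spin-off movie?"}

dataset = [
    ("What's the name of the Han Solo spin-off movie?", "Solo"),
    ("<question outside the base model's knowledge>", "<answer from the larger model>"),
]

grounded, ungrounded = split_by_base_knowledge(
    dataset, lambda question: question in known_questions
)

# Share of examples that, following the argument above, reward answering without knowing.
ungrounded_share = len(ungrounded) / len(dataset)
print(f"grounded: {len(grounded)}, ungrounded: {len(ungrounded)}, share: {ungrounded_share:.0%}")
```

In this picture, the grounded examples teach the model to state what it knows, while the ungrounded ones teach it to produce a confident answer anyway. The larger the knowledge gap between the model that generated the data and the model being fine-tuned, the larger the ungrounded share is likely to be.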

OpenAssistant shows a way out

According to Schulman, reinforcement learning, with or without human feedback, is one way to correct learned problematic behavior. However, all currently available open-source models use only instruction tuning. OpenAssistant is different: the project has collected its data with human volunteers and plans to add reinforcement learning to the models.

The human dataset generated by OpenAssistant also avoids another problem with the Alpaca formula: A language model that learns from ChatGPT will, in most cases, produce similar results, thus reproducing the quality constraints or biases of the source model. If the outputs of these models then permeate the Internet, as the outputs of ChatGPT or Bing already have, a kind of echo chamber could emerge in which OpenAI models and open-source models amplify their own errors and biases.
