A study found that just 250 poisoned documents were enough to corrupt AI models of up to 13 billion parameters, underscoring the need for new kinds of defenses.
It turns out poisoning an AI model doesn't take an army of hackers, just a few hundred well-placed documents. A new study found that poisoning a model's training data is far easier than expected: as few as 250 malicious documents were enough to backdoor every model size the researchers tested.
The researchers showed that these small-scale attacks worked on systems ranging from 600 million to 13 billion parameters, even when the models were trained on vastly more clean data.
The study, conducted by researchers from Anthropic, the UK AI Security Institute, the Alan Turing Institute, OATML at the University of Oxford, and ETH Zurich, challenged the long-held assumption that data poisoning depends on controlling a percentage of a model's training set.
Instead, it found that the key factor is the absolute number of poisoned documents added during training, not the fraction of the corpus they represent.
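To see why a fixed document count upends the percentage-based intuition, consider how small a constant 250-document payload becomes as the clean corpus grows. The corpus sizes in the sketch below are illustrative assumptions, not figures from the study; the arithmetic simply shows the poisoned share shrinking toward zero even though, per the study's finding, the attack remained effective.

```python
# Illustrative sketch: a fixed number of poisoned documents becomes a
# vanishingly small fraction of the training data as the clean corpus grows.
# The corpus sizes are hypothetical examples, not values from the study.

POISONED_DOCS = 250  # the document count reported in the study

# Hypothetical clean-corpus sizes (in documents), chosen only to
# illustrate the scaling argument.
hypothetical_corpus_sizes = [1_000_000, 10_000_000, 100_000_000, 1_000_000_000]

for clean_docs in hypothetical_corpus_sizes:
    total_docs = clean_docs + POISONED_DOCS
    poisoned_share = POISONED_DOCS / total_docs
    print(f"{clean_docs:>13,} clean docs -> poisoned share: {poisoned_share:.6%}")
```

Under the percentage-based assumption, the backdoor should weaken as that share falls; the study's striking result is that it did not.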