News Researchers Chart Alarming Decline in ChatGPT Response Quality

I wrote a few months ago that AI gets worse over time, because it has to resort to less and less reliable data to train on.

Machine learning needs so much data because it has no idea what it's reading. The Chinese room argument can easily prove this, and also proves how flawed the Turing test is.

Because of the complete lack of intelligence involved in machine learning, it cannot interpret what it reads, and has to go through tons of data before it will react with any consistency.

Eventually it will unknowingly start to train on preexisting AI output, and you get an effect similar to that game you played as a kid, where one person whispers a sentence to the next person, who passes it on to the next, and so on. It's a very unreliable way to transmit data, because bits and pieces get left out or added. That's what AI eventually does, and it becomes incredibly unreliable.

Few seem to be aware that AI gets worse over time. Researchers won't tell you, because they would lose their funding, and neither will hardware companies.

An article in The Atlantic finally covered it.

"The language model they tested, too, completely broke down. The program at first fluently finished a sentence about English Gothic architecture, but after nine generations of learning from AI-generated data, it responded to the same prompt by spewing gibberish: “architecture. In addition to being home to some of the world’s largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, yellow @-.” For a machine to create a functional map of a language and its meanings, it must plot every possible word, regardless of how common it is. “In language, you have to model the distribution of all possible words that may make up a sentence,” Papernot said. “Because there is a failure [to do so] over multiple generations of models, it converges to outputting nonsensical sequences.”

In other words, the programs could only spit back out a meaningless average—like a cassette that, after being copied enough times on a tape deck, sounds like static. As the science-fiction author Ted Chiang has written, if ChatGPT is a condensed version of the internet, akin to how a JPEG file compresses a photograph, then training future chatbots on ChatGPT’s output is “the digital equivalent of repeatedly making photocopies of photocopies in the old days. The image quality only gets worse.”
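To make the photocopy analogy concrete, here is a minimal toy sketch. It is not the setup from the article or the research behind it: the "model" is nothing more than a table of word frequencies, and each generation is trained only on text sampled from the previous one. The function name `resample_generations`, the Zipf-like starting weights, and all the sizes are assumptions made up for illustration. Once a rare word fails to appear in one generation's sample, its probability drops to zero and it can never come back, so the vocabulary can only shrink.

```python
import random
from collections import Counter


def resample_generations(vocab_size=1000, samples_per_gen=2000,
                         generations=9, seed=0):
    """Return the number of distinct words that survive each generation."""
    rng = random.Random(seed)
    words = list(range(vocab_size))
    # Assumed starting point: Zipf-like weights, word k has weight 1/(k+1).
    weights = [1.0 / (k + 1) for k in words]
    surviving = []
    for _ in range(generations):
        # "Generate" a corpus from the current model...
        corpus = rng.choices(words, weights=weights, k=samples_per_gen)
        counts = Counter(corpus)
        # ...then "train" the next model on that corpus alone, by using
        # the observed counts as the new sampling weights.
        weights = [counts.get(w, 0) for w in words]
        surviving.append(sum(1 for c in weights if c > 0))
    return surviving


if __name__ == "__main__":
    for gen, n_words in enumerate(resample_generations(), start=1):
        print(f"generation {gen}: {n_words} distinct words left")
```

The returned list never increases from one generation to the next. Real language models are far more complicated than a frequency table, so treat this only as an illustration of how rare items get lost first.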
 
I wonder how much impact various data scraping countermeasures are having on GPT and its AI siblings. I know a lot of artists who have put subtle noise filters on their art and some angry developers who spread unrelated or bogus code on question-and-answer sites. A lot of people said no to having their data or contributions taken for AI without attribution or compensation. Many have been working on fighting back.

From a certain point of view, AI can simultaneously be improving -- as in not becoming "dumber" -- while also providing worse outcomes resulting from sucking up inferior data.
 
I wrote a few months ago that AI gets worse over time, because it has to resort to less and less reliable data to train on.
I don't think that's the issue here. Unless they've silently introduced a ChatGPT 4.5 or something, the same model should be able to answer questions about identifying prime numbers with similar accuracy over time. It's more likely that this is a side effect of the guardrails that they keep adding on top of the model to keep the moralists and censors of our own Dark Ages satisfied.
 
The intern in my office vehemently, strongly defends his use of ChatGPT.
(banned in the office, but he uses it for a LOT of other things)

He's taking a biology class and, for the final, has to write a paper.
One of the office denizens asked..."So you're going to use the ChatGPT?"
'Absolutely! 1000%!'

"So, you're not learning anything about biology, just learning how to ask ChatGPT the questions to produce a semi-crappy answer."

'Yes, but I still get credit for the class.'

The rest of us laughed.
 
The joys of generalist AI: you train it on a presumably curated, known-good data set, and when a new iteration of the AI regresses between generations, you have no clue why, because it is all just billions of parameters in matrices where most parameters have no specific link to any single thing.
 
The only time I tried ChatGPT, it couldn't add numbers correctly. I mean really, the simplest thing. But of course it's dumb. It was created by being fed left-wing chicken donuts to push the agenda. What did you guys expect?
 
I wrote a few months ago that AI gets worse over time, because it has to resort to less and less reliable data to train on.
That doesn't explain why ChatGPT 3.5 improved substantially over the same period. I'll bet you just saw the headline and assumed it confirmed your preconceptions, without actually reading the article.

I don't know what's going on, but I can think of a few possibilities. Most likely, whatever is behind the decrease in GPT-4 is fixable.

Machine learning needs so much data because it has no idea what it's reading.
Consider how much a kid has to read, in order to become educated. You wouldn't say the same about them, would you?

Because of the complete lack of intelligence involved in machine learning,
How long are you going to keep spreading this misinformation?

Eventually it will unknowingly start to train on preexisting AI output, and you get an effect similar to that game you played as a kid, where one person whispers a sentence to the next person, who passes it on to the next, and so on. It's a very unreliable way to transmit data, because bits and pieces get left out or added. That's what AI eventually does, and it becomes incredibly unreliable.
It's virtually impossible that its training data has already become dominated by AI-generated content.

Even in the story they just did on that, it takes several iterations of regurgitation before you start to see the AI's output noticeably degrade, and that involved training on 100% AI-generated content, rather than a more realistic scenario of it comprising a fairly small percentage.
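A rough way to see why the mix matters is a toy sketch under heavy assumptions: the "model" is just a Gaussian that gets refit every generation on `n` samples, of which a `synthetic_fraction` comes from the previous generation's model and the rest from the fixed real distribution. The name `final_spread`, the tiny sample size, and the large generation count are hypothetical choices picked to exaggerate the effect. They are not numbers from the story.

```python
import random
import statistics


def final_spread(synthetic_fraction, generations=1000, n=20, seed=0):
    """Std dev of a repeatedly refit Gaussian after `generations` rounds."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 matches the "real" data exactly
    n_synth = round(n * synthetic_fraction)
    for _ in range(generations):
        # Samples produced by the previous generation's model.
        synthetic = [rng.gauss(mean, std) for _ in range(n_synth)]
        # Fresh samples from the real distribution (assumed N(0, 1)).
        real = [rng.gauss(0.0, 1.0) for _ in range(n - n_synth)]
        data = synthetic + real
        # "Training" here is just re-estimating mean and std from the mix.
        mean, std = statistics.fmean(data), statistics.stdev(data)
    return std


if __name__ == "__main__":
    print("100% synthetic:", final_spread(1.0))
    print(" 10% synthetic:", final_spread(0.1))
```

With purely synthetic data the fitted spread shrinks toward zero, because estimation noise compounds and nothing pulls the model back. With mostly real data, each refit is anchored to the true distribution and the spread stays close to 1.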


It's the return of Lysenkoism. When it is forced to adhere to a political agenda, it is forced to abandon reason.
That's not how it works, but cool story.
 
I wonder how much impact various data scraping countermeasures are having on GPT and its AI siblings.
There could be some impact with large sources of training data placing their content off-limits for training. Like, if Wikipedia suddenly banned the use of its site for training, then you'd expect chatbots to get noticeably dumber.

I know a lot of artists who have put subtle noise filters on their art and some angry developers who spread unrelated or bogus code on question-and-answer sites. A lot of people said no to having their data or contributions taken for AI without attribution or compensation. Many have been working on fighting back.
Given that this is measuring mostly performance on textual problems, visual art is irrelevant.

As for people spreading bogus answers, I'm sure there aren't nearly enough of them to make a measurable impact. There's tons of live open source code that you can't simply insert bugs into, which can be used for training data.
 
The joys of generalist AI: you train it on a presumably curated, known-good data set, and when a new iteration of the AI regresses between generations, you have no clue why, because it is all just billions of parameters in matrices where most parameters have no specific link to any single thing.
It's probably a lot like debugging why a new compiler version causes a performance regression. In many cases, you're not going to find the reason by picking through the compiler output, but instead by looking at how different changes to its optimizer interact and behave on different inputs.
 
Not surprising at all. It is well known that the more engineers you throw at software, the worse it gets. It has to implode and get rewritten from scratch by someone who actually understands it and has learned from the previous version.
 
It's probably a lot like debugging why a new compiler version causes a performance regression. In many cases, you're not going to find the reason by picking through the compiler output, but instead by looking at how different changes to its optimizer interact and behave on different inputs.
With a compiler, you still know exactly what changed. Profiling your regression test suite should point you pretty close to what got broken, and you can start working on the fix(es) in a deterministic manner from there.

With AI training, there is no clean partitioning of what affects what and how. Fixes consist of re-arranging the training data set, maybe tweaking the model a bit and hoping it doesn't fail in new and horrible ways beyond the tiny sliver of possible failures ChatGPT regression tests might cover.
 
With a compiler, you still know exactly what changed. Profiling your regression test suite should point you pretty close to what got broken, and you can start working on the fix(es) in a deterministic manner from there.
You underestimate the number of changes which typically go into a compiler release. Performance regressions can happen as a result of the interaction between two or more changes. It's not necessarily the case that you can isolate it down to one thing, nor does it necessarily have a definitive "fix", since the changes were probably done to improve something else that would regress if you reverted them.

With AI training, there is no clean partitioning of what affects what and how. Fixes consist of re-arranging the training data set, maybe tweaking the model a bit and hoping it doesn't fail in new and horrible ways beyond the tiny sliver of possible failures ChatGPT regression tests might cover.
It's not the crapshoot you're making it out to be. If you have enough data for the number of parameters you're training, then accuracy tends to be more predictable. You would also have a test set that you use to assess its accuracy, and that should be about 10% of the size of the training set, with similarly broad coverage. You actually need the test set to help you know when the training is converging.
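For anyone who hasn't seen this in practice, here is a minimal sketch of the idea, assuming a toy linear model trained by plain gradient descent instead of a neural network. A held-out set used to decide when to stop is usually called a validation set. The helper names (`make_data`, `split_holdout`, `train`), the synthetic data, and the `patience` and `tol` stopping rule are all made up for illustration.

```python
import random


def make_data(n=1100, seed=1):
    """Hypothetical data: y = 3x + 1 plus a little Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def split_holdout(examples, holdout_ratio=0.1, seed=0):
    """Shuffle, then hold out ~holdout_ratio of the *training set's* size."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_ratio / (1 + holdout_ratio))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mse(w, b, data):
    """Mean squared error of the line w*x + b on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(train_set, holdout_set, lr=0.5, max_epochs=500, patience=5, tol=1e-6):
    """Gradient descent that stops once held-out loss stops improving."""
    w = b = 0.0
    best, stale, history = float("inf"), 0, []
    for _ in range(max_epochs):
        # Gradient of the training MSE with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in train_set) / len(train_set)
        gb = sum(2 * (w * x + b - y) for x, y in train_set) / len(train_set)
        w, b = w - lr * gw, b - lr * gb
        # The held-out examples are never used for the gradient, only to
        # judge whether training is still making progress.
        loss = mse(w, b, holdout_set)
        history.append(loss)
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return w, b, history


if __name__ == "__main__":
    train_set, holdout_set = split_holdout(make_data())
    w, b, history = train(train_set, holdout_set)
    print(len(train_set), len(holdout_set), round(w, 2), round(b, 2), len(history))
```

The held-out loss curve in `history` is what tells you the run has converged. Watching only the training loss would not reveal whether the model generalizes to examples it has not seen.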

Since you seem to have a fair amount of time on your hands, I'd encourage you to take a decent course on it:
 
I can't be the only one thinking...did somebody accidentally switch them? Seriously. __it happens
That would explain only some of the changes, but there's clearly more going on.
