I'd say they just work as designed. And most surprises stem from the fact that some people still don't understand how they are designed and how they operate.
These LLMs extrapolate from your input using the data they were trained on. If the training data contained the relevant information, and if that information was actually correct in the first place, there is a good chance the output will match what we believe to be the truth.
But a model doesn't fall back into training mode to check whether more up-to-date information would change its prediction: that would be far too expensive computationally, and it couldn't really keep fact and fiction separated anyway.
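To make that concrete, here is a minimal sketch (using the Hugging Face transformers library and a small public model, purely for illustration) of what answering actually is: a single forward pass over frozen weights, with no learning and no fact-checking in the loop:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small public model, just for illustration; larger chat models work the same way.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()  # inference mode: the weights are frozen, nothing gets "re-trained"

    prompt = "The latest iPhone model is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():  # no gradients, no weight updates, no new facts learned
        output = model.generate(
            **inputs,
            max_new_tokens=20,
            pad_token_id=tokenizer.eos_token_id,
        )

    print(tokenizer.decode(output[0], skip_special_tokens=True))
    # Whatever comes out is pure extrapolation from whatever the training data contained.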
If it sometimes correctly states that a certain iPhone doesn't yet exist, that is very likely one of those hard-coded overrides that the maintainers of these models put in to curb hallucinations or any other responses human audiences might consider aberrant.
But those moderator teams evidently aren't nearly as creative as the editors on this site, so they haven't nailed down overrides for every potential future product the IT giants might deliver. I am pretty sure the models happily generate details about an iPhone 42 internally before any override takes hold.
I imagine these moderator overrides work somewhat like classical expert-system rules, and there is a good chance that the quality loss people have noted in GPTx responses is actually due to these rules interfering with the extrapolation in cases where the extrapolation was actually better.
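Purely to illustrate what I mean, and with no claim that any vendor actually does it this way, an expert-system-style override layer sitting in front of the extrapolation could be as crude as this (the rule patterns and canned responses are invented):

    import re

    # Hypothetical hand-written override rules: (pattern on the prompt, canned answer).
    OVERRIDE_RULES = [
        (re.compile(r"iphone\s*(1[6-9]|[2-9]\d)", re.IGNORECASE),
         "That iPhone model has not been announced, so I can't give you details about it."),
    ]

    def answer(prompt, llm_extrapolate):
        """Check the rules first and only fall through to the LLM when none fires.
        This is exactly where a clumsy rule can override an extrapolation that
        would actually have been better."""
        for pattern, canned_response in OVERRIDE_RULES:
            if pattern.search(prompt):
                return canned_response
        return llm_extrapolate(prompt)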
People just shouldn't forget that AIs are artificial first and only somewhat more intelligent than steam engines second. And intelligence, etymologically, is about filling in what lies between: 'legere' means to gather or pick in Latin, and 'inter' means in-between, so to be intelligent is literally to pick out what lies between the things you already know.
Most of the time that filling-in is also just extrapolation; sometimes it's a mind-blowing abstraction like the theory of general relativity. But don't expect the ability to develop such theories any time soon. I think a better mental model is that current LLMs can extrapolate from all of Wikipedia (+GitHub, +some more), in real time, answers that are similar to what's already there.
Factuality is, at best, a hoped-for correlation with those inputs; the current AI "brains" have no capacity to run a "Gedankenexperiment" on their output before they utter it, as most humans beyond infancy would.
Small children are much more likely to "hallucinate" in a similar way, and we find that charming, sometimes even inspiring, while we work hard to override it with "common sense" and "fact checking".
AIs would have to gain that ability to operate in domains where they are clearly out of their depth today, but I'm also glad that the barriers to achieving that are probably insurmountable for years to come.
And there is a good chance they will never be overcome, because AIs don't earn salaries, don't spend money, and don't pay taxes. As humans get pushed out of the loop, progress grinds to a standstill for lack of economic incentives.