It's not that Steam's data is SUSPECT, it's just that it's a sample of a sample of the world market. Valve reported back in January that Steam has 120 million monthly users worldwide, which is a fairly representative sample size in most countries where Steam is available. But remember that not all users participate in the hardware survey. Also, in countries outside the USA, especially APAC countries such as China and Vietnam, internet cafés are much more prevalent than here in the USA, and that's something Steam directly targets (Steam PC Café). It was reported last year that internet cafés were using their machines to mine cryptocurrency while they were shut due to Covid restrictions. So if a café gets 100 Nvidia cards in so they can flog away at Ether, those machines are still going to be running Steam for gaming as well, and Nvidia's user share increases.
The real test is going to be when Intel gaming GPUs start getting on the market...
The internet cafe multi-counting problem occurred a couple of years back and Steam fixed the problem. Outside of that, the data is suspect simply because Valve has never said how it samples. If you don't do a pure random sampling (for example, if you bias toward sampling "new" or "unknown" hardware, or even PCs where the hardware configuration has changed since the last sample), then it screws up the statistics in a very bad way. Steam doesn't disclose how many PCs were sampled each cycle either. Theoretically, it could be "all PCs connected to Steam," but then why actually ask people to opt in to the HW survey? So, it's possibly a non-random sample, and that's by far the biggest issue.
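To put a rough number on why non-random sampling matters, here's a minimal sketch. Everything in it is hypothetical: the 70% true share and the 2x sampling weight are made-up values for illustration, not anything Valve has disclosed, and `observed_share` is just a helper name I picked.

```python
def observed_share(true_share, weight_group, weight_rest):
    """Expected share of a group in a sample when the group's machines are
    picked with relative probability weight_group and everyone else's with
    weight_rest. Equal weights correspond to pure random sampling."""
    sampled_group = true_share * weight_group
    sampled_rest = (1.0 - true_share) * weight_rest
    return sampled_group / (sampled_group + sampled_rest)


# Hypothetical: one vendor truly holds 70% of PCs.
unbiased = observed_share(0.70, 1.0, 1.0)  # equal weights, so 0.70 comes back
# Hypothetical: that vendor's PCs are twice as likely to be surveyed,
# e.g. because they change hardware configuration more often.
biased = observed_share(0.70, 2.0, 1.0)    # 1.4 / (1.4 + 0.3), about 0.82

print(f"unbiased: {unbiased:.2%}, biased: {biased:.2%}")
```

With those made-up inputs, a 70% true share shows up as roughly 82% in the sample, and nothing in the published percentages would reveal that.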
As for Steam not representing all users, that's less of an issue. Presumably there's very little correlation between AMD GPU owners not using Steam and Nvidia owners not using Steam, though Intel GPU users not using Steam wouldn't be as surprising (e.g., a lot of business PCs don't ever run Steam). We're mostly interested in gaming GPUs used to play games, and Steam is by far the most popular gaming distribution service. If Steam China takes over for regular Steam in that country and its users no longer count toward the totals, that would be a potential problem, but AFAIK regular Steam still works in China. If anything, I'd expect China to skew more heavily toward Nvidia (it's a more respected brand there, again AFAIK), with or without Internet cafes.
I should note that the overall percentages of the market, according to Steam's DX12 data, are 80.48% Nvidia, 14.81% AMD, 3.98% Intel, and then 0.73% "other." It's interesting that the Intel percentage is so low, as everything with UHD 500 or later should be DX12 compliant, I think. I suppose most people with a Skylake or later CPU aren't using the integrated graphics? Or maybe it needs to be Kaby Lake or later?
Another interesting fact: If you look at the Vulkan API numbers, things get really screwy. Go ahead and add up the first 15 or so percentages in the Vulkan GPU list: https://store.steampowered.com/hwsurvey/directx/
I'll help you out. For August, the first 15 GPUs in the Vulkan list sum up to 103.70%. Oops! In fact, the entire list of 384 GPUs sums up to 194.80%! Clearly something is fubar with the way Steam presents that data. The DirectX 12 GPUs, all 212 of them, sum up to 84.44% for August as well, which indicates a different sort of problem. I divided all of the numbers in the list by the total percentage in order to normalize things to 100%, but the data might still be wrong.
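For anyone who wants to repeat the normalization, this is roughly what it looks like. The GPU names and raw values below are placeholders I made up so that they sum to 84.44 like the DX12 list; they are not actual survey entries, and `normalize_shares` is just an illustrative helper.

```python
def normalize_shares(raw_percentages):
    """Rescale a mapping of name -> percentage so the values sum to 100."""
    total = sum(raw_percentages.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive number")
    return {name: 100.0 * value / total
            for name, value in raw_percentages.items()}


# Placeholder entries (not real survey data) that sum to 84.44.
raw = {"GPU A": 40.00, "GPU B": 30.00, "GPU C": 14.44}
normalized = normalize_shares(raw)

for name, share in normalized.items():
    print(f"{name}: {share:.2f}%")
```

Dividing by the total keeps every GPU's share relative to the others intact, which is all normalization can do. If the raw entries are wrong to begin with, the rescaled numbers inherit the same errors.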
Fundamentally, Steam may collect data incorrectly (non-random sampling), and the API list doesn't properly sum up to 100%. We're looking at ALL GPUs that can run a specific API, so it should be 100%, and it's not. That's bad and reduces confidence in the underlying statistics. Basically, you can't trust the Steam Hardware Survey, even if it's the best publicly accessible set of data that we have.