Take Zen 4 HX as a simple example - it is already more than two years old (but the 7945HX is still the fastest x86 mobile processor in the world). It has 28 PCIe 5.0 lanes, 24 of which are usable. The video card is most often connected via 4.0 x8, at best x16, which is equivalent to only 8 lanes of 5.0. M.2 5.0 x4 slots are pointless in laptops: the controllers draw too much power and the NAND chips overheat with the miserable airflow around the M.2 slots - a fundamentally flawed architectural layout of laptop cases and motherboards - plus there is no room for effective large heatsinks (or the noise goes up).
I'm not sure why you're focusing on laptops at all, since desktops are clearly the consumer platform with the most advanced PCIe implementations. FWIW, Ryzen 9950X can achieve 77.7 GB/s of real memory (read) bandwidth, using DDR5-6000 memory.
In fact, Zen 4 HX uses less than half of those 24 available lanes in 5.0 mode. Besides, 28 PCIe 5.0 lanes add up to more than 100 GB/s if they all run at once.
First, it can only run 24 lanes in PCIe 5.0 mode. The nominal, unidirectional data rate of that would indeed be ~96 GB/s. Add in the x4 chipset link running at 4.0 speed and you get 104 GB/s. However, this is a fictitious use case that basically never happens in the real world. I'll agree that PCIe 5.0 doesn't make a whole lot of sense in a consumer platform, right now.
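For anyone who wants to check that arithmetic, here's a quick back-of-the-envelope sketch (the per-lane figures are the usual approximate unidirectional payload rates, not exact spec numbers):

```python
# Approximate unidirectional payload bandwidth per PCIe lane, in GB/s
# (ballpark values after encoding overhead).
GB_PER_LANE = {"4.0": 2.0, "5.0": 4.0}

cpu_lanes = 24 * GB_PER_LANE["5.0"]   # 96 GB/s from the 24 usable CPU lanes at 5.0
chipset   = 4 * GB_PER_LANE["4.0"]    # 8 GB/s from the x4 chipset link at 4.0
print(cpu_lanes + chipset)            # ~104 GB/s theoretical aggregate (unidirectional)
```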
As for lane count vs. memory bandwidth, I once had an AMD 890FX board with
42 PCIe 2.0 lanes! That's ~21 GB/s of aggregate (unidir) bandwidth, which is beyond the real memory bandwidth of that platform and certainly beyond the HyperTransport 3.0 link, which I think is limited to less than 17.6 GB/s (unidir). Just to put another data point out there.
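Same back-of-the-envelope math for that board (the HyperTransport figure is my recollection, as noted, not a spec lookup):

```python
pcie2_per_lane = 0.5                  # GB/s unidirectional per PCIe 2.0 lane (5 GT/s, 8b/10b)
pcie_aggregate = 42 * pcie2_per_lane  # = 21 GB/s of theoretical I/O bandwidth
ht3_ceiling = 17.6                    # GB/s unidirectional, rough HyperTransport 3.0 limit (as recalled above)
print(pcie_aggregate > ht3_ceiling)   # True: the I/O lanes outrun the CPU's own link
```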
And what do we see? Zen 5 Halo is the first consumer x86 part to get a 256-bit memory controller - something that in reality should have been in Zen 4 HX two years ago. That would have made it easy to service all 28 lanes of 5.0 with headroom to spare for the OS and software, which also need bandwidth to work...
That's a laptop chip, though. It's really not designed to undertake heavy I/O loads. You should instead focus on Threadripper or Epyc.
To service PCIe 7.0, you need RAM with about 1 TB/s of bandwidth, and only HBM3 controllers with a 1024-bit bus provide that.
Even at 28 lanes, it's still just 448 GB/s (unidir), but just because PCIe is a switched fabric doesn't mean systems are (or even should be) designed to support the max theoretical aggregate throughput. Especially in consumer devices, I/O tends to be very bursty and only one device is usually bursting at a time.
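To put rough numbers on that (assuming ~16 GB/s per lane as the projected unidirectional rate for PCIe 7.0, and a single 1024-bit HBM3 stack at 6.4 Gb/s per pin):

```python
pcie7_per_lane = 16.0                 # GB/s unidirectional, projected for PCIe 7.0
pcie_aggregate = 28 * pcie7_per_lane  # = 448 GB/s across all 28 lanes
hbm3_stack = 1024 * 6.4 / 8           # = 819.2 GB/s for one 1024-bit HBM3 stack
print(pcie_aggregate, hbm3_stack)     # note the 1 TB/s figure above needs more than one stack
```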
PCIe 5.0 SSDs are not needed by anyone in the consumer segment at all, not even for gaming.
I generally agree that they're overkill, offering only marginal load-time benefits in some of the slowest-loading games. However, new controllers should help with power consumption:
Here's a review of a drive using that SM2508 controller:
Peak power is still rather high, at 9.13 W, but the average (50 GB folder copy test) is just 4.98 W, which I think puts it towards the upper end of the range of PCIe 4.0 drives.
For video cards, a wider link only matters if system RAM is (empirically) roughly 3 times faster than that link. That is, if you want to upload data to the dGPU's VRAM at 128 GB/s, i.e. PCIe 6.0 x16 (the 5090 has 1700+ GB/s of VRAM bandwidth), you will need at least a ~350 GB/s system memory bus for general tasks, so that everything keeps running smoothly and efficiently in parallel with the dGPU.
I'd love to see some data supporting this assertion. I think it's pretty far off the mark.
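For the record, here's what that "~3x" rule of thumb would imply, just restating the numbers from the claim above:

```python
pcie6_x16 = 16 * 8.0           # ~128 GB/s unidirectional for a PCIe 6.0 x16 link
rule_of_thumb = 3 * pcie6_x16  # ~384 GB/s of system memory bandwidth by the "3x" rule
print(rule_of_thumb)           # the post above quotes "at least ~350 GB/s"
```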