News AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores

I have personally seen, many times, how SATA ports are limited by a slow DMI link.
Because it was originally PCIe 1.0 x4. Sandy Bridge upgraded it to PCIe 2.0. Skylake bumped it up to PCIe 3.0. Then, Rocket Lake widened it to x8. Finally, Alder Lake upgraded it to PCIe 4.0, where it's stayed since then. So, we're talking about up to 16x the performance of whatever you saw.
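
As a rough sanity check on that 16x figure, here is a sketch using nominal per-lane PCIe rates (approximate usable throughput, one direction):

```python
# Nominal per-lane PCIe throughput, GB/s (approximate usable rates, one direction)
per_lane = {"PCIe 1.0": 0.25, "PCIe 2.0": 0.5, "PCIe 3.0": 0.985, "PCIe 4.0": 1.969}

dmi_steps = [
    ("original DMI (PCIe 1.0 x4)",          "PCIe 1.0", 4),
    ("Sandy Bridge, DMI 2.0 (PCIe 2.0 x4)", "PCIe 2.0", 4),
    ("Skylake, DMI 3.0 (PCIe 3.0 x4)",      "PCIe 3.0", 4),
    ("Rocket Lake, DMI 3.0 x8",             "PCIe 3.0", 8),
    ("Alder Lake, DMI 4.0 x8",              "PCIe 4.0", 8),
]

base = per_lane["PCIe 1.0"] * 4  # ~1 GB/s for the original x4 link
for name, gen, lanes in dmi_steps:
    bw = per_lane[gen] * lanes
    print(f"{name}: {bw:.1f} GB/s ({bw / base:.1f}x the original)")
# DMI 4.0 x8 works out to ~15.8 GB/s, i.e. roughly 16x the original PCIe 1.0 x4 link.
```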

It is quite obvious that even the latest version of DMI is several times slower than the combined bandwidth of the lanes available directly from the processor.
Let's take a step back and see if you can find some data to show that DMI is currently a bottleneck in modern systems (e.g. LGA1700).

The Zen4 HX series has 28 PCIe 5.0 lanes, 24 of them usable. Compare that with the bandwidth of the south-bridge link...
Oops, but you're talking about DMI. That's Intel, not AMD.

The funniest thing is that none of the laptop manufacturers use all of these Zen4 HX processor lanes; they are literally left hanging in the air, doing nothing.
It's easy to find them all in use. A good dGPU uses x16 lanes and two CPU-connected M.2 slots use the other 8 lanes. The last 4 go to the chipset.


the Zen4 HX memory bus is extremely weak, only 60-65 GB/s, even though all the attached devices together need at least twice that, and with headroom for the system and software, three times as much, if not four,
Where do you get this information?

and that brings us smoothly to the 256-bit Zen5 Halo controller with probably 200 GB/s. Bingo! Eureka!
We already established the CPU cores can't get more than 196 GB/s, at the absolute theoretical max, based on the speed of their IF links.

There is no point in my citing those graphs - they are obvious and trivial. The point was that the memory bus should be as fast as the L1 cache, sitting right next to the processor like soldered memory (remember the context of our conversation). Any cache, therefore, is a crutch.
I'm not sure what you're saying, here. I will say that you cannot build a fast CPU that doesn't have cache, no matter how much memory bandwidth it has. Period. If you don't understand why, then you need to learn more about computer architecture and software performance optimization before you should try being an armchair computer architect.

It is obvious that Intel will also be forced to switch to a 256-bit (or 512-bit) controller in the HX series for the HEDT market, a year late compared to AMD,
Huh? Xeon W ships in two flavors: one with a 256-bit memory interface and one with a 512-bit memory interface. I don't even understand your reference to AMD, because Threadripper had 4+ DIMM channels since its inception.

If Halo provides a real 200+ GB/s, it will take an absolute lead over the HX series in intensive processing of large data arrays in memory. And naturally, this will affect performance in games, which such series usually target.
I already showed you a memory scaling article that shows games are much more sensitive to memory latency than bandwidth.

[Chart: Witcher III gaming performance (RTX 3090) across the tested DDR4/DDR5 memory kits]


Source: https://www.guru3d.com/review/ddr5-...e-14/#performance-gaming-rtx-3090-witcher-iii
In this case, DDR4-3600 outperformed DDR5-4800 by 1%, even though its raw bandwidth is 25% lower! However, we can start to see why, when we compare CAS latency. For the DDR4-3600, it's only 10 ns, while the DDR5-4800 is 16.7 ns.

As for DDR5-7200, it was a mere 3.4% faster than DDR5-4800, in spite of its raw bandwidth being 50% greater!! The main reason for that is probably because the DDR5-7200 has 9.4 ns CAS latency, while the DDR5-4800 memory has 16.7 ns CAS latency.
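
For anyone who wants to check that arithmetic: CAS latency in nanoseconds is just the CL count divided by the memory clock (half the transfer rate). A minimal sketch, with the CL values back-calculated from the latencies quoted above:

```python
def cas_ns(transfer_rate_mt_s: int, cl: int) -> float:
    """CAS latency in ns: CL cycles at the memory clock (= transfer rate / 2)."""
    memory_clock_mhz = transfer_rate_mt_s / 2
    return cl / memory_clock_mhz * 1000

print(cas_ns(3600, 18))  # DDR4-3600 CL18 -> 10.0 ns
print(cas_ns(4800, 40))  # DDR5-4800 CL40 -> ~16.7 ns
print(cas_ns(7200, 34))  # DDR5-7200 CL34 -> ~9.4 ns
```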

This is a purely empirical assessment, based on my understanding of the problems of the x86 architecture, especially as regards the iGPU blocks and output to high-resolution, high-refresh-rate screens.
I'm pretty sure you mean "intuitively". Empirical usually means it's based on data, which you need to start providing to back up your assertions, because none of them have been supported by the data I've cited.

I can't add anything except to repeat that such a scheme deprives the architecture of a universal memory bus shared equally by all devices, and leads to bottlenecks for certain classes of computation.
Nobody is going to disagree that more memory bandwidth is better. The critical question is how much. That's why I've cited memory scaling data which refutes the claim that it's bottlenecking at the level you think it is.

I hope that the 256-bit Zen5 Halo controller will, as before, give the CPU cores at least 80%+ efficiency, while also dynamically and efficiently distributing bandwidth among all devices according to their needs, unlike the limited Apple architecture.
Again, we've already established that each CCD can do a max of 64 GB/s read + 32 GB/s write.

That's it, unless AMD enables the second IF link per CCD, which I'm sure they won't because it's a laptop processor and that would use more power for little benefit.

Intel's datasheets directly recommend using only dual-channel memory for video decoding and for 4K output. Why, if 22+ GB/s (DDR4-3200 and up) is more than enough even for a pair of 8K@60 monitors? Yet in reality their iGPUs already start freezing the screen with 4K monitors on single-channel memory - these are proven facts, especially with old DDR4-3200.
The display controller's FIFOs are only so big and display streaming is a hard-realtime problem. So, I guess in lieu of having a robust QoS mechanism, they just want you to supply enough excess bandwidth that the display controller can always meet its deadlines.
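
For a sense of scale, raw scan-out bandwidth is just resolution x bytes per pixel x refresh rate. A rough sketch, assuming 4 bytes per pixel and ignoring blanking and any compression:

```python
def scanout_gb_s(width: int, height: int, hz: int, bytes_per_px: float = 4.0) -> float:
    """Approximate steady-state read bandwidth needed to refresh one display."""
    return width * height * bytes_per_px * hz / 1e9

print(scanout_gb_s(3840, 2160, 60))      # one 4K@60       ~2.0 GB/s
print(scanout_gb_s(7680, 4320, 60))      # one 8K@60       ~8.0 GB/s
print(2 * scanout_gb_s(7680, 4320, 60))  # a pair of 8K@60 ~15.9 GB/s
```

On paper, even a pair of 8K@60 streams fits within single-channel DDR4-3200's ~25 GB/s; the deadline-and-contention problem described above is exactly why Intel still wants the headroom of a second channel.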

It is much better if system memory = VRAM and the processor cores access VRAM directly, without the restrictions of the PCIe bus.
In practice, they don't. Because PCIe has such high latency, graphics APIs are designed to avoid the CPU doing a lot of poking around in VRAM.

the frame buffer (it is small even with triple buffering) can be implemented separately on the iGPU, so as not to interfere with the shared memory bus and with shared data processing by the CPU and GPU cores.
It's a waste of die real estate to put the frame buffer there, because writes to it are relatively infrequent compared to the other memory I/O that happens in a rendering pipeline. That's why nobody does it any more (Microsoft/XBox used to have a thing for local memories, but even they stopped doing this). Instead, what Nvidia and AMD do is just use big caches, which then speed up whatever memory I/O you're doing. AMD got a huge speedup from doing this in RDNA2 and Nvidia followed in the RTX 4000 generation.

to be convinced that I'm right, you just need to compare the text on your smartphone's screen with the text on a monitor of less than 150 ppi.
I don't like reading on my smartphone.

The main problem for the eyes is that, when looking at a low-PPI screen, they constantly refocus between the pixels and the objects themselves.
I don't know how close you sit to your monitor, but I don't really see pixels on my 27" 1440p. I like to sit with my eyes around 24" to 30" away.

You're confirming what I said. Yes, 4K downscaled to 2.5K with a bicubic algorithm will naturally look more or less fine, because 4K is oversampled relative to 2.5K, just as it is relative to FHD. But it is not ideal, because the division is not by an integer.
There are better low-pass filters than that. Bicubic is just what people did when computers were too slow to handle large convolutions. If you use a proper low-pass filter, then you don't need an integral division. This is something I'd only recommend for video content, however.

Only 8K, 4K and FHD are universal - each is obtained from the others either by multiplying the lower resolution by an integer or by dividing the higher one by an integer.
Heh, where do you think 2560x1440 came from? It's a 2x scaling of 1280x720, which is one of the ATSC resolutions and supported by Blu-ray and many video cameras. So, I wouldn't say there's no 1440p content out there.

in commercial video, only the 4:2:0 chroma subsampling scheme is used
In professional video, they use 4:4:4. Support for this, in consumer products, is less common, but not unheard of.

That's why I don't understand why you keep a completely inferior 2.5k at home.
I mostly use it for text or web. Hardly ever streaming video.
 
Most buyers on the planet are ignorant and do not understand what 8K (or, more precisely, 280+ ppi) on a screen of up to 32" would mean for them. And at the same time they can easily compare the screen of a smartphone with the screen of their monitor. The most amazing thing is that even you (an extremely experienced member of this forum, with many years of experience) do not understand this, judging by your statements. Although to be convinced that I'm right, you just need to compare the text on your smartphone's screen with the text on a monitor of less than 150 ppi.
Do you sit within 12" of your monitor, or hold your phone out at arm's length when you use it?

Are you using the same physical font sizes on both?

This is an astoundingly idiotic comparison attempt, doubly so since you're framing it as people being ignorant.

The reality is that unless you're doing something that specifically needs high PPI, 100-140 is plenty for the vast majority of people at typical monitor viewing distances.
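
For concreteness, here are the pixel densities being argued about (a quick sketch; the ~6-inch phone resolution is just a typical example, not any specific model):

```python
from math import hypot

def ppi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch: diagonal pixel count divided by diagonal size."""
    return hypot(width_px, height_px) / diagonal_in

print(ppi(2560, 1440, 27))   # 27" 1440p           ~109 ppi
print(ppi(3840, 2160, 27))   # 27" 4K              ~163 ppi
print(ppi(7680, 4320, 27))   # 27" 8K              ~326 ppi
print(ppi(2532, 1170, 6.1))  # a typical ~6" phone ~457 ppi
```
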
I have personally seen, many times, how SATA ports are limited by a slow DMI link.
On older systems (think Nehalem and older) there was a reason you didn't want to be stuck with a third-party SATA controller. If it was on a newer system, you've experienced a bad implementation which didn't give enough bandwidth to the SATA ports. That would then have literally nothing to do with the DMI lanes at all.
It is quite obvious that even the latest version of DMI is several times slower than the combined bandwidth of the lanes available directly from the processor.
The only time the limitations of the DMI link come into play is when you're using more bandwidth than it can provide. If you're copying between two chipset-connected PCIe 4.0 NVMe drives, for example, the chances of that running at full speed are somewhat low because of everything else that connects to the chipset, plus overhead; but if one drive is off the chipset and the other is off the CPU, it should hit the maximum transfer rate. This of course varies with the DMI width, since Intel's Z (and certain H) chipsets have 8 lanes while B (and certain H) have 4, so issues can arise on the latter.
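
A rough sketch of why drive-to-drive copies behind the chipset are about the only realistic way to saturate the link (theoretical device maximums assumed; real workloads almost never hit them simultaneously):

```python
# Approximate one-direction bandwidth figures, in GB/s
dmi_x8 = 8 * 1.969   # ~15.8 GB/s (Z-series and some H-series chipsets)
dmi_x4 = 4 * 1.969   # ~7.9 GB/s  (B-series and some H-series chipsets)

chipset_devices = {
    "Gen4 NVMe SSD #1 (seq read)": 7.0,
    "Gen4 NVMe SSD #2 (seq read)": 7.0,
    "2.5 GbE NIC":                 0.31,
    "USB 3.2 Gen 2 port":          1.2,
    "SATA SSD":                    0.55,
}

everything_at_once = sum(chipset_devices.values())
print(f"all devices flat out: {everything_at_once:.1f} GB/s "
      f"vs DMI x8 {dmi_x8:.1f} GB/s / x4 {dmi_x4:.1f} GB/s")
# Only the unrealistic everything-at-once case meaningfully exceeds the x8 link;
# a single Gen4 SSD can already fill an x4 link by itself.
```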

AMD has a whole different set of issues due to the whole two chipsets thing with AM5.
The question is: why did AMD put these 28 lanes into Zen4 HX? Obviously just to show what it can do, even though in real implementations nobody needed them, given the extremely slow memory bus and the lack of PCIe 5.0 devices in laptops. They simply created an artificial air of "coolness" around the series that nobody can actually use in practice. And, apparently, to rub Intel's nose in it with Raptor HX...
Contrary to your ridiculous hypothesizing they did so because AMD's HX CPUs are simply binned desktop CPUs just like Intel's.
For a simple reason - the Zen4 HX memory bus is extremely weak, only 60-65 GB/s, even though all the attached devices together need at least twice that, and with headroom for the system and software, three times as much, if not four. And that brings us smoothly to the 256-bit Zen5 Halo controller with probably 200 GB/s. Bingo! Eureka!
The only reason Strix Halo has a 256-bit memory bus is the iGPU. As dismal as SODIMM memory bandwidth tends to be, its latency is typically still lower. So while the memory bandwidth is a lot lower, an HX part with a dGPU would still outperform a hypothetical Strix Halo paired with a dGPU when it comes to gaming. There are undoubtedly some niche cases which can use more CPU memory bandwidth, but that's not what products are designed around. I say this as someone who has wanted 2DPC client parts to die since DDR5 launched and a 256-bit memory bus to become standard.
 
Because it was originally PCIe 1.0 x4. Sandy Bridge upgraded it to PCIe 2.0. Skylake bumped it up to PCIe 3.0. Then, Rocket Lake widened it to x8. Finally, Alder Lake upgraded it to PCIe 4.0, where it's stayed since then. So, we're talking about up to 16x the performance of whatever you saw.
Now compare the requirements of all the ports that actually hang off the DMI with the bandwidth of that link.

Let's take a step back and see if you can find some data to show that DMI is currently a bottleneck in modern systems (e.g. LGA1700).
The B660 has DMI 4.0 x4 - and that is where the second M.2 hangs on real motherboards. Now do the math: how can you really copy data at full speed between that SSD and the processor when a 2.5 Gbit/s NIC and a bunch of USB ports share the same link, not counting everything else?

All motherboard buyers complain about the same thing: SSDs connected to the south bridge never show the same performance as those connected to the processor's lanes. Do you need more proof of the obvious lack of DMI bandwidth?

In the HEDT segment, with the Z690, there are already two chipset-attached M.2 slots and the same problem: a lack of DMI 4.0 x8 bandwidth. Moreover, these boards already have 5-10 Gbit/s RJ45 and much faster USB ports. Let me remind you that the HX series has no built-in TB4 controller; it has to be attached over PCIe, most often through the south bridge.

There have already been many comparisons in the press of eGPU operation on notebooks with HX-series processors versus, for example, P-series ones - and the latter are faster, because their TB4 is connected directly to the processor rather than via DMI + PCIe.

Oops, but you're talking about DMI. That's Intel, not AMD.
It doesn't matter - there is also a bottleneck with the south bridge.

It's easy to find them all in use. A good dGPU uses x16 lanes and two CPU-connected M.2 slots use the other 8 lanes. The last 4 go to the chipset.
You are wrong. dGPUs in laptops are always connected at x8. Moreover, even the desktop 4090 does not use the 5.0 bus, i.e. 24*2=48 lanes in 4.0 mode with normal multiplexing. Which again (funny) was not done. And it was not done because the overall memory controller bandwidth is shameful for both AMD and Intel, although Intel is always slightly faster in terms of the memory controller - by 15-20%. Which only exacerbates AMD's problems with that many available 5.0/4.0 lanes.

Where do you get this information?
Real-life benchmarks for Zen4 laptops.
We already established the CPU cores can't get more than 196 GB/s
A link to the source that the 256-bit Halo controller has this limit?

Xeon W ships in two flavors: one with a 256-bit memory interface and one with a 512-bit memory interface. I don't even understand your reference to AMD, because Threadripper had 4+ DIMM channels since its inception.
Why are you so deliberately distorting the context? We are talking about consumer series, including HEDT. We are not interested in the server market - forget about it. In the consumer platform market, AMD has suddenly pulled ahead, with 2x faster memory in 2025. Intel does not have it in Arrow Lake and certainly will not until 2026.

I already showed you a memory scaling article that shows games are much more sensitive to memory latency than bandwidth.
Nobody is going to disagree that more memory bandwidth is better. The critical question is how much. That's why I've cited memory scaling data which refutes the claim that it's bottlenecking at the level you think it is.
In Photoshop, filter processing speeds up almost linearly with memory bandwidth growth. Here's a practical example. In 2-channel mode, processing is up to 90% faster. In games, the impact is much smaller. But I'm not really interested in games, but in work tasks where the speedup is significant. Coders, decoders, code compilation, etc., are a small part of the tasks where the benefit will be immediately visible. Gamers are a small part of the market.

Again, we've already established that each CCD can do a max of 64 GB/s read + 32 GB/s write.
Again, where is a reliable reference that there will be such a limitation in Halo?

The main reason for that is probably because the DDR5-7200 has 9.4 ns CAS latency, while the DDR5-4800 memory has 16.7 ns CAS latency.
You are probably mistaken - higher frequency memory is always slower in random access speed. This is the Achilles heel of DRAM in the current architecture. Compare with SSDs - there, albeit slowly, 4K IOPS grow along with sequential speed. In DRAM, the process is reverse (regressive): with each iteration, latency grows, which is clearly visible in hundreds of reviews on the Internet and even at home on faster PCs. This is a dead end. After all, you correctly pointed out - the response of the system and software directly depends on the random access to RAM. What is the point of super-fast memory if latency has grown from 50ns to 120ns in laptops in 10 years?

Just look at the shame of the L2-L3 caches - Intel has significantly slower latency (it was 10ns for Haswell, now it's 20ns for Raptor - AIDA64 Mem&Cache benchmark) than AMD, but there's also gradual regression there.
In practice, they don't. Because PCIe has such high latency, graphics APIs are designed to avoid the CPU doing a lot of poking around in VRAM.
Another crutch again. We are moving on crutches! That means we are all disabled already...

It's a waste of die real estate to put the frame buffer there
The point is that igpu should only have its own frame buffer for double or triple frame buffering. Actually, Intel did it this way before - 128MB of VRAM on igpu - everything else in shared memory.

The same thing now: 1 GB is enough for the frame buffer, which covers 4-6 8K@36-bit screens.
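
For reference, the sizing arithmetic behind that claim (uncompressed buffers, 36 bits per pixel assumed):

```python
def framebuffer_mib(width: int, height: int, bits_per_px: int = 36) -> float:
    """Size of one uncompressed frame buffer, in MiB."""
    return width * height * bits_per_px / 8 / 2**20

one_8k = framebuffer_mib(7680, 4320)  # ~142 MiB per 8K buffer at 36 bpp
print(one_8k)
print(6 * one_8k)      # six single-buffered 8K screens  ~854 MiB
print(2 * 3 * one_8k)  # two triple-buffered 8K screens  ~854 MiB
```

So 1 GB covers roughly six 8K buffers in total, however they are split between screens and buffering depth.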

But the memory shared between the iGPU and the CPU cores should be directly accessible, bypassing the PCIe bus. It is especially sad if that is not the case when the RAM sits right beside the chiplet rather than in slots on the motherboard. There is no problem doing this, and the capacity split between the CPU cores and the iGPU can easily be rebalanced in any way for different purposes.

I hope that in Lunar Lake, where the RAM is in the same tile, it is done exactly this way.

I don't like reading on my smartphone.
Again, you are distorting and going out of context - it's not about whether you like reading on your smartphone. (If it's an AMOLED, I understand immediately: that is a problem for the eyes in 100% of cases, because of the unsafe PWM flicker frequency of 60-240 Hz, up to 480 Hz in new models, which is still not enough.) The point is that on a smartphone, with the same physical font size (even if little text fits on the screen at that size), you will instantly feel relief compared to your 2.5K screen at 27" (and especially 32"), because the lenses of your eyes no longer have to keep spontaneously refocusing between the pixel structure of objects and the objects themselves. And the further you sit from the screen, the smaller the working area in terms of the number of objects, because the eye has a maximum angular resolution.

A simple example on a smartphone: display a 2x3 tile of 1920x1080 images (or webcams, for example - the effect is even clearer in motion) at the same time, and compare how it looks on the same-sized patch of your 27" screen. There you won't be able to make out a thing in detail even from 25-35 cm, but on a 400+ ppi smartphone you will see many times more detail in each of the 6 windows from 25-30 cm. That's because you simply don't see the pixel structure on a smartphone, no matter how close you accidentally bring your eyes to the screen. This same "seamless" vision should exist for laptop and monitor screens, at distances down to 20 cm. Only at around 400 ppi do we get a practically analog picture everywhere, on all gadgets.

And that is the end of progress in terms of increasing resolution. Maybe that's why they have been dragging it out for so many years. What will they sell people after 8K monitors? They will have nothing further to sell: this is the end point of IT evolution for screens that are looked at. Then other aspects of the picture will have to be developed - color depth, response speed (already sufficient on AMOLED and microLED), black level, color accuracy, volume/3D, and so on. But not resolution, because the eye will no longer see any difference at any viewing distance, and the eyes will finally rest from the visible pixelation of the picture, i.e. from its digital structure.

The eyes rest only when there is no unsafe flickering (which all AMOLEDs have today), the contrast is sufficient, and the pixel structure is completely invisible - when the picture is analog.

There are better low-pass filters than that. Bicubic is just what people did when computers were too slow to handle large convolutions. If you use a proper low-pass filter, then you don't need an integral division. This is something I'd only recommend for video content, however.
No interpolation should be applied when changing the resolution in a multiple (by an integer) direction! This is absurd. And it is exactly this terrible image-corrupting behavior that is, for some reason, baked into the firmware of all monitors and laptop panels when scaling by an integer multiple! Is this some kind of conspiracy? Why?! What prevents them from simply mapping the same information onto a 2x2 block of pixels when going from 4K to FHD, producing one FHD pixel? Nothing. There are no hardware or software limitations, as Intel and AMD have proven since 2019. But it is not their job - it is the job of the panel controller firmware! I have failed to understand this ever since mass-market 4K+ monitors appeared more than 10 years ago...

Heh, where do you think 2560x1440 came from? It's a 2x scaling of 1280x720, which is one of the ATSC resolutions and supported by Blu-ray and many video cameras. So, I wouldn't say there's no 1440p content out there.
You're distorting things again! Today nobody needs 720p content, but 1080p continues to be used everywhere and has been mainstream for over 10 years. That is why only 4K and 8K screens are compatible with it, but not 2.5K. Why did they make 2.5K for games? Because the hardware simply couldn't handle 4K. But why was 2.5K necessary at all? Why couldn't they have long since been fitting only nice 4K panels in laptops and monitors, and dropped to FHD when there weren't enough frames? Everything would be perfectly sharp there, just as with video. By the way, the inter-pixel gap on 4K panels is much smaller than on old FHD panels, so even visually an FHD picture on a 4K panel of the same diagonal looks more monolithic to the eye. Do you understand this? There is no point in 2.5K screens today, other than the stinginess of manufacturers.

They are deliberately blocking progress, even though increased production of 4K panels for those same laptops (guaranteeing 250+ ppi across all series, even at 18") would drop prices significantly. They are not very expensive anyway - around $150-160. And just recently, before 2021, literally before the growth of inflation in Western countries due to the endless printing of money, they sold for $110-120 at retail. At mass volumes (hundreds of millions), the price would fall below $100 even now. Look how SSD prices are falling...

In professional video, they use 4:4:4. Support for this, in consumer products, is less common, but not unheard of.
I'm not interested in studio master copies. Are you aware that even cinemas show releases in DCP in 4:2:2 mode, but not 4:4:4, although with a much higher bitrate (for 4k copies) than 4k BD? In reality, until recently almost all hardware (and software players by default, unless special settings are made in some versions) - even given a source from which 4:4:4 at FHD resolution could be derived on the fly - stupidly output the picture to FHD TVs, projectors, monitors and laptop screens exclusively in 4:2:0, even from a 4K 4:2:0 source (all commercial and most home video). This is stupid: a significant part of the frame's color information and resolution is lost on FHD screens.

I mostly use it for text or web. Hardly ever streaming video.
For text, even at your viewing distance, you need 220+ ppi to completely eliminate the refocusing effect on the discrete structure of the text. And your monitor is simply not compatible with the FHD content that dominates - 2560 does not divide by 1920 into an integer.

Where do you think your eyes will find it easier to read text, especially complex characters - with proper grayscale anti-aliasing (not the bad version in Chrome), and not in color as here (aka ClearType, which I hate):
[Image: the same letter rendered at low PPI (left) vs. high PPI (right)]

On the left (roughly speaking) is your 2.5K 27"; on the right, a 5K 27". Imagine how good the picture would look on an 8K 27" - from any viewing distance...
 
Be sure to look at my screenshots at 400% magnification in full size only!

I will give a specific example of how bad the anti-aliasing looks in Chrome (and Chromium-based browsers) on low-PPI screens. These shadows damage people's eyesight at pixel densities below roughly 220-230 ppi; closer to 300 is of course better:
[Screenshot: text rendered in Chrome, showing its blurry grayscale anti-aliasing with shadow artifacts]

Now the same fragment of text, from bit_user's comment at the beginning of page 3 of this topic, in Firefox up to and including version 68 (with additional settings), with correct grayscale anti-aliasing (as in XP by default with ClearType disabled). This is the best option for the eyes - they rest, because the text is easy to read:
[Screenshot: the same text in Firefox 68 and earlier with plain grayscale anti-aliasing]

And here is what I have to do on screens below 220 ppi to avoid damaging my eyesight: I disable the bad, blurry anti-aliasing in Firefox via a setting (in Chrome this is not possible on Windows, so I no longer use it). This is what it looks like, and it is much easier on the eyes than the modern anti-aliasing in Chrome, Firefox and Edge, which produces blurry fonts:
[Screenshot: the same text in Firefox with anti-aliasing disabled entirely]

I have had a rhetorical question for a long time: why does Google ruin the eyesight of all Windows users with such blurry fonts, clearly doing so intentionally for 10 years, even though they are 100% aware of the problem (there are many examples of it on their bug tracker)? Under any version of the OS. This effect of incorrectly generated vertical shadows will not appear at high PPI; the shadows simply become invisible. That is why, even when the effect is present on smartphone screens, you cannot see it there - but you can on low-PPI monitor and laptop screens, and FHD screens still make up a large share of the monitors and laptops in use today. Even 2.5K does not solve the problem on laptops (4K practically does), and on monitors even 4K does not solve it: 8K is needed, or else someone finally needs to bring the management of the Chrome development team to its senses (and Firefox's, since Google pays for them to exist). Microsoft and Edge are a separate topic...
 
Now compare the requirements of all the ports that actually hang off the DMI with the bandwidth of that link.

The B660 has DMI 4.0 x4 - and that is where the second M.2 hangs on real motherboards. Now do the math: how can you really copy data at full speed between that SSD and the processor when a 2.5 Gbit/s NIC and a bunch of USB ports share the same link, not counting everything else?
In the real world, there will generally be at most 2 devices using a significant amount of their theoretical bandwidth at a time, and that's pretty much either because you're doing a drive-to-drive copy or reading some huge amount of data from a pair of SSDs that are striped in a RAID 0. So, Intel's latest DMI has plenty of bandwidth to handle such use cases.

The situation where all devices are running at peak bandwidth simultaneously pretty much never happens in real life.

If you're that stressed out that an I/O bottleneck might occur, then just buy a Xeon W or a Threadripper. They have plenty of I/O bandwidth.

All motherboard buyers complain about the same thing: SSDs connected to the south bridge never show the same performance as those connected to the processor's lanes. Do you need more proof of the obvious lack of DMI bandwidth?
Post a link to such complaints. Or, better yet, some quantitative data. So far, all of your claims have been entirely unsubstantiated.

There have already been many comparisons in the press of eGPU operation on notebooks with HX-series processors versus, for example, P-series ones - and the latter are faster, because their TB4 is connected directly to the processor rather than via DMI + PCIe.
I would like to see the data from those comparisons, for myself. Would you please post a link to one?

You are wrong. dGPUs in laptops are always connected at x8.
That's not true. Some of the laptop CPUs only have an x8 connection for them, but the better laptop dGPUs can and do get connected at x16.

Moreover, even the desktop 4090 does not use the 5.0 bus,
Yes, but why are you mentioning that?

i.e. 24*2=48 lanes in 4.0 mode with normal multiplexing. Which again (funny) was not done.
I don't even know what you're talking about, here. PCIe supports bifurcation and switching, but not multiplexing.

Real-life benchmarks for Zen4 laptops.
Provide links.

A link to the source that the 256-bit Halo controller has this limit?
It uses the same CCDs as the 9950X and they have the same Infinity Fabric bandwidth, which you can see in this post:

As Peksha helpfully pointed out, in the following post, the consequence of that is the limit I said on how fast the CCDs can communicate with the I/O die.

Why are you so deliberately distorting the context? We are talking about consumer series, including HEDT.
HEDT means CPUs like Threadrippers and the big-socket CPUs from Intel, which are currently sold only under Xeon W branding. If that's not what you mean, then the word you want is something other than HEDT (High-End DeskTop).

In Photoshop, filter processing speeds up almost linearly with memory bandwidth growth. Here's a practical example. In 2-channel mode, processing is up to 90% faster.
I need to know your source for this claim. Please provide a link.

in work tasks where the speedup is significant. Coders, decoders, code compilation, etc., are a small part of the tasks where the benefit will be immediately visible.
Please provide this data.

I already posted two different memory scaling benchmarks (one for games and one for rendering) which show very little benefit from much higher-bandwidth memory, and it's even possible that most of the benefit came from lower latency. If you have data which shows otherwise, I'd like to see it for myself.

You are probably mistaken - higher frequency memory is always slower in random access speed.
In the article I linked, you can see the CAS timings for yourself. If you understand what that means, you can convert it to nanoseconds and check my arithmetic.

In DRAM, the process is reverse (regressive): with each iteration, latency grows,
Early DDR5 memory had higher latency, which is why it was outperformed by slower DDR5 memory, in that benchmark. However, newer DDR5 memory has narrowed or even closed the latency gap vs. DDR4.

What is the point of super-fast memory if latency has grown from 50ns to 120ns in laptops in 10 years?
A lot of laptop memory (LPDDR) is slower because they multiplex commands and addresses over the same bus. That puts it at an artificial disadvantage, because you're introducing factors not inherent to the DRAM technology.

Just look at the shame of the L2-L3 caches - Intel has significantly slower latency (it was 10ns for Haswell, now it's 20ns for Raptor - AIDA64 Mem&Cache benchmark) than AMD, but there's also gradual regression there.
Well, I found some cache latency data for both Haswell and an Alder Lake P-core, which is nearly the same as Raptor Lake. Interestingly, Alder Lake has slightly lower latency at all levels in the cache and memory hierarchy.
Now, if you had a source link, then we could see if maybe what they actually measured in Raptor Lake was perhaps an E-core? Without a source link, there's no way of knowing why this data directly contradicts your recollection.

The point is that igpu should only have its own frame buffer for double or triple frame buffering.
Do the math. How many times do you think each framebuffer pixel gets written, during the process of rendering a frame. Add that up. Multiply it by the framerate. Now, how much bandwidth does that save you vs. writing it to system memory? You'll find that it's not worth it.

Also, because the framebuffer is mostly written, this is only speeding up writes, which the GPU is largely insulated from. When it issues a write, the transaction goes into a queue and the GPU can continue on with whatever it's doing (unless the queue fills up). However, when it does a read and the data isn't nearby, then the potential exists for it to stall (i.e. if there's not enough other work to keep it busy until the read completes). So, it's much more important to reduce read latency, and that's something caches can do (as well as helping to buffer writes).
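
A hedged version of that math, with the overdraw factor and the total-traffic figure being assumptions purely for illustration:

```python
# Assumed figures for illustration: 4K at 60 fps, each final framebuffer pixel
# written ~3 times per frame (overdraw), 4 bytes per pixel.
width, height, fps, bytes_per_px, overdraw = 3840, 2160, 60, 4, 3

fb_writes = width * height * bytes_per_px * overdraw * fps / 1e9  # ~6 GB/s
scanout   = width * height * bytes_per_px * fps / 1e9             # ~2 GB/s
total_gpu_traffic = 100.0  # GB/s placeholder: textures, G-buffers, intermediates

print(f"framebuffer writes: ~{fb_writes:.1f} GB/s")
print(f"scan-out reads:     ~{scanout:.1f} GB/s")
print(f"share of total:     ~{(fb_writes + scanout) / total_gpu_traffic:.0%}")
```

Under those assumptions, the framebuffer accounts for well under a tenth of the GPU's memory traffic, which is the point about a dedicated local framebuffer not paying for its die area.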

Again, you are distorting and going out of context - it's not about whether you like reading on your smartphone
I don't think so. My point was that to benefit from the high DPI of my phone, I'd have to hold it so close to my face and I don't want to read that way. Having such high DPI in my monitor would be a waste, because my face is never that close to it.

That said, if you want a high-DPI display, that's great. It's just that I don't feel the same, nor do I think enough people do to create a large demand for such products. What's worse about high-DPI displays than the cost is that any graphics rendered on them will tax existing GPUs, which is another reason I didn't want a 4k display for my home use.

No interpolation should be applied when changing the resolution in a multiple (by an integer) direction! This is absurd.
It's common to do this for photo and video content. For line graphics, text, etc. I know most interpolation looks worse than pixel-doubling.

literally before the growth of inflation in Western countries due to the endless printing of money,
That's not what drove inflation. Yes, doing that will cause inflation, but it's not the only way inflation happens.

Look how SSD prices are falling...
SSD prices fall due to NAND density increases. By die area, the NAND chips don't actually get cheaper, but through 3D scaling and increasing the number of bits per cell, they can drive down the cost of capacity faster than the price of the silicon increases.

Are you aware that even cinemas show releases in DCP in 4:2:2 mode, but not 4:4:4, although with a much higher bitrate (for 4k copies) than 4k BD?
When viewing, the chroma sampling rate doesn't matter, because your eye has poor chroma resolution. The reason 4:4:4 is favored in commercial film & video production is that certain processing they typically do can result in artifacts from low-resolution chroma.
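
For scale, here is what the subsampling scheme does to raw frame size (8 bits per sample assumed):

```python
# Average bytes per pixel at 8 bits per sample:
#   4:4:4 -> 3.0 (full-resolution chroma)
#   4:2:2 -> 2.0 (chroma halved horizontally)
#   4:2:0 -> 1.5 (chroma halved both horizontally and vertically)
schemes = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

pixels_4k = 3840 * 2160
for name, bytes_per_px in schemes.items():
    print(f"{name}: {pixels_4k * bytes_per_px / 2**20:.1f} MiB per uncompressed 4K frame")
```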

As for the bitrate, I heard that's because they use MJPEG or maybe JPEG2000 for the codec, which is like stone age tech, compared to H.265 (HEVC).

In reality, until recently almost all hardware (and software players by default, unless special settings are made in some versions) - even given a source from which 4:4:4 at FHD resolution could be derived on the fly - stupidly output the picture to FHD TVs, projectors, monitors and laptop screens exclusively in 4:2:0, even from a 4K 4:2:0 source (all commercial and most home video). This is stupid: a significant part of the frame's color information and resolution is lost on FHD screens.
Some blu-ray players and other devices will let you lock the output into RGB mode, which forces 4:4:4. HDMI mandates that all compliant devices support RGB mode. So, the display will support it and it's just a question of getting your source to force it.
 
I would like to see the data from those comparisons, for myself. Would you please post a link to one?
1260P beats 12900HX with 4090

Post a link to such complaints. Or, better yet, some quantitative data. So far, all of your claims have been entirely unsubstantiated.
There are a lot of reviews of purchased SSDs like this, even from large stores/retail chains. Every time someone complains about the performance of a PCIe 4.0 x4 SSD, it later turns out that the buyer unknowingly installed and tested the drive in the M.2 slot on the south bridge rather than in the one connected to the processor. There are too many of them, and I did not specifically collect statistics. These are simply facts I have seen over and over. And those systems did not even have any other load on that bridge besides the SSD - well, except maybe the keyboard, the mouse and an idle 1-2.5 Gbit/s home network. Yet even under those conditions, the test results were significantly worse than with a slot connected directly to the processor.

It's easy to find them all in use. A good dGPU uses x16 lanes and two CPU-connected M.2 slots use the other 8 lanes. The last 4 go to the chipset.
Specifically, in which models is an x16 link used for the dGPU? I don't see such information. In the mass market it is only 4.0 x8 for both AMD and Intel. How funny is that, when the slot itself is already 5.0...

I don't even know what you're talking about, here. PCIe supports bifurcation and switching, but not multiplexing.
Bingo! That's right! This is exactly what I wrote about - no multiplexing, no flexible division into more lanes - a wasted transistor and thermal budget. What for, and why?

You wrote earlier that the laptop HX parts are binned desktop chips. It is exactly the opposite: since power consumption is critical in notebooks, the best dice - those that run at lower voltage and frequency - go into the notebook versions, and the desktop parts are precisely the notebook rejects that failed the consumption test (the minimum required voltage). It is strange that you do not understand this.

As Peksha helpfully pointed out, in the following post, the consequence of that is the limit I said on how fast the CCDs can communicate with the I/O die
If this is true, then it is another AMD fail, like the 24 PCIe 5.0 lanes in the HX series, which are completely useless in laptops in 2023 and 2024 - even though the 7945HX still remains the king of performance, unsurpassed by anyone.

I need to know your source for this claim. Please provide a link.
Is it so difficult for you to pull out a memory module and run the Photoshop tests yourself in single-channel mode, comparing the results across different processing filters? I'm too lazy to look for links. They were in many reviews in the past, and nothing has changed since then; it has simply stuck in my memory.

You can use the AIDA64 PhotoWorxx test - it also scales linearly with bandwidth, like Photoshop.

However, newer DDR5 memory has narrowed or even closed the latency gap vs. DDR4.
But not for DDR3 and especially not for DDR2...=)

A lot of laptop memory (LPDDR) is slower because they multiplex commands and addresses over the same bus. That puts it at an artificial disadvantage, because you're introducing factors not inherent to the DRAM technology.
We are talking about DDR4/5 in a 2x64-bit configuration, not soldered 4x32-bit memory.

Well, I found some cache latency data for both Haswell and an Alder Lake P-core, which is nearly the same as Raptor Lake. Interestingly, Alder Lake has slightly lower latency at all levels in the cache and memory hierarchy.
The latency test in AIDA64 (any version) shows almost a 2-fold (80%+) difference in L3 cache latency in favor of AMD between the competing H(X) lines. Lately AMD has also gotten somewhat worse - its latency has grown by about 30-40%.
It's a bit less pronounced with L2, but Intel is not doing well there either - cache latency keeps growing, even though this is not RAM...

Do the math. How many times do you think each framebuffer pixel gets written, during the process of rendering a frame. Add that up. Multiply it by the framerate. Now, how much bandwidth does that save you vs. writing it to system memory? You'll find that it's not worth it.
If that were so, Intel wouldn't have put a small amount of VRAM on its older iGPUs - what would be the point, other than speeding up frame-buffer operations without affecting main-memory performance? This "window" is only filled with data from RAM, and the iGPU then forgets about it until the screen contents next need updating - meanwhile, system memory is 100% relieved of feeding the display panel controller.

I don't think so. My point was that to benefit from the high DPI of my phone, I'd have to hold it so close to my face and I don't want to read that way. Having such high DPI in my monitor would be a waste, because my face is never that close to it.

That said, if you want a high-DPI display, that's great. It's just that I don't feel the same, nor do I think enough people do to create a large demand for such products. What's worse about high-DPI displays than the cost is that any graphics rendered on them will tax existing GPUs, which is another reason I didn't want a 4k display for my home use.
It's hard for me to argue with a person who denies reality: a picture with higher PPI is sharper, which means it's better for human vision (and less tiring for the brain, which is constantly busy recognizing images).
For line graphics, text, etc.
This shouldn't be the case here - and not even with video, if the video's size exactly matches the screen's horizontal resolution, which is almost always the case at 8K/4K/FHD with commercial and home-recorded content.

That's not what drove inflation. Yes, doing that will cause inflation, but it's not the only way inflation happens.
This is precisely an excess of the money supply over the amount of available goods - pure fiat inflation. And the whole world is built on this, through robbery (seigniorage) and loan interest in favor of the ruling rich strata - those who are always closest to the "printing press". You worked (conscientiously and productively for civilization) and received fiat money in your hands, but it is endlessly devalued through the robbery of your savings by systemic inflation. That is, each subsequent generation effectively devalues the results of your old labor, because with age you are forced to live on the savings from that labor - you cannot work until death. Although the parasites in power, and the parasitic layers adjacent to them, would prefer you to die the moment you stop working, so as not to pay you a pension and to steal that money as far as possible; in any case, current pension systems have long since become a Ponzi scheme, given stagnating labor productivity and a growing share of parasitic layers of the population. Therefore their collapse is already inevitable, along with the collapse of governments. Stock markets and investments in such things are a derivative of loan interest and, in the end, a race for mills. It is the printing of money and the accumulation of debt that is leading the USA to complete bankruptcy, and this will happen soon.

SSD prices fall due to NAND density increases. By die area, the NAND chips don't actually get cheaper, but through 3D scaling and increasing the number of bits per cell, they can drive down the cost of capacity faster than the price of the silicon increases.
You contradict yourself: if capacity is getting cheaper, then the technology for producing it is getting cheaper. And that is exactly how processor technology behaved until the 2020s - then stagnation set in. They can no longer increase performance per watt at the old rate; on a large scale, the exponential curve has flattened into almost a plain. This is a technological dead end, without qualification. They will keep twitching in agony for another 5-10 years, but then it's over, unless some economically feasible way is found to deliver +35-40% once a year. That is what would really move everything forward again, as it did before they ran into physical limits and effects.

When viewing, the chroma sampling rate doesn't matter, because your eye has poor chroma resolution. The reason 4:4:4 is favored in commercial film & video production is that certain processing they typically do can result in artifacts from low-resolution chroma.
This is not true - the eyes do see the difference, especially in motion. It's just an unfortunate technological legacy from the old days of PAL/SECAM. But why keep it now, other than to reduce bitrate? And why show 4:2:0 on an FHD screen when the source is 4:2:0 with 4 times as many pixels? That is simply stupid, technically and fundamentally.

Some blu-ray players and other devices will let you lock the output into RGB mode, which forces 4:4:4. HDMI mandates that all compliant devices support RGB mode. So, the display will support it and it's just a question of getting your source to force it.
I know - but the problem is in the post-decode output firmware and in software players that do not output 4K content to an FHD screen with full color reconstructed from each 2x2 block of the 4K source. They can output 4:4:4, but that mode is a dummy, because no FHD 4:4:4 data actually arrives at its input.
 
Thank you. Now that I can read that, I see that they didn't find a conclusive answer to why the performance differed. Could be a driver issue with the TB4 controller, it could be negotiating a lower link speed, could be lots of things. Someone observed that it was builtin to the P and not the HX, but that, itself, is not necessarily conclusive. One thing this means is that the controller implementation is different, leading to the possibility of the issues I mentioned affecting one and not the other.

There are a lot of reviews of purchased SSDs like this, even from large stores/retail chains. Every time someone complains about the performance of a PCIe 4.0 x4 SSD, it later turns out that the buyer unknowingly installed and tested the drive in the M.2 slot on the south bridge rather than in the one connected to the processor. There are too many of them, and I did not specifically collect statistics.
I would love to see a benchmark that compares the performance of both slots. I wonder if the motherboard slot they used only has 2 lanes connected or is running at PCIe 3.0 speed, for some reason. There could be many reasons for the performance discrepancy, which is why it's worth finding a professional reviewer who knows how to control for all of these factors.

Bingo! That's right! This is exactly what I wrote about - no multiplexing, no flexible division into more lanes - a wasted transistor and thermal budget. What for, and why?
Because PCIe is a packet-switched network. In order to be fast and efficient, the switch cannot be infinitely flexible - there have to be limits. This is a classic engineering tradeoff.
 
Well, what do you say about the screenshots of your own message in Chrome? You still haven't posted your screenshots. Is that enough evidence that your 2.5k is just really bad on 27" in this browser and others based on it?

Even on a 2.5K/16" laptop, things don't look very good to the eyes with such lousy, broken anti-aliasing.
 
The argument regarding image quality has nothing to do with the topic at hand and is subjective. Some people cannot play games with a narrow FoV, some have issues with frame rates during rapid motion, and so on and so forth. There's also a lot more that goes into screens and eye strain than just pixels.

OldAnalogWorld you're putting way too much emphasis on bandwidth when it is not a primary driver of design. Bandwidth in design is always hand in hand with area, latency and capacity. HBM can exceed cache bandwidth, but even best case latency is significantly higher. There won't be a breakthrough here until capacity, area, latency and bandwidth all make sense (cache stacking is a good bridge).

Niche use cases also do not drive design no matter how much any of us might like to have ours tailored to them. Again Strix Halo only has a 256-bit bus due to the GPU whether or not the CPU benefits.

Oh and regarding DDR2/3 latency they definitely weren't necessarily better at all:

I took a gander at my own purchase history
DDR2 1066 C5: ~9.38ns
DDR3 1600 C9 (early lifecycle): ~11.25ns
DDR3 2133 C9 (late lifecycle): ~8.44ns
DDR4 3600 C16: ~8.89ns
DDR5 4800 C40 (JEDEC): ~16.67ns
DDR5 7200 C34: ~9.45ns

I know the following feeds the off topic, but since there's a distinct lack of data I could find this is what I've got regarding chipset SSDs.
I would love to see a benchmark that compares the performance of both slots. I wonder if the motherboard slot they used only has 2 lanes connected or is running at PCIe 3.0 speed, for some reason. There could be many reasons for the performance discrepancy, which is why it's worth finding a professional reviewer who knows how to control for all of these factors.
I wish I'd tested one of the P44 Pros on the CPU M.2 but here's what I do have:

Same between platforms:
CrystalDiskMark 5.2.1 x64/1 run at 2GiB size (I'd just done a quick run when putting the server together so it's far from perfect)

P41 Platinum (2TB/CPU connected OS drive/15% full)/W680/12700K Windows Server 2022 fresh install:

Seq Q32T1:  7102 read / 6428 write (MB/s)
4K Q32T1:   1199 read / 1094 write
Seq:        5791 read / 5153 write
4K:         90.06 read / 445.3 write

P44 Pro (2x 2TB/chipset connected secondary drives/52% full)/Z890/265K/Win 11 23H2 (not actively being used, but a bunch of stuff running in the background, no fresh install data available from the same CDM version):

Drive1:
Seq Q32T1:  7031 read / 5970 write (MB/s)
4K Q32T1:   995.5 read / 846.9 write
Seq:        5075 read / 4416 write
4K:         67.38 read / 267.8 write

Drive2:
Seq Q32T1:  6965 read / 5837 write (MB/s)
4K Q32T1:   1026 read / 852.3 write
Seq:        4582 read / 4564 write
4K:         53.64 read / 257.4 write

Both drives at the same time:
                 Drive1   Drive2   (MB/s)
Seq Q32T1 R:     6993     6979
Seq Q32T1 W:     5963     5962
4K Q32T1 R:      508.9    557.5
4K Q32T1 W:      423.6    471.2
Seq R:           4942     4580
Seq W:           4736     4566
4K R:            56.27    52.49
4K W:            247.6    240.7
 
In fact, the pursuit of profit kills versatility. This is exactly what x86 was famous for and what Apple ultimately sacrificed.

Processor cores have long been suffocating on very slow memory (look at the L1 cache: even a 17-year-old C2D easily manages a throughput of 100 GB/s and more). I understand that everything has an economic rationale, but it is very sad - especially when the RAM can no longer be replaced or upgraded - to see wretched soldered LPDDR5(x) instead of 1024-bit HBM3, which, for the same reasons of "economic feasibility", is now soldered only onto server chips. And I am absolutely sure that this is not a matter of excessive power consumption, even for a notebook x86 SoC: integrate an HBM3 controller into the chiplet with the processor, plus at least 32 GB of memory with 500+ GB/s of bandwidth, as a mass-market solution. It just requires courage - set up mass production, and the price will then fall rapidly, as with every technology of the past...
HBM has a significant energy cost (electrons being pushed on a vast number of wires) and high assembly cost for the RAM chips.

It is my understanding that it's the bottlenecks in HBM assembly, which ultimately limited data center GPU supplies in 2024 and perhaps still currently, not the availability of GPU chips from TSMC: they have plenty of those, supposedly even mostly fabbed.

HBM memory connects the individual DRAM chips via Through Silicon Vias (TSVs), and it's not just two chips, like with V-cache, but around a dozen. Any missing contact or chip being broken in the process results in a write-off of everything you've already put together.

Cheaper ways, both in terms of operational cost (it's hard to do) and failure rates (good chips can easily get busted during assembly) are constantly being investigated, but have both a long lead time and no obvious results yet.

That's the main reason that HBM so far is only used in server designs, where the high costs are recovered via high utilization.

Apple (and Lunar Lake) uses stacked DRAM dies, but those are not connected via through-silicon vias, just at the edges via external wires - pretty much like a DIMM that has its chips stacked on top rather than side-by-side: they don't have wider busses and are pretty near indistinguishable from normal DRAM in terms of access protocols and basic performance characteristics.

It's standard for mobile SoCs which often have a stacked DRAM "module" put on top of the SoC as a package-on-package, to save space and because those mobile SoCs don't burn 100 Watts.

Because wires+traces on these stacked RAM dies are shorter, they can be run with less amplification, saving power or trading it for higher speeds. I believe it's that combination of speed/power which is driving the LPDDR5 use, ultimately even on the die carrier as per Apple/Lunar Lake.

I haven't really seen anything on how these Strix Halo devices are physically built, but if I understand correctly, LPCAMM2 modules would allow LPDDR5 chips, speeds and consumption even on a socket, and thus perhaps upgradable RAM capacity, which I'd want as well (but not at any cost). But that is 256 bits and roughly the same number of GByte/s at top LPDDR5 clocks.
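
For reference, the peak-bandwidth arithmetic for a 256-bit LPDDR5X configuration (transfer rates assumed; sustained figures will be lower):

```python
def peak_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000

print(peak_gb_s(256, 7500))  # ~240 GB/s
print(peak_gb_s(256, 8000))  # ~256 GB/s
print(peak_gb_s(256, 8533))  # ~273 GB/s
```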

You're asking for twice that, 500GB/s and there things would actually become rather unmanageable: you'd need a way bigger GPU to take advantage of it.

(yes, again you could write a synthetic test even for a CPU that could fill that bandwidth, but not the critical mass of money making application)

At twice Strix Halo's 40 CUs you're well into territory where a small, thin, battery-powered notebook just can't be done: you'd need nearly 200 Watts for the GPU, even if it were internal.

And at that power envelope you'd really want it somewhere off to the side, so you can actually cool the combination, much like a Grace+Hopper chip.

Profits pay for chips; 'versatility' on its own has no value for any significant number of applications and customers, which is why all of IT evolves so unevenly these days and everything is so bespoke that it becomes hard to explain unless you follow the path of how things got there.

I've written a paper a few years ago for a publication sponsored by the European Commission that attempted to explain this crazy evolutionary nature of IT design.

P.S. What a thread this has become!
 
Thank you. Now that I can read that, I see that they didn't find a conclusive answer to why the performance differed. Could be a driver issue with the TB4 controller, it could be negotiating a lower link speed, could be lots of things. Someone observed that it was builtin to the P and not the HX, but that, itself, is not necessarily conclusive. One thing this means is that the controller implementation is different, leading to the possibility of the issues I mentioned affecting one and not the other.
There is no problem with the implementation. This only proves, once again, the theoretical expectation that a TB controller connected not directly to the processor but through a bridge - with a pile of overhead, and with its bandwidth shared with other devices working at the same time - loses performance. In the same way, SSDs (as heavy consumers of bandwidth) suffer badly when connected to the south bridge alongside a bunch of other devices on the same link. And that is exactly what the many buyer reviews prove, where people out of ignorance first connect the drive to the south bridge (even though a CPU-attached M.2 happens to be free). I have read a lot of them from different retail chains, and every time, after the suggestion to move the drive to the CPU slot, performance immediately improved.

I wonder if the motherboard slot they used only has 2 lanes connected or is running at PCIe 3.0 speed, for some reason.
No, that is just your invention. The buyers clearly stated the motherboard models, and it is absolutely clear that the slots were 4.0 x4. The only difference was that, not understanding the architecture, they stuck the drive into the first convenient slot without thinking about the consequences. When someone (including me) explained that the drive needed to move to the CPU-attached slot, they immediately reported with surprise that performance became close to what the SSD manufacturer claims - although in almost 100% of cases it does not match the claimed figures even in the CPU slot.

The argument regarding image quality has nothing to do with the topic at hand and is subjective. Some people cannot play games with a narrow FoV, some have issues with frame rates during rapid motion, and so on. There's also a lot more that goes into screens and eye strain than just pixels.
I have proven, with direct screenshots, the direct impact on vision in Chrome-based browsers, where blurry anti-aliasing cannot be disabled. And earlier, by directly comparing pixel densities using photos of a letter at different resolutions, I argued that 8K is absolutely necessary for everyone at diagonals over 20". The connection to this topic - the new AMD family - is direct, as I have also argued: without fast RAM, working with 8K equipment will become a problem. That is why its introduction has been delayed for 10 years, and for almost 6 years since the introduction of the DisplayPort 2.0 (UHBR20) standard, which would allow mass production of 8K monitor panels.

On laptops, 8K panels are NOT needed; 4K panels are enough there. Of course, above 300 ppi (as you can easily see with 400+ ppi smartphones) the picture gets even better, even more analog-looking. For laptop panels and monitors, viewing distances of 15-25 cm are largely irrelevant, but 35-45 cm is common, and at those distances a minimum of 220-230 ppi, and preferably 300 ppi, should be mandatory to eliminate the effect on the eyes: to exclude the chaotic switching (accommodation) of the eye's lens between the pixel structure of objects and the objects themselves, which happens while the pixel structure is still distinguishable in stereo vision, where the angular resolution of the eyes is higher. There should be no chance that the eye can spontaneously latch onto the pixel structure at some point. It should see only the objects drawn by that structure, never the structure itself. Imagine if you could see the molecular structure of surfaces in the real world - what hell that would be for vision...
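The ppi figures follow from simple geometry, if you accept the usual ~1 arcminute rule of thumb for what the eye resolves (that threshold is my assumption here, not something from the thread):

```python
import math

def required_ppi(distance_cm: float, arcmin: float = 1.0) -> float:
    """Minimum PPI so that one pixel subtends less than `arcmin` at the given distance."""
    distance_in = distance_cm / 2.54
    pixel_pitch_in = distance_in * math.tan(math.radians(arcmin / 60))
    return 1 / pixel_pitch_in

for d in (25, 35, 45):
    print(f"{d} cm -> ~{required_ppi(d):.0f} ppi")  # ~349, ~250, ~194 ppi respectively
```

Which lines up roughly with the 220-300 ppi range argued above for 35-45 cm viewing distances.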

I'm curious - what browser do you and the other participants in this thread use? I hope not Chrome/Edge - otherwise you're your own "evil Pinocchio", ruining your own eyesight, unless of course you've found a way to eliminate the incorrect grayscale anti-aliasing in modern browsers. I haven't found one. Linux users have written that, unlike on Windows, you can disable anti-aliasing in Chromium from the command line, but that switch does not work on Windows. And using ClearType, with its color fringing on characters from color subpixel anti-aliasing, is just as harmful a technology as the incorrect grayscale anti-aliasing in Chrome and even in Firefox. The difference is that in Firefox you can disable anti-aliasing altogether, starting with version 69, when normal anti-aliasing (as in XP) stopped working there, but in Chrome and Edge you can't. Not to mention that Firefox is much more convenient in every other respect.

I don't understand why browser makers have abandoned the only correct, most accurate, and most eye-friendly grayscale anti-aliasing scheme. It can't be stupidity - they know about this problem.

Again Strix Halo only has a 256-bit bus due to the GPU whether or not the CPU benefits.
Absolutely pointless argument. The Halo line is NOT designed for thin-and-light laptops or business series. It is designed exclusively for the performance segment, the gaming segment, and mobile workstations. Almost none of those machines are thin or light - usually 2.4 kg and up, if the manufacturer has approached the cooling system conscientiously, with genuinely massive heatsinks and efficient (and at the same time extremely quiet) fans. That pushes 16" models toward 3 kg. This is not a laptop - it is a portable workstation with a power supply weighing close to 1 kg, even in the GaN-charger version. And battery life in such series is of no interest to anyone at all: they run from mains power 99% of the time, for the obvious reason that the intended workloads simply rule out battery operation, given how small batteries are relative to these consumption classes and frequent loads of 150 W and above.

I also find it funny to wonder how NVIDIA will sell the mobile 5xxx series if the desktop difference between the 4090 and 5090 turns out to be only about 5% once normalized for power consumption. To raise performance by 33-35% in the 5090 relative to the 4090, they had to raise desktop power consumption by as much as 27-28%. With the previous mobile 4xxx parts already at a hellish 120-140 W, NVIDIA simply has no headroom to add even 30% in the mobile 5xxx series. And if they were capable of more, why isn't the 5090, with 27-28% more power, faster by 1.27 x 1.3, i.e. roughly +65%?
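The 5090-vs-4090 point boils down to a perf-per-watt ratio. A minimal sketch using the percentages quoted above (they are the post's numbers, not independently verified):

```python
perf_gain = 1.34    # ~+34% raw performance, per the post
power_gain = 1.275  # ~+27.5% power draw, per the post

perf_per_watt_gain = perf_gain / power_gain
print(f"Perf/W improvement: ~{(perf_per_watt_gain - 1) * 100:.0f}%")  # ~5%
# A mobile part capped at the same ~120-140 W as last generation would only
# inherit roughly that perf/W delta.
```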

That's why, for Zen5 Halo, a fast iGPU is not needed at all, for a trivial reason: these machines will always have a companion dGPU that is several times, if not orders of magnitude, faster. And therefore doubling the RAM bandwidth for the iGPU's sake is 100% meaningless.

Again, as I noted ironically in another news thread, the point of a fast iGPU is precisely in models that will not have a dGPU, and those are most often cheap mass-market models and business series. But it is exactly there that the 256-bit controller and high bandwidth will be absent, so the iGPU will choke. By your own reasoning, a 256-bit bus should have gone only into Zen5 Strix, since that series often lacks a companion dGPU, and iGPU performance becomes extremely important to its buyers.

A simple example: AMD deliberately cut the iGPU in Zen4 HX (7x45) compared to Zen4 Phoenix (7x40), and they did the right thing - there is no point in a fast iGPU there, beyond basic decoders for streaming services and the like, which even a cut-down version handles. With HX, in 99% of cases there is a dGPU that takes on all the work. Incidentally, the HX series does not even have a built-in USB4 (40 Gbps) controller - again, because it makes no sense when there is always a dGPU. Zen4 Phoenix, however, has 2x USB4, which theoretically allows (by aggregating two ports into one) PCIe 4.0 x4 for eGPU boxes. And bingo - that is exactly the bandwidth of the semi-homemade OCuLink solution, which originated from attempts to route a cable from an M.2 PCIe 4.0 x4 slot out of the laptop case to an eGPU, to allow playing 3D games with acceptable performance on laptops without a dGPU.

Therefore, as I have argued above, the point of the 256-bit Halo controller is not the integrated GPU at all - NO ONE needs it there - but the acceleration of the x86 cores, because they have been suffocating for more than a year on extremely slow memory.

And if the mobile 5090 is connected to Zen5 Halo via PCIe 5.0 x16 (someone above, I recall, insisted that mobile dGPUs were already connected at x16 over PCIe 4.0, although that was not really the case), it will win over the 128-bit Arrow Lake bus by several tens of percent - but only if the connection really is 5.0 x16. This will especially affect game scenarios with "seamless" maps (open worlds) and transitions between levels.
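For reference, the raw difference between a Gen4 x16 and a Gen5 x16 link is a straight doubling; whether games actually gain "tens of percent" from it is the unproven part:

```python
PCIE_GBS_PER_LANE = {4.0: 1.969, 5.0: 3.938}  # usable GB/s per lane

for gen in (4.0, 5.0):
    print(f"PCIe {gen} x16: ~{PCIE_GBS_PER_LANE[gen] * 16:.0f} GB/s each direction")
# ~32 GB/s vs ~63 GB/s
```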

In all other intensive computation and processing of large data arrays, even a 2x increase in real memory bandwidth will be a powerful breakthrough. Zen4 HX averages about 60-65 GB/s of real bandwidth across dozens of benchmarks in reviews. If Halo delivers 150 GB/s+, that alone is a new leap in x86 core performance (remember that the Apple M4 Pro measures about 200-210 GB/s in reviews and the M4 Max about 220-230, versus 110-120 for the M3 Pro and 120-140 for the M3 Max). Intel would then be left eating AMD's dust throughout 2025, at least. But until I see the first real benchmarks on real retail laptops, nothing can be said for sure.
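As a sanity check on the "150 GB/s+" expectation, one can take the measured-to-theoretical ratio implied by the Zen4 HX numbers above and apply it to Halo's ceiling. Carrying the efficiency over, and the LPDDR5X-8000 data rate, are purely my assumptions for illustration:

```python
hx_theoretical = 128 / 8 * 5.6     # ~90 GB/s for 128-bit DDR5-5600
hx_measured = 62.5                 # midpoint of the 60-65 GB/s cited above
efficiency = hx_measured / hx_theoretical

halo_theoretical = 256 / 8 * 8.0   # ~256 GB/s for 256-bit LPDDR5X-8000 (assumed data rate)
print(f"HX efficiency: {efficiency:.0%}")                                     # ~70%
print(f"Same efficiency on Halo: ~{halo_theoretical * efficiency:.0f} GB/s")  # ~179 GB/s
```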
 