But even on desktops, it is clear that bandwidth significantly constrains performance, even for old cores.
I did provide several memory-scaling benchmarks which show that higher-bandwidth memory makes almost no difference to either compute or gaming workloads. The biggest impact is from using memory with lower latency, which is why DDR4-3600 even beat DDR5-4800, in that gaming benchmark.
That is why they built a bunch of cache levels, increasing their sizes, in an attempt to hide the problem from the public - insufficient bandwidth in x86.
CPU caches aren't only about solving the bandwidth problem. As I said before, latency is the more crucial issue. I cited a very current and relevant example, to illustrate this point. You seem to have glossed over it, perhaps without appreciating what it tells us.
I linked a slide showing the additional cache tier that Intel added to the Lion Cove P-cores in Lunar Lake and Arrow Lake. Intel calls these Level 0 (data), Level 1 (data), and L2. However, the L0D is the same size as the previous generation's L1D and the L2 is only a little bigger (by 50%) than the previous generation's L2. So, essentially, what they did was to shoehorn a level in between the old L1D and L2.
What's crucial to appreciate, here, is that
the only parameters which differ between the new L1D and L2 are the size and latency. Bandwidth is the same! So, we can clearly see what a top-level concern latency is, for the CPU designers.
If you compare these to DRAM latency, you can see why it's such a performance-killer. Again, here's how the cache latencies compare with each other and with DRAM.
Intel went to all the trouble of adding another level of cache, just for the sake of lowering the stair step after the lowest-level cache (what they used to call L1D and now call L0) and elongating the L2 step.
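To make that stair-step picture concrete, here's a minimal pointer-chasing sketch (my own illustration, not taken from the linked slide) that sweeps the working-set size. Plotting ns-per-load against size shows a flat plateau for each cache level and a tall step once you fall out to DRAM:

```c
/* Pointer chasing over a random cycle, swept across working-set sizes.
 * Each cache level shows up as a plateau in ns/load; DRAM is the big step. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase_ns_per_load(size_t n_elems, long iters)
{
    size_t *next = malloc(n_elems * sizeof *next);
    if (!next) { perror("malloc"); exit(EXIT_FAILURE); }

    /* Sattolo's algorithm: a random permutation that forms one big cycle,
     * so the chase visits every element and the prefetcher can't predict
     * the next address. */
    for (size_t i = 0; i < n_elems; i++) next[i] = i;
    for (size_t i = n_elems - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;        /* j in [0, i) */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    size_t idx = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long k = 0; k < iters; k++)
        idx = next[idx];                      /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = idx; (void)sink;   /* keep the loop from being optimized away */
    free(next);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

int main(void)
{
    /* Sweep from well inside L1 to well past the last-level cache. */
    for (size_t kib = 16; kib <= 128 * 1024; kib *= 2) {
        size_t n = kib * 1024 / sizeof(size_t);
        printf("%8zu KiB : %6.2f ns/load\n", kib,
               chase_ns_per_load(n, 10 * 1000 * 1000));
    }
    return 0;
}
```

Because every load's address depends on the previous load's result, each plateau reflects the actual load-to-use latency of that level, rather than anything the prefetcher can paper over.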
All the memory bandwidth in the world won't help with latency! You could have an
infinite amount of memory bandwidth, but a CPU thread is still going to be sitting idle for a large chunk of that ~100 ns (which translates to ~500 clock cycles, in a CPU core running at 5 GHz), every time it needs to read more data from DRAM. It's
latency that can really murder CPU performance!
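If it helps, here's a hedged sketch of the two access patterns being contrasted, with the cycle math spelled out (using the same round 5 GHz and ~100 ns figures as above):

```c
/* At 5 GHz a clock cycle is 0.2 ns, so a ~100 ns DRAM access costs
 * roughly 100 / 0.2 = ~500 cycles. */
#include <stddef.h>

/* Pattern A: streaming sum.  The addresses are known in advance, so the core
 * and the prefetcher can keep many misses in flight -- bandwidth-bound. */
double sum_array(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Pattern B: linked-list walk.  The next address isn't known until the
 * current load completes, so misses serialize and each one pays the full
 * ~500-cycle DRAM latency -- latency-bound, no matter how much bandwidth
 * the memory subsystem offers. */
struct node { struct node *next; double value; };

double sum_list(const struct node *p)
{
    double s = 0.0;
    for (; p != NULL; p = p->next)
        s += p->value;
    return s;
}
```

Out-of-order execution and prefetching can overlap the misses in the first loop, which is why it scales with bandwidth; nothing can overlap the misses in the second, because the next address doesn't even exist until the previous load completes.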
Now, how you want to characterize cache is a matter of perspective. Is it an optimization or a dirty trick? I only consider something a dirty trick, if it has some significant downside that's worse than whatever benefit I'm getting. In modern CPUs, caches are sufficiently refined that they don't really have such downsides. If you get rid of caches, you throw out the baby with the bathwater. You can't substitute them with 4096-bit HBM, or anything of the sort.
It is obvious that if a dGPU has VRAM capable of operating at 250-750 GB/s, with power consumption in the region of 80-140 W, it should be the same for an iGPU, which means the total bandwidth should be much, much larger.
The M-series Max chips have 512-bit memory interfaces, but the Ultra doubles this - and it's the one designed to compete against the fastest desktop dGPUs. The M1 Ultra had 800 GB/s of memory bandwidth, at a time when the fastest dGPU (RTX 3090) had 936 GB/s. However, the RTX 3090 also had a TDP of 350 W, which is more than twice as much as a Mac Studio would draw. It's clear that the M1 Ultra couldn't match it, performance-wise, but it was definitely the fastest iGPU ever made.
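For reference, here's the back-of-the-envelope math behind those numbers, assuming the commonly reported LPDDR5-6400 configuration for the M1 Max/Ultra (the memory type is my assumption, not something stated above):

```c
/* Peak bandwidth = (bus width in bits / 8 bits per byte) * transfer rate in GT/s,
 * assuming LPDDR5-6400 (6.4 GT/s). */
#include <stdio.h>

int main(void)
{
    const double gt_per_s = 6.4;   /* LPDDR5-6400 transfer rate */
    printf("M1 Max,   512-bit:  %.0f GB/s\n",  512 / 8.0 * gt_per_s);  /* ~410, marketed as 400 */
    printf("M1 Ultra, 1024-bit: %.0f GB/s\n", 1024 / 8.0 * gt_per_s);  /* ~819, marketed as 800 */
    return 0;
}
```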
I think Apple will never beat Nvidia with its iGPUs, but (for the most part) it doesn't really need to. It just needs to get into the ballpark, in order to meet the needs of most of its power users.