Apple is doing fine with LPDDR5X. Nvidia claims Grace would use the same power with either that or HBM (I forget if it was 2e or 3), but that LPDDR5X is currently a lot cheaper.
But for ultimate Mobile Performance, you'd want HBM3 for the GPU and LPDDR6 for the CPU side.
Combine them with the DX12 update that lets the CPU fully access the GPU's memory, and you'd see some real performance improvements.
It adds nonzero cost, so there needs to be enough cases that benefit from it. I'm going to speculate that most of those cases which gained from SMT2 would show little or no further benefit from SMT4, and we'd probably see even more regression on the cases which don't benefit.
When Intel first launched HT on the P4, it claimed HT added less than 5% to core complexity. Adding more threads when all of the support framework is already there probably wouldn't add much additional complexity besides an extra bit to things that track which thread something belongs to. The difference in responsiveness on my P4 between HT on and off was pretty noticeable, well worth leaving it on even if it may make some stuff slightly slower.
As for how many things that benefit from SMT2 would benefit from SMT4 too, that would depend on how much slack there is left in execution resources with SMT2. In sufficiently heavily threaded applications, you may gain more performance from SMT4 decreasing the pressure on speculative execution and other potentially expensive "gotta go fast" tick burn, allowing things to run more efficiently if not faster.
Of course, code written as a single thread that must run as fast as possible would suffer from any increased degree of resource competition as usual.
If you follow the link in that article to MS' website, it sounds like its main benefit is for unified memory iGPUs. With a dGPU, reading across PCIe is fairly slow (much worse than CXL).
Given the date of the source, this is probably referring to PCIe 2.0:
Eh, pic doesn't show but it claims read latency of 1600 to 1900 ns. Glancing at the paper, I can't confirm if the "No DMA" case is for a single, round-trip PCIe transaction but it can't be for a whole lot. So, take it with a grain of salt. Regardless, you don't want to make a habit of doing PIO reads out of dGPU memory.
When Intel first launched HT on the P4, it claimed HT added less than 5% to core complexity. Adding more threads when all of the support framework is already there probably wouldn't add much additional complexity besides an extra bit to things that track which thread something belongs to.
So, the P4 famously didn't implement HT very well. And they also really didn't worry about side-channel attacks, at all. Since @hotaru.hino and I already touched on the implications, I won't repeat those here.
However, given that a core already implements SMT2, I think the overhead of going to SMT4 would be pretty small, if we're just talking about the bare minimum to do it securely (i.e. no cache increases).
BTW, do we know if recent Intel or AMD cores instantiate a separate decoder per HT?
In sufficiently heavily threaded applications, you may gain more performance from SMT4 decreasing the pressure on speculative execution and other potentially expensive "gotta go fast" tick burn, allowing things to run more efficiently if not faster.
More threads -> more pressure on caches. If that results in your cache hit-rate dropping markedly, then the few pipeline bubbles that extra threads could fill might not be worth it. I think that's what happened with a lot of the SPEC2017 fp workloads.
As far as I can tell from the Golden Cove architecture slides, there is only one decoder per core and it can decode up to 32B worth of instructions for a maximum of six operations at a time. There isn't much of a point in having dedicated decoders for each thread when execution usually spends most of its time in loops hundreds of instructions long at most and there is a 4k uOPs replay cache to bypass instruction decoding during that time.
But for ultimate Mobile Performance, you'd want HBM3 for the GPU and LPDDR6 for the CPU side.
Combine them with the DX12 update that lets the CPU fully access the GPU's memory, and you'd see some real performance improvements.
The most efficient use of resources would be to use HBM3 for everything; then you avoid wasting time, power and space duplicating assets in system memory and VRAM, because they are one and the same, just like on consoles.
As far as I can tell from the Golden Cove architecture slides, there is only one decoder per core and it can decode up to 32B worth of instructions for a maximum of six operations at a time.
Usually, there are restrictions, such as a prior generation (I forget which) being able to decode 1 "complex" instruction + 4 simple ones, per cycle. I think Intel hasn't disclosed what restrictions apply to Golden Cove's decoder.
There isn't much of a point in having dedicated decoders for each thread when execution usually spends most of its time in loops hundreds of instructions long at most and there is a 4k uOPs replay cache to bypass instruction decoding during that time.
Tremont introduced the concept of a split 3+3 decoder, where it seems like each half can decode a different instruction stream. Since Tremont is single-threaded, I think the only way you get multiple concurrent instruction fetches is by speculative execution. Also, Tremont has no uOP cache, so it's more dependent on its decoder than Golden Cove.
Latency would be better for normal CPU (non-GPU) stuff on LPDDR6.
So there's that consideration.
And given that it's monolithic, it shouldn't be much of a penalty for a CPU to bounce into the GPU to Read/Manipulate data, even w/o the extra copy function that usually happens on traditional modular setups.
If you follow the link in that article to MS' website, it sounds like its main benefit is for unified memory iGPUs. With a dGPU, reading across PCIe is fairly slow (much worse than CXL).
Given the date of the source, this is probably referring to PCIe 2.0:
Eh, pic doesn't show but it claims read latency of 1600 to 1900 ns. Glancing at the paper, I can't confirm if the "No DMA" case is for a single, round-trip PCIe transaction but it can't be for a whole lot. So, take it with a grain of salt. Regardless, you don't want to make a habit of doing PIO reads out of dGPU memory.
But that's when your GPU is connected via a PCIe bus.
What happens if your GPU is on the same Ultra Low Latency Infinity Fabric bus as your CPU in a monolithic APU?
That should change the latency equation by quite a bit.
And since it's an APU, the latency to access the GPU's memory controller shouldn't be that big of a hit compared to a traditional dGPU model. That was the whole point of the APU.
The most efficient use of resources would be to use HBM3 for everything; then you avoid wasting time, power and space duplicating assets in system memory and VRAM, because they are one and the same, just like on consoles.
I concur on the "Zero-Copy" model, but that's the entire point of the DX12 update.
Microsoft has announced a new DirectX12 GPU optimization feature in conjunction with Resizable-BAR, called GPU Upload Heaps, that allows the CPU to have direct, simultaneous access to GPU memory. This can increase performance in DX12 titles and decrease system RAM utilization since the feature circumvents the need to copy data from the CPU to the GPU. The new feature is available now in the Agility SDK.
We don't know the actual implications of this feature, but the performance advantages could be significant. Graphics card memory sizes and video game VRAM consumption are getting larger and larger every year. As a result, the CPU needs to move more and more data between itself and the GPU.
With this feature, a game's RAM and CPU utilization could decrease noticeably due to a reduction in data transfers alone. This is because the CPU no longer needs to keep copies of data on both system RAM and GPU VRAM to interact with it. Another bonus is that GPU video memory is very fast these days, so there should be no latency penalties for leaving data on the GPU alone. In fact, there will probably be a latency improvement with CPU access times on high-end GPUs with high-speed video memory.
This update should really benefit APUs dramatically, since the VRAM memory controller is nearby for the CPU to communicate through. Just like a console.
That's why I want to see a "Big Ass" APU with 64 CUs of RDNA3.
It would be amazing to see that crazy performance in a portable, laptop-like body.
Latency would be better for normal CPU (non-GPU) stuff on LPDDR6.
I concur on the "Zero-Copy" model, but that's the entire point of the DX12 update.
This update should really benefit APUs dramatically, since the VRAM memory controller is nearby for the CPU to communicate through. Just like a console.
While HBM may have slightly worse worst-case latency than DDR5, most of that latency gets hidden by increased concurrency and the ability to simultaneously issue CAS and RAS commands; latency is more uniform, and the total interface time to complete a workload is lower most of the time. Intel wouldn't be packing 64GB of the stuff in its shiny new high-end server CPUs if it didn't provide consistently improved performance.
The DX12 update doesn't benefit unified memory APUs at all since the IGP and CPU are already sharing the exact same physical address space. All of the D3D heap memory, regardless of which D3D memory pool you get it from (0/system, 1/GPU), was already directly accessible to the IGP as-is, going back to chipset IGPs ~25 years ago.
Like a MacBook Pro? The M2 Max has 400 GB/s of memory bandwidth and a 38-core GPU. According to this, it performs similarly to an Nvidia RTX 4070 laptop dGPU:
Not only that, but it amazes me to see how many people simply take the synthetic, single-threaded latency benchmarks as the final word on memory latency. That's a best case metric!
When you're dealing with heavily-multithreaded workloads, the queues will fill up, resulting in actual latencies probably several times the best-case latency measured by synthetic tests. That's when bandwidth starts to count for a lot more than best-case latency, since keeping those queues at low occupancy will more than make up for the higher intrinsic latency of HBM.
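A back-of-the-envelope bandwidth-delay product illustrates the point; the bandwidth and loaded-latency figures below are assumptions for illustration, not measurements of any particular part:

```cpp
// Bandwidth-delay product sketch: how much data must be in flight to keep an
// HBM-class interface busy at an assumed ~400 GB/s and ~120 ns loaded latency.
#include <cstdio>

int main() {
    constexpr double bandwidth_GBps  = 400.0;  // assumed HBM-class bandwidth
    constexpr double latency_ns      = 120.0;  // assumed loaded (queued) latency
    // GB/s * ns conveniently cancels to bytes (1e9 * 1e-9 = 1).
    constexpr double bytes_in_flight = bandwidth_GBps * latency_ns;  // 48,000 B
    constexpr double lines_in_flight = bytes_in_flight / 64.0;       // ~750 lines
    std::printf("~%.0f bytes (~%.0f cache lines) outstanding to hide latency\n",
                bytes_in_flight, lines_in_flight);
    return 0;
}
```

Once the queues are that deep, the interface that drains them fastest wins, not the one with the prettiest idle-latency number.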
For now, it's just the "HPC optimized" variants that will be getting HBM. They suffered a haircut in their max turbo clock speed, which is cut to 3.5 GHz, whereas the fastest of their MCC cousins has a max turbo of 4.2 GHz and their XCC siblings go up to 4.0 GHz.
If you look at Intel's AMX, or know anything about deep learning, it's heavily-dependent on memory bandwidth. So, I think that was a significant motivating factor. I'm eager to see some benchmarks of Xeon Max on other workloads, but I think it's not slated to launch until like Q3.
By my reading, it's optimal for APUs. Without the update, DX12 would force you to keep 2 separate copies of an asset (i.e. if you wanted the CPU to have access to it), even though they happen to be in the same physical memory.
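For anyone curious what the new path looks like in code, here's a minimal sketch, assuming the preview Agility SDK headers that expose D3D12_HEAP_TYPE_GPU_UPLOAD and the OPTIONS16 feature check; don't take the exact flags/states as gospel:

```cpp
// Minimal sketch (not production code): allocate a buffer in VRAM that the CPU
// can Map() and write directly, instead of keeping a second copy in system RAM.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

bool TryCreateGpuUploadBuffer(ID3D12Device* device, UINT64 sizeBytes,
                              ComPtr<ID3D12Resource>& outBuffer)
{
    // 1. Check that the driver/OS (and Resizable BAR) actually support it.
    D3D12_FEATURE_DATA_D3D12_OPTIONS16 options16 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                                           &options16, sizeof(options16))) ||
        !options16.GPUUploadHeapSupported)
        return false;

    // 2. Place the buffer in GPU memory, but leave it CPU-visible: no staging copy.
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;

    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
    desc.Width            = sizeBytes;
    desc.Height           = 1;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    if (FAILED(device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE,
                                               &desc, D3D12_RESOURCE_STATE_COMMON,
                                               nullptr, IID_PPV_ARGS(&outBuffer))))
        return false;

    // 3. The CPU can now Map() this resource and write into GPU memory while the
    //    GPU reads the same allocation -- the "zero-copy" case discussed above.
    return true;
}
```

On an APU, that buffer and "system RAM" are the same physical pool anyway, which is why the feature looks tailor-made for unified memory.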
All of the D3D heap memory, regardless of which D3D memory pool you get it from (0/system, 1/GPU), was already directly accessible to the IGP as-is, going back to chipset IGPs ~25 years ago.
I think you're not accounting for the fact that iGPUs have their own address space. Driver, kernel, and graphics runtime support is needed to map the GPU's address space into userspace, so your app can look at that memory.
I think you're not accounting for the fact that iGPUs have their own address space. Driver, kernel, and graphics runtime support is needed to map the GPU's address space into userspace, so your app can look at that memory.
Most of the time, though, it is only the (I)GPU that needs to look into system memory, and taking care of those mappings is part of the D3D heap allocation and resource creation process. For an IGP, where the mappings are in system memory regardless, software can see whatever the IGP is doing by simply looking at its D3D heap, regardless of the DX12 update and BAR status.
Most of the time, though, it is only the (I)GPU that needs to look into system memory, and taking care of those mappings is part of the D3D heap allocation and resource creation process. For an IGP, where the mappings are in system memory regardless, software can see whatever the IGP is doing by simply looking at its D3D heap, regardless of the DX12 update and BAR status.
I think you're still making some bad assumptions, rather than speaking from actual knowledge.
Memory used by the iGPU must be subject to a different cache policy, for the CPU, if the GPU isn't fully cache-coherent. That means blocking it off via MTRRs (Memory Type Range Registers), which are a limited resource. This prevents the iGPU from being able to map and access whatever userspace data structures you might happen to want to pass to it, because you'd quickly run out of configurable memory ranges to make uncacheable. Furthermore, the OS needs to exclude those pages from other uses or swapping (assuming the GPU can't generate page faults, as was the case with older iGPUs that use physical addresses).
Another concern you seem to be completely ignoring is security. Multiple applications need to be able to share the GPU, and not only must they be prevented from seeing each other's data in GPU memory, but their GPU code also mustn't be able to see into the userspace of any process other than the one that launched it.
So, it's not as simple as an app snooping the heap for the D3D memory it allocated, and then following those pointers. ...if that's what you were imagining.
I wonder if any of the console makers have some sort of leverage to hold AMD back from making a big APU?
Because they're the ones that might feel "Threatened" if AMD made something that nice.
Personally, I think they're being paranoid about PC Gaming threatening Console gaming.
But we all know that the Console Player Base generally wants a "Simpler" experience and isn't willing to deal with the extra steps that we PC gamers are willing to deal with.
No, I highly doubt it. Not formal leverage, anyway.
Informally, their biggest threat is to switch to another SoC maker. But... I mean, Intel hasn't exactly made the greatest showing with its dGPUs, and switching to Nvidia would mean switching the CPU to ARM. Between them and Sony, I'd say Microsoft is the more likely of the two to go for it, since they're trying to push Windows on ARM and that would probably drag some games onto the platform.
I'm not sure how interested Nvidia would be in doing a custom SoC, either. It's doing so well in AI, and while it might be willing to sell Orin Nano chips to Nintendo, doing a custom SoC for a big console might be a distraction for their engineering department that's not worth the relatively small profits it'd bring them.
I think they feel much more threatened by each other, to be honest. A PC, even one with a comparable APU, would always be more expensive than Playstation and XBox due to the way MS and Sony have cost-optimized them and sell the hardware almost at-cost.
No, I highly doubt it. Not formal leverage, anyway.
Informally, their biggest threat is to switch to another SoC maker. But... I mean, Intel hasn't exactly made the greatest showing with its dGPUs, and switching to Nvidia would mean switching the CPU to ARM. Between them and Sony, I'd say Microsoft is the more likely of the two to go for it, since they're trying to push Windows on ARM and that would probably drag some games onto the platform.
I wouldn't count on Intel's Graphics Division for quite some time.
nVIDIA barely got its SoC division to make any sales.
Its biggest customer is Nintendo and its Switch.
MS' ARM push has been a joke.
Nobody buys Windows for "ARM".
The install base for "Windows on ARM" sucks compared to the MASSIVE x86 library that has been built up over the years and the support that x86 has received.
I'm not sure how interested Nvidia would be in doing a custom SoC, either. It's doing so well in AI, and while it might be willing to sell Orin Nano chips to Nintendo, doing a custom SoC for a big console might be a distraction for their engineering department that's not worth the relatively small profits it'd bring them.
Historically, nVIDIA has burned bridges with many of its so-called partners.
After Bump-Gate, Apple REFUSED to ever work with nVIDIA again, because nVIDIA shifted all the blame for the bad solder bumps onto Apple.
When MS wanted a die shrink from nVIDIA for the Xbox, nVIDIA laughed and said "pay up" or "go away".
Sony's relation with nVIDIA for the PS3 wasn't great either.
Many vendors have been One & Done with nVIDIA.
Nintendo is on their first experience with nVIDIA.
We'll see how long they stick with nVIDIA.
Nintendo is FAMOUS for being cheapskates.
That's something nVIDIA isn't happy about: they wanted Nintendo to spend more money per SoC for the original Nintendo Switch. Nintendo said no, we want the hardware as cheap as possible so that hardware sales are profitable on their own, instead of depending on software sales to pay for the hardware.
Nintendo got their profit margins, but nVIDIA was very upset at how meager its own were.
nVIDIA is trying to steer Nintendo toward using a higher-end, updated Orin automotive SoC. We'll see which model Nintendo will land on. I'm betting on it being the bottom-of-the-barrel SKU, given how cheap Nintendo has historically been.
I think they feel much more threatened by each other, to be honest. A PC, even one with a comparable APU, would always be more expensive than Playstation and XBox due to the way MS and Sony have cost-optimized them and sell the hardware almost at-cost.
That's the threat they know & understand. They are always worried about each other every generation.
It's PC gaming that's the perpetual threat in the shadows. The PC gaming user base has been growing over time, to the point where we're "Undeniable" as a platform in our own right.
Each console maker prioritizes its proprietary, closed console platform.
Historically, they haven't been very receptive to PC gaming, despite the fact that PC gaming is HUGE.
It's only very recently, with the massive PC install base, that they realized PC gaming can't be ignored.
That's why we're seeing more PC ports of console games: they realize that we aren't a threat to their existing install base, and that the console audience and the PC gaming audience don't really overlap.
I'm not going to deep-dive into this as I do not have any particular interest in DX development and we are already 100 miles off-topic. Just saying there should be several opportunities for shortcuts.
BTW, MTRRs have been mostly superseded by the Page Attribute Table (PAT), going all the way back to the P3.
A few people have re-reviewed the A750 or A770 with the April driver update, and it looks like the sore spots are clearing up nicely. If Intel wanted to get some of that Sony/Microsoft console SoC action, I'm sure they could work it out. Much easier to do so when there are only one or two standardized platform configurations for everyone from SoC designer to end-users to worry about.
I wonder if any of the console makers have some sort of leverage to hold AMD back from making a big APU?
Because they're the ones that might feel "Threatened" if AMD made something that nice.
Personally, I think they're being paranoid about PC Gaming threatening Console gaming.
I think AMD would be far more concerned about cannibalizing its lower-end dGPU sales by being too generous with its IGPs. For a given amount of graphics performance, AMD may not be able to extract as much of a premium out of its large-IGP APUs as it gets from AIBs for dGPUs.
Annihilating a substantial chunk of the AIBs' business may not go well either.
The key thing is what restrictions you're willing to accept, including how many hoops you have to jump through, to do it. The issues broadly break down into the following categories:
Security
Cache-coherence
Interactions with the kernel's VM subsystem
With newer iGPUs, they might've sufficiently addressed the cache-coherence problem, although this comes at some cost. GPUs have an incredibly weak memory model, and for good reasons.
Security and VM interactions are probably also addressed in recent iGPU generations, by simply having the GPU go through the same MMU as the CPU cores. This also lets you use memory pages to extend the CPU's security model to code executing on the GPU.
It's because these are fairly recent developments that Microsoft is only adding this now.
A few people have re-reviewed the A750 or A770 with the April driver update, and it looks like the sore spots are clearing up nicely. If Intel wanted to get some of that Sony/Microsoft console SoC action, I'm sure they could work it out. Much easier to do so when there are only one or two standardized platform configurations for everyone from SoC designer to end-users to worry about.
If you're building a console, you want the most performance per mm^2 of silicon. Intel still has a long way to go, on that front, and I'm sure that's not simply a matter of "drivers".
If you're building a console, you want the most performance per mm^2 of silicon. Intel still has a long way to go, on that front, and I'm sure that's not simply a matter of "drivers".
For mm^2-for-mm^2 performance, the A750 is doing fine: roughly twice the raw performance of an RX 6600 in ~70% more space, which includes an x16 PCIe interface and a 256-bit memory controller. Intel is just struggling to consistently wring out its potential. Having a performance spread that ranges from getting beaten senseless by the much slower 3050 to comfortably beating the RTX 3060, as it ought to by raw numbers, is plain silly. I doubt this would be an issue on consoles, where developers have only one architecture to worry about in any given build.
I meant on the same process node, which they're not.
The A770 is 406 mm^2 on N6, which translates to between 461 mm^2 and 488 mm^2 of equivalent N7 die size. The RX 6650 XT die is 237 mm^2. So, that's between 95% and 106% more die space.
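Here's the arithmetic behind those N7-equivalent numbers; the N6-to-N7 density scaling factors are my assumption (roughly 14-20% better logic density on N6), not official TSMC figures:

```cpp
// N7-equivalent die-size estimate for the A770 (ACM-G10, TSMC N6) vs. Navi 23.
#include <cstdio>

int main() {
    constexpr double a770_n6_mm2 = 406.0;         // A770 die, N6
    constexpr double navi23_mm2  = 237.0;         // RX 6650 XT die, N7
    constexpr double low  = a770_n6_mm2 * 1.136;  // ~461 mm^2 if N6 is ~13.6% denser
    constexpr double high = a770_n6_mm2 * 1.202;  // ~488 mm^2 if N6 is ~20.2% denser
    std::printf("%.0f-%.0f mm^2 N7-equivalent -> %.0f%%-%.0f%% more than Navi 23\n",
                low, high, (low / navi23_mm2 - 1.0) * 100.0,
                (high / navi23_mm2 - 1.0) * 100.0);
    return 0;
}
```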
Regarding performance, the latest reviews I could find were from March 7th. The RX 6650 XT is 5.2% faster at 1080p and 2.3% slower at 1440p.
So, if we call performance about equivalent, then Intel is off by about 2x in performance per transistor. You think either MS or Sony is going to take that kind of risk, while the other one sticks with AMD? I don't.
Not to mention they also have to provide consumers with a compelling reason to upgrade from the previous generation.
Finally, both MS and Sony have invested a lot in optimizing their GPU stuff for AMD. They'd have to start mostly from scratch, if switching to Intel.
So, if we call performance about equivalent, then Intel is off by about 2x in performance per transistor. You think either MS or Sony is going to take that kind of risk, while the other one sticks with AMD? I don't.
I'm going by FP32 numbers - what the A750/770 should be capable of if Intel got its drivers fully sorted out, not benchmarks since many of those change by 5-2000% every month.
I'm going by FP32 numbers - what the A750/770 should be capable of if Intel got its drivers fully sorted out, not benchmarks since many of those change by 5-2000% every month.
I think that's a mistake. We've seen many examples where FLOPS don't translate into real-world performance, and there's some good evidence it's not just "drivers" holding them back.
For instance, the RX 7900 XTX has 2.41x as much theoretical fp32 compute as the RX 6950 XT (non-boost; with boost clocks, the difference is even greater), but delivers nowhere near that multiple of performance.
It's not just AMD, either. Nvidia's RTX 3090 Ti has 2.85x as much theoretical fp32 compute as the RTX 2080 Ti (non-boost; comparing boost clocks, it's 2.97x as much), but delivers a mere 1.3x to 1.4x the performance.
So, you really can't treat theoretical fp32 numbers as a particularly useful performance proxy.
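For what it's worth, this is all those theoretical fp32 figures actually capture: 2 ops per clock per shader (FMA) times shader count times clock. The shader counts and boost clocks below are spec-sheet values, so treat the decimals as approximate:

```cpp
// Peak fp32 throughput = 2 (FMA) x shader count x clock -- nothing in that
// formula accounts for whether the rest of the chip can feed the ALUs.
#include <cstdio>

int main() {
    constexpr double tflops_3090ti = 2.0 * 10752.0 * 1.860e9 / 1e12; // ~40.0 TFLOPS
    constexpr double tflops_2080ti = 2.0 * 4352.0  * 1.545e9 / 1e12; // ~13.4 TFLOPS
    std::printf("RTX 3090 Ti / RTX 2080 Ti peak fp32 ratio: %.2fx\n",
                tflops_3090ti / tflops_2080ti);                      // ~2.97x
    // Measured gaming performance gap is only ~1.3-1.4x, per the comparison above.
    return 0;
}
```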
MS' ARM push has been a joke.
Nobody buys Windows for "ARM".
The install base for "Windows on ARM" sucks compared to the MASSIVE x86 library that has been built up over the years and the support that x86 has received.
I think it depends a lot on ARM being able to deliver a decent Windows experience on lower-cost hardware than AMD or Intel. It's similar to why ARM made such inroads into the Chromebook market.
Qualcomm has different ideas. They have some fantasy that wealthy executives will "simply demand" a Qualcomm-powered $2k+ ultrabook, for its light weight, 5G connectivity, and long battery life. So far, that doesn't seem to be playing out very well.
And the trouble is that Qualcomm had some sort of exclusivity agreement with Microsoft, so nobody else is in the mix. I think you need at least MediaTek, but probably also Samsung or somebody else making those SoCs.
I think AMD would be far more concerned about cannibalizing its lower-end dGPU sales by being too generous with its IGPs. For a given amount of graphics performance, AMD may not be able to extract as much of a premium out of its large-IGP APUs as it gets from AIBs for dGPUs.
Annihilating a substantial chunk of the AIBs' business may not go well either.
I'm going by FP32 numbers - what the A750/770 should be capable of if Intel got its drivers fully sorted out, not benchmarks since many of those change by 5-2000% every month.