News Intel Panther Lake will allegedly reintegrate the memory controller into the compute tile — Nova Lake is expected to separate the two again with ad...

TheSecondPower

Distinguished
Nov 6, 2013
I remember hearing that Panther Lake will succeed Lunar Lake. Lunar Lake already puts the memory controller, CPU cores, GPU, and NPU on one die. There is a separate I/O die, though.

I think Broadwell-E used a mesh interconnect instead of a ring bus because it had too many cores for a ring bus. I'm guessing Intel's server CPUs use something similar. As I recall it wasn't great for latency.

I'm guessing AMD's solution is a little different. Zen 1 and Zen 2 had clusters of 4 cores, and since Zen 3 AMD has used clusters of 8 cores, where latency is low inside a cluster and a little higher outside of it.
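For anyone who wants to see that split on their own machine, here's a minimal sketch (Linux/glibc; the core IDs are placeholder assumptions you'd set from your actual topology) that bounces a cache line between two pinned threads. Run it once with both cores in the same cluster and again across clusters:

```c
// Minimal sketch (Linux/glibc; core IDs below are placeholder assumptions):
// bounce a cache line between two pinned threads to estimate core-to-core
// latency. Compile with: gcc -O2 -pthread pingpong.c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000

static _Atomic int flag = 0;  // the cache line the two cores hand back and forth

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *responder(void *arg) {
    pin_to_core(*(int *)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load(&flag) != 1) {}  // wait for ping
        atomic_store(&flag, 0);             // reply with pong
    }
    return NULL;
}

int main(void) {
    int core_a = 0, core_b = 1;  // pick these inside/outside the same cluster
    pthread_t t;
    pthread_create(&t, NULL, responder, &core_b);
    pin_to_core(core_a);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++) {
        atomic_store(&flag, 1);             // ping
        while (atomic_load(&flag) != 0) {}  // wait for pong
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("round trip: %.1f ns\n", ns / ITERS);
    return 0;
}
```

The round trip should get noticeably longer once the two threads land in different clusters, which is exactly the in-CCX vs. cross-CCX gap described above.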
 

TheSecondPower

Distinguished
Nov 6, 2013
RDNA 3 is, IIRC, a GPU architecture, never to be confused with a CPU architecture...;) Big differences all around.
RDNA 3 and Zen 2 were both AMD products that moved the memory controller away from the compute units, like Arrow Lake and Meteor Lake do. However, while this was largely successful for Zen 2 (because of huge advancements in other respects), it didn't go over so well for RDNA 3. Yes, Arrow Lake and RDNA 3 are not alike, since one is a CPU line and one a GPU line, but they have memory latency troubles in common.
 
Lunar Lake has the memory controller on the compute tile, plus a platform controller tile. Maybe this is why Panther Lake will not have an SoC tile, since Lunar Lake is a far more successful design (better cache latencies, and core-to-core latency even faster than Raptor Lake). I think Panther Lake will basically be a more powerful Lunar Lake, which is why they give its tiles the same names as Lunar Lake's: 4P + 8E on the same ring for good performance, and 4E on a separate bus to maximize efficiency in light workloads. 12 Xe3 GPU cores, Intel 18A, 10 percent IPC gain for the P-cores. Dream CPU.
 

bit_user

Titan
Ambassador
I think Broadwell-E used a mesh interconnect instead of a ring bus because it had too many cores for a ring bus.
No, it was the last generation to use one or more rings (up to 2.5, IIRC). Skylake-SP was the first to use a mesh.

I'm guessing Intel's server CPUs use something similar. As I recall it wasn't great for latency.
Eh, server CPUs should really be designed to optimize latency under high utilization, which makes your typical latency benchmark pretty irrelevant. Meshes scale very well, at least until the point where they start crossing die boundaries (which burned Sapphire Rapids and is something Intel walked back in Emerald Rapids).
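For reference, this is roughly what such a "typical latency benchmark" does: an unloaded, dependent pointer chase. The buffer and iteration sizes below are arbitrary assumptions; the key point is that each load depends on the previous one, so you measure latency rather than bandwidth. Re-running it while other threads stream memory is how you'd approximate the loaded latency that actually matters on a server:

```c
// Sketch of a classic unloaded-latency test: a dependent pointer chase
// through a buffer much larger than the LLC, so most hops go to DRAM.
// Sizes are arbitrary assumptions. Compile with: gcc -O2 chase.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (64UL * 1024 * 1024 / sizeof(void *))  // ~64 MiB of pointers
#define HOPS 10000000L

int main(void) {
    void **buf = malloc(N * sizeof(void *));
    size_t *idx = malloc(N * sizeof(size_t));

    // Shuffle the visit order (Fisher-Yates), then link the buffer into one
    // big random cycle so hardware prefetchers can't predict the next hop.
    for (size_t i = 0; i < N; i++) idx[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }
    for (size_t i = 0; i < N; i++)
        buf[idx[i]] = &buf[idx[(i + 1) % N]];

    void **p = &buf[idx[0]];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < HOPS; i++)
        p = (void **)*p;  // every load depends on the previous one
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    // Print p so the chase loop can't be optimized away.
    printf("%.1f ns/load (final p=%p)\n", ns / HOPS, (void *)p);
    free(buf); free(idx);
    return 0;
}
```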

I'm guessing AMD's solution is a little different. Zen 1 and Zen 2 had clusters of 4 cores, and since Zen 3 AMD has used clusters of 8 cores, where latency is low inside a cluster and a little higher outside of it.
Uh, I thought we were talking about memory latency. That matters a lot more than core-to-core.
 

bit_user

Titan
Ambassador
RDNA 3 and Zen 2 were both AMD products that moved the memory controller away from the compute units, like Arrow Lake and Meteor Lake do. However, while this was largely successful for Zen 2 (because of huge advancements in other respects), it didn't go over so well for RDNA 3.
I'm not sure how harmful it really was for RDNA3's performance. Yes, it has higher L3 latencies than RDNA2, but it also scaled L3 bandwidth quite significantly.

https://substack-post-media.s3.amazonaws.com/public/images/6021c282-ada0-45e6-8bfe-5108afb8cc9a_1408x583.png

https://substack-post-media.s3.amazonaws.com/public/images/a6a58a68-4006-4e6a-a4e3-5a4792ed3036_1411x586.png
Source: https://chipsandcheese.com/p/microbenchmarking-amds-rdna-3-graphics-architecture

GPUs love bandwidth and are a lot more tolerant of latency than CPUs. The only way to really know how much it hurt them is by profiling actual games and seeing to what extent shader occupancy is being constrained by L2 misses vs. RDNA2.
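As a rough Little's-law sanity check on that point (all numbers below are illustrative assumptions, not RDNA 2/3 figures): sustained bandwidth equals bytes in flight divided by latency, so a GPU can buy back a latency increase just by keeping more requests outstanding:

```c
// Little's-law napkin math: sustained bandwidth = bytes in flight / latency.
// All numbers are illustrative assumptions, not RDNA 2/3 specifications.
#include <stdio.h>

int main(void) {
    double latency_ns = 120.0;   // assumed cache-miss latency
    double line_bytes = 64.0;    // bytes per outstanding request
    double target_gbs = 2000.0;  // assumed target bandwidth in GB/s
                                 // (1 GB/s == 1 byte/ns, so the units cancel)

    double in_flight = target_gbs * latency_ns / line_bytes;
    printf("need ~%.0f outstanding cache lines to hide %g ns\n",
           in_flight, latency_ns);
    // A GPU juggling thousands of threads can keep ~3750 requests in
    // flight; a CPU core with a few dozen MSHRs cannot, so the same
    // latency increase hurts a CPU far more.
    return 0;
}
```

That asymmetry is why the same "move the memory controller off the compute die" decision reads so differently for Arrow Lake than for RDNA 3.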
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
Am I the only one wondering WTF with Intel?

I can understand using a "Tile-Based Architecture" for DeskTop / WorkStation.

Why not just go with Monolithic for LapTop parts, just like AMD?

When Power / Efficiency / Thermals matter, just make it Monolithic.

Keep your Design in check and don't make it too large like AMD does, and it should be fine.

All the excess Power/Heat/Latency is more tolerable in a DeskTop setting, where you can benefit from the Tile/Chiplet design's inherent modularity.
 

bit_user

Titan
Ambassador
Am I the only one wondering WTF with Intel?

I can understand using a "Tile-Based Architecture" for DeskTop / WorkStation.

Why not just go with Monolithic for LapTop parts, just like AMD?

When Power / Efficiency / Thermals matter, just make it Monolithic.
If you read their launch material around Meteor Lake, they were keen to highlight how they put all the essential blocks in the SoC tile (including 2x LPE cores), which enabled them to completely power down the GPU tile and CPU tile for light-duty usage. Meteor Lake did achieve impressive battery life in tasks like video calls, where it just had a minimal amount of compute + (hardware-accelerated) video encode/decode to do.

Also, Intel's packaging technology is more power-efficient than AMD's. So, it's much less of a downside for laptops than when AMD does it (e.g. Dragon Range).

Clearly, it had significant downsides for them. However, we should recognize that there was a certain logic behind the decision.
 

Kamen Rider Blade

Distinguished
Dec 2, 2013
If you read their launch material around Meteor Lake, they were keen to highlight how they put all the essential blocks in the SoC tile (including 2x LPE cores), which enabled them to completely power down the GPU tile and CPU tile for light-duty usage. Meteor Lake did achieve impressive battery life in tasks like video calls, where it just had a minimal amount of compute + (hardware-accelerated) video encode/decode to do.

Also, Intel's packaging technology is more power-efficient than AMD's. So, it's much less of a downside for laptops than when AMD does it (e.g. Dragon Range).

Clearly, it had significant downsides for them. However, we should recognize that there was a certain logic behind the decision.

But can't Intel do the same type of Power Gating when they're Monolithic, for each of the ASIC's separate IP sections?

Wouldn't the benefits of Monolithic outweigh having to deal with all the downsides of Chiplet Tech?
 

bit_user

Titan
Ambassador
But can't Intel do the same type of Power Gating when they're Monolithic, for each of the ASIC's separate IP sections?
I honestly don't know how much more effective it is to power down an entire die than just gating all of the corresponding parts.

Wouldn't the benefits of Monolithic outweigh having to deal with all the downsides of Chiplet Tech?
On balance, it would seem Meteor Lake went too far, given the direction they took with Lunar Lake and seem to be taking with Panther Lake.

It reminds me a little of how they overdid chiplets in Ponte Vecchio, which had somewhere around 50 different chiplets packed and stacked in there. I wonder who in their right mind thought it was a good idea to lean so heavily into relatively new tech like that. If you look at how AMD does things, it's very incremental, and I'm sure they learn a lot in each generation.
 