News AMD Submits Patent for big.LITTLE-esque Hybrid Computing Implementation

PCWarrior

Distinguished
May 20, 2013
216
101
18,770
I look forward to the AMD fanboys claiming that this is AMD's super-duper innovation, done for the first time ever by AMD, and that everyone else is copying AMD. Like with the MCM/chiplet approach, which was previously done by Intel 10 times: Intel Pentium Pro, Pentium D Presler, Xeon Dempsey, Xeon Clovertown, Core 2 Quad (Kentsfield, Penryn-QC and Yorkfield), Clarkdale, Arrandale, and Haswell-H. And when Intel decided to use MCM again for Kaby Lake G, Lakefield and Cooper Lake, it was supposedly a case of Intel copying AMD (disregarding Intel's long history of using MCM). I still remember like it was yesterday when AMD and their fanboys were fighting MCM, accusing Intel of stitching dies together. And when Intel dared to say the same about Ryzen's MCM (in an internal presentation, mind you, to highlight the performance benefits of monolithic dies), they got offended and wouldn't stop bringing up how Intel is now supposedly gluing dies together with Cooper Lake and Lakefield. This level of irony and hypocrisy never ceases to amaze me.
 
  • Like
Reactions: barryv88

There's a huge difference between MCM packages and the use of interposers, which AMD was the first to use. They aren't even in the same ballpark.

Then Intel attacked AMD claiming they were gluing chips together when the 3000 series was introduced.

Intel has their own technology in Foveros now. Intel claims it's superior. However, rumors are that the ring bus coherency between chip caches is a nightmare. This was a big issue with the CCX layout on the 3000 series, but it was supposedly mostly solved with the 4000 series. And I think I have an idea how they did it.

The big.LITTLE design is an interesting layout in Foveros. The little cores don't directly align with the big cores above them, so straight interconnect stacking of big over small isn't a thing. It has to be some hybrid ring bus design.
 
  • Like
Reactions: alextheblue
This whole hybrid computing approach is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores. The SoC fragmentation this causes is going to be a complete pain, especially if everyone does their own hybrid computing approach. I have no doubt Microsoft would screw up the kernel trying to deal with these differences. I'm not a fan of this at all. For example, we will see Intel's Tiger Lake get dominated by AMD's next-gen APUs (Zen 3/RDNA 2) later in 2021 because of the better process and all large cores.
 
Aug 10, 2020
1
0
10
A recently filed AMD patent suggests the company is researching a big.LITTLE-esque hybrid architecture that employs big, fast cores paired with smaller cores that improve power efficiency.

AMD Submits Patent for big.LITTLE-esque Hybrid Computing Implementation : Read more

I would be surprised if this could stand up to any review. Why, for decades, PCB systems have had multiple CPUs spread across the system/board to perform processing. There have even been CPU/GPU and GPU/GPU processing setups to help offload processing to other components. Too little is shown here, but it would be interesting if/when Intel determines they need to challenge this...
 

Chung Leong

Reputable
Dec 6, 2019
494
193
4,860
Is this comparable to what ARM and Intel now do?

The key innovation here is the use of the instruction stream to trigger transitions. I think the technique is mainly designed for servers, where you can't rely on external signals to infer which core is the proper one at a given moment. On a phone or a laptop, there are foreground and background threads. There are threads that draw to the screen and those that don't. There are also different power conditions. Based on these variables, you can build an effective heuristic. On a server, you have none of these.
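
To make that contrast concrete, here is a minimal sketch in C of the kind of client-side heuristic described above (my own illustration with invented thread attributes and thresholds, not anything taken from the patent):

```c
/* Hypothetical client-side heuristic: pick a core class from external signals
 * such as foreground status, UI work and power state -- exactly the kind of
 * information a server workload doesn't give you. Thresholds are arbitrary. */
#include <stdbool.h>
#include <stdio.h>

enum core_class { LITTLE_CORE, BIG_CORE };

struct thread_info {
    bool foreground;      /* belongs to the window in focus?  */
    bool draws_to_screen; /* UI/render thread?                */
    bool on_battery;      /* current power condition          */
    int  priority;        /* higher = more important (0-31)   */
};

static enum core_class pick_core(const struct thread_info *t)
{
    if (t->draws_to_screen || (t->foreground && t->priority >= 16))
        return BIG_CORE;                    /* visible, latency-sensitive work */
    if (t->on_battery || !t->foreground)
        return LITTLE_CORE;                 /* background or power-constrained */
    return t->priority >= 8 ? BIG_CORE : LITTLE_CORE;
}

int main(void)
{
    struct thread_info render  = { true,  true,  false, 24 };
    struct thread_info indexer = { false, false, true,   4 };
    printf("render  -> %s core\n", pick_core(&render)  == BIG_CORE ? "big" : "little");
    printf("indexer -> %s core\n", pick_core(&indexer) == BIG_CORE ? "big" : "little");
    return 0;
}
```

On a server, none of those inputs exist or mean much, which is presumably why the patent leans on the instruction stream instead.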
 

spongiemaster

Admirable
Dec 12, 2019
2,345
1,323
7,560
Intel has their own technology in Foveros now. Intel claims it's superior. However, rumors are that the ring bus coherency between chip caches is a nightmare. This was a big issue with the CCX layout on the 3000 series, but it was supposedly mostly solved with the 4000 series. And I think I have an idea how they did it.

Foveros is not a competitor to Infinity Fabric. Foveros is for die stacking. AMD's competing logic-stacking technology is called X3D, which was announced earlier this year. Intel's version of Infinity Fabric is EMIB, which hit the market in 2016, before AMD released Ryzen. That's pretty much the point the poster you quoted was making. The AMD fanbase likes to claim AMD was first to market with everything, and the truth is, they rarely are. Not never, but quite rarely.
 
This whole hybrid computing approach is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores.
If the little cores perform better than the HTT (Hyper-Threading) of the big cores, there could be a benefit even for high-TDP desktop parts.
I'm not saying it will, I'm just saying there might be some cases where it makes sense.
I have no doubt Microsoft would screw up the kernel trying to deal with these differences.
Would it need any change from what the kernel already does for HTT?
Replace HTT with little cores and you are done?!
Windows already schedules threads on a priority basis, so higher-priority threads go on real or big cores and lower-priority ones on HTT or little cores.
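
As a rough sketch of that idea (my own illustration with made-up core counts, task names and priorities, not how the Windows scheduler actually works): treat the little cores the way the scheduler already treats SMT siblings, fill the big cores with the highest-priority runnable threads, and spill the rest onto the little cores.

```c
/* Rough sketch only: sort runnable threads by priority, fill big cores first,
 * then little cores, much as one might today prefer physical cores over
 * SMT siblings. Core counts, names and priorities are invented. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_BIG    2
#define NUM_LITTLE 2

struct task { const char *name; int priority; };

static int by_priority_desc(const void *a, const void *b)
{
    return ((const struct task *)b)->priority - ((const struct task *)a)->priority;
}

int main(void)
{
    struct task tasks[] = {
        { "game",      20 }, { "encoder",   16 }, { "browser", 12 },
        { "updater",    4 }, { "telemetry",  2 }, { "indexer",  1 },
    };
    size_t n = sizeof tasks / sizeof tasks[0];

    qsort(tasks, n, sizeof tasks[0], by_priority_desc);

    for (size_t i = 0; i < n; i++) {
        const char *where = (i < NUM_BIG)              ? "big core"
                          : (i < NUM_BIG + NUM_LITTLE) ? "little core"
                                                       : "waits in run queue";
        printf("%-9s (prio %2d) -> %s\n", tasks[i].name, tasks[i].priority, where);
    }
    return 0;
}
```

Whether that actually beats leaving the low-priority work on the big cores' SMT threads is exactly the open question above.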
 
  • Like
Reactions: alextheblue

TJ Hooker

Titan
Ambassador
Intel's version of Infinity Fabric is EMIB, which hit the market in 2016, before AMD released Ryzen. That's pretty much the point the poster you quoted was making. The AMD fanbase likes to claim AMD was first to market with everything, and the truth is, they rarely are. Not never, but quite rarely.
I would argue that a silicon interposer is the competitor/alternative to EMIB, or at least the original competitor, rather than Infinity Fabric.

"Intel has been shipping its EMIB (Embedded Multi-die Interconnect Bridge), a low-cost alternative to interposers, since 2017, and it also plans to bring that chiplet strategy to its mainstream chips."
Link

I believe the first commercial/consumer product to use an interposer to connect two dies on the same package/substrate was the R9 Fury (X) (2015). Starting with 1st-gen Threadripper/EPYC (2017), AMD started routing the die-to-die interconnects on the substrate itself, without any additional silicon (interposer/EMIB). I'm not sure what exactly allowed this, whether it was improvements to substrate/packaging technology since the Fury came out, or maybe it's just a matter of how many connections are required, i.e. HBM requires too many to feasibly implement directly on the substrate.
 
  • Like
Reactions: alextheblue

spongiemaster

Admirable
Dec 12, 2019
2,345
1,323
7,560
I would argue that a silicon interposer is the competitor/alternative to EMIB, or at least the original competitor, rather than Infinity Fabric.

"Intel has been shipping its EMIB (Embedded Multi-die Interconnect Bridge), a low-cost alternative to interposers, since 2017, and it also plans to bring that chiplet strategy to its mainstream chips."
Link

Intel released the Stratix 10 FPGA mentioned in your link, which used EMIB, at the end of 2016.
I believe the first commercial/consumer product to use an interposer to connect two dies on the same package/substrate was the R9 Fury (X) (2015). Starting with 1st-gen Threadripper/EPYC (2017), AMD started routing the die-to-die interconnects on the substrate itself, without any additional silicon (interposer/EMIB). I'm not sure what exactly allowed this, whether it was improvements to substrate/packaging technology since the Fury came out, or maybe it's just a matter of how many connections are required, i.e. HBM requires too many to feasibly implement directly on the substrate.
Sure, even if we go with the interposer, AMD wasn't the first to use those either, so it doesn't really change the point.
 
  • Like
Reactions: TJ Hooker

PCWarrior

Distinguished
May 20, 2013
216
101
18,770
There's a huge difference between MCM packages and the use of interposers, which AMD was the first to use. They aren't even in the same ballpark.
First of all, the use of silicon interposers (instead of the flip-chip method, for which Intel holds several patents) in multi-die packaging is an evolution, not a revolution, and it certainly doesn't give anyone a claim over the entire MCM concept, as AMD's fanboys have asserted (some even say that anyone who uses MCM is copying AMD).

And here is an AnandTech article from back in 2006 saying this about Intel's flip-chip MCM approach:
"We've shown in the past that there's no real-world performance penalty to this approach to manufacturing, and there are numerous benefits from Intel's perspective. Yields are improved by producing a two die quad-core processor rather than a single die. The approach also improves manufacturing flexibility since Intel can decide at a very late stage whether to produce a dual or quad core processor after a die is fabbed."

And here is an article from before that:
“While having two independent dual-core die isn't as fast as a single integrated quad-core die, we expect the performance penalty to be minimal in most applications (just as it was with the first dual core Intel chips). There are some benefits to using two independent dual-core die on a single package, which are highlighted above. They mostly relate to manufacturing and optimizing costs/production and thus won't be directly visible to the end user. If history repeats itself, we should expect Penryn (45nm) to be Intel's first single die quad-core desktop processor.”

So apparently, at the time, flip-chip was more than enough for MCM, so why increase the cost with a silicon interposer?

Besides, it was not AMD that was first to use silicon interposers anyway. The concept has existed since the 1990s. Micro Module Systems (a DEC spin-off) was among the first to use it commercially. Among the major silicon providers, Xilinx was the first to use it commercially, in 2011 (see here), in its FPGAs. More importantly, in the same year (2011), Intel was already publishing about EMIB (see here), which is a superior implementation of the 2.5D packaging concept as it avoids the use of TSVs (which are associated with several issues and limitations). In 2015, EMIB was incorporated into their FPGAs (Stratix 10, see here); it was one of the reasons Altera partnered with Intel as their foundry before finally being acquired by Intel.

Then Intel attacked AMD claiming they were gluing chips together when the 3000 series was introduced.
No, that happened with the first-gen Ryzen/Epyc, and it was in an internal presentation. See here. In terms of performance, monolithic will always be the gold standard. MCM is merely a clever engineering trick for more cost-efficient production and for extending the capabilities beyond what your current process node allows. But the fact of the matter is that when you can efficiently produce dies with a large number of cores (like Intel can do), going MCM to extend the core count versus going multi-socket is an interesting debate. There is only so much heat that you can dissipate from a single chip (regardless of whether the chip is an MCM one) in a single socket. Having multiple sockets allows you to provide more power overall and clock the cores higher compared to when they are all in the same socket. Sure, inter-socket latency is higher than inter-die latency, but when you already have 18-28 cores in one monolithic die in one socket, virtually all latency-sensitive workloads are taken care of by that single CPU. Those workloads that need/scale to even more cores are most likely not latency-sensitive, so going multi-socket doesn't really affect them.
 

Because flip chip uses a standard substrate with standard conductors. This means communications are significantly slowed down. One of the performance limiters was the fact that the silicon had to be connected by wires, which created speed/power/heat issues. Technically speaking, you could call the Slot 2 chips MCM.

Silicon interposers expose silicon pads on a silicon base die. They're inherently faster because there are no intervening wires.

And this is why they stopped using the old flip-chip design. And the irony is Intel really poked fun at AMD for the MCM implementation, calling it "glued" chips. Yet here they are doing the same thing. It's all games and marketing and a lot of hubris and pride.

I couldn't care less how it's spun, as long as I get the best value. And you can't deny Intel has been milking us for a while, while internally gutting themselves. Is it a surprise they are a mess right now?

She (Intel) is no longer the prom queen. She's aged, overweight, lacking substance, and worst of all, a snob. She needs to get over that and learn some humility. (And they are getting it now with their market losses, as people have spoken with their wallets and market share.)

People would respect Intel more if they changed their outward tone and image.
 

Yet you realize that back in 2006 AMD made fun of Intel in the same way and touted the monolithic design of K10.
 
Fair point, and the statement remains true: "It's all games and marketing and a lot of hubris and pride." Either side is capable of it. 3dfx was particularly nasty with NVIDIA.

I typically prefer it when companies don't do that and instead let their product speak for itself, but I have yet to see one that doesn't trash talk. Intel used to be better about it, but times change, I guess.
 
  • Like
Reactions: digitalgriffin