News AMD Submits Patent for big.LITTLE-esque Hybrid Computing Implementation

PCWarrior

Distinguished
May 20, 2013
216
101
18,770
I look forward to the AMD fanboys claiming that this is AMD's super-duper innovation, done for the first time ever by AMD, and that everyone else is copying AMD. Like with the MCM/chiplet approach, which was previously done by Intel 10 times: Intel Pentium Pro, Pentium D Presler, Xeon Dempsey, Xeon Clovertown, Core 2 Quad (Kentsfield, Penryn-QC and Yorkfield), Clarkdale, Arrandale, and Haswell-H. And when Intel decided to use MCM again for Kaby Lake G, Lakefield and Cooper Lake, it was supposedly a case of Intel copying AMD (disregarding Intel's long history of using MCM). I still remember like it was yesterday when AMD and their fanboys were fighting MCM, accusing Intel of stitching dies together. And when Intel dared to say the same about Ryzen's MCM (in an internal presentation, mind you, to highlight the performance benefits of monolithic dies), they got offended and wouldn't stop bringing up how Intel is now supposedly gluing dies together with Cooper Lake and Lakefield. This level of irony and hypocrisy never ceases to amaze me.
 
  • Like
Reactions: barryv88

There's a huge difference between MCM packages and the use of interposers, which AMD was the first to use. They aren't even in the same ballpark.

Then Intel attacked AMD claiming they were gluing chips together when the 3000 series was introduced.

Intel has their own technology in Foveros now. Intel claims it's superior. However, rumors are that the ring bus coherency between chip caches is a nightmare. This was a big issue with the CCX layout on the 3000 series, but it was supposedly mostly solved with the 4000 series. And I think I have an idea how they did it.

The big.LITTLE design is an interesting layout in Foveros. The little cores don't directly align with the big cores above them, so straight interconnect stacking of big over small isn't a thing. It has to be some hybrid ring bus design.
 
  • Like
Reactions: alextheblue
This whole hybrid computing approach is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores. The SoC fragmentation this causes is going to be a complete pain, especially if everyone does their own hybrid computing approach. I have no doubt Microsoft would screw up the kernel trying to deal with these differences. I'm not a fan of this at all. For example, we will see Intel's Tiger Lake get dominated by AMD's next-gen APUs (Zen 3/RDNA 2) later in 2021 because of the better process and all large cores.
 
Aug 10, 2020
1
0
10
A recently filed AMD patent suggests the company is researching a big.LITTLE-esque hybrid architecture that employs big, fast cores paired with smaller cores that improve power efficiency.

AMD Submits Patent for big.LITTLE-esque Hybrid Computing Implementation : Read more

I would be surprised if this could stand up to any review. Why, for decades, PCB systems have had multiple CPUs spread across the system/board to perform processing. There have even been CPU/GPU and GPU/GPU processing setups to help offload processing to other components. Too little is shown here, but it would be interesting if/when Intel determines they need to challenge this...
 

Chung Leong

Reputable
Dec 6, 2019
494
193
4,860
Is this comparable to what ARM and Intel now do?

The key innovation here is the use of the instruction stream to trigger transitions. I think the technique is mainly designed for servers, where you can't rely on external signals to infer which core is the proper one at a given moment. On a phone or a laptop, there are foreground and background threads. There are threads that draw to the screen and those that don't. There are also different power conditions. Based on these variables, you can build an effective heuristic. On a server, you have none of these.
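
To make that contrast concrete, here is a minimal sketch in C of the kind of client-side heuristic described above (my own illustration with invented thread attributes and thresholds, not anything taken from the patent):

```c
/* Hypothetical client-side heuristic: pick a core class from external signals
 * such as foreground status, UI work and power state -- exactly the kind of
 * information a server workload doesn't give you. Thresholds are arbitrary. */
#include <stdbool.h>
#include <stdio.h>

enum core_class { LITTLE_CORE, BIG_CORE };

struct thread_info {
    bool foreground;      /* belongs to the window in focus?  */
    bool draws_to_screen; /* UI/render thread?                */
    bool on_battery;      /* current power condition          */
    int  priority;        /* higher = more important (0-31)   */
};

static enum core_class pick_core(const struct thread_info *t)
{
    if (t->draws_to_screen || (t->foreground && t->priority >= 16))
        return BIG_CORE;                    /* visible, latency-sensitive work */
    if (t->on_battery || !t->foreground)
        return LITTLE_CORE;                 /* background or power-constrained */
    return t->priority >= 8 ? BIG_CORE : LITTLE_CORE;
}

int main(void)
{
    struct thread_info render  = { true,  true,  false, 24 };
    struct thread_info indexer = { false, false, true,   4 };
    printf("render  -> %s core\n", pick_core(&render)  == BIG_CORE ? "big" : "little");
    printf("indexer -> %s core\n", pick_core(&indexer) == BIG_CORE ? "big" : "little");
    return 0;
}
```

On a server, none of those inputs exist or mean much, which is presumably why the patent leans on the instruction stream instead.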
 

spongiemaster

Admirable
Dec 12, 2019
2,345
1,323
7,560
Intel has their own technology in Foveros now. Intel claims it's superior. However, rumors are that the ring bus coherency between chip caches is a nightmare. This was a big issue with the CCX layout on the 3000 series, but it was supposedly mostly solved with the 4000 series. And I think I have an idea how they did it.

Foveros is not a competitor to Infinity Fabric. Foveros is for die stacking. AMD's competing logic-stacking technology is called X3D, which was announced earlier this year. Intel's version of Infinity Fabric is EMIB, which hit the market in 2016, before AMD released Ryzen. That's pretty much the point the poster you quoted was making. The AMD fanbase likes to claim AMD was first to market with everything, and the truth is, they rarely are. Not never, but quite rarely.
 
This whole hybrid computing approach is pointless unless you are severely TDP constrained, think tablets or smaller. You are simply one full node of process improvement away from having all large cores.
If the little cores perform better than the HTT (Hyper-Threading) of the big cores, there could be a benefit even for high-TDP desktop parts.
I'm not saying it will, I'm just saying there might be some cases where it makes sense.
I have no doubt Microsoft would screw up the kernel trying to deal with these differences.
Would it need any change from what the kernel already does for HTT?
Replace HTT with little cores and you are done?!
Windows already schedules threads on a priority basis, so higher-priority threads go on real or big cores and lower-priority ones on HTT or little cores.
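
As a rough sketch of that idea (my own illustration with made-up core counts, task names and priorities, not how the Windows scheduler actually works): treat the little cores the way the scheduler already treats SMT siblings, fill the big cores with the highest-priority runnable threads, and spill the rest onto the little cores.

```c
/* Rough sketch only: sort runnable threads by priority, fill big cores first,
 * then little cores, much as one might today prefer physical cores over
 * SMT siblings. Core counts, names and priorities are invented. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_BIG    2
#define NUM_LITTLE 2

struct task { const char *name; int priority; };

static int by_priority_desc(const void *a, const void *b)
{
    return ((const struct task *)b)->priority - ((const struct task *)a)->priority;
}

int main(void)
{
    struct task tasks[] = {
        { "game",      20 }, { "encoder",   16 }, { "browser", 12 },
        { "updater",    4 }, { "telemetry",  2 }, { "indexer",  1 },
    };
    size_t n = sizeof tasks / sizeof tasks[0];

    qsort(tasks, n, sizeof tasks[0], by_priority_desc);

    for (size_t i = 0; i < n; i++) {
        const char *where = (i < NUM_BIG)              ? "big core"
                          : (i < NUM_BIG + NUM_LITTLE) ? "little core"
                                                       : "waits in run queue";
        printf("%-9s (prio %2d) -> %s\n", tasks[i].name, tasks[i].priority, where);
    }
    return 0;
}
```

Whether that actually beats leaving the low-priority work on the big cores' SMT threads is exactly the open question above.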
 
  • Like
Reactions: alextheblue

TJ Hooker

Titan
Ambassador
Intel's version of Infinity Fabric is EMIB, which hit the market in 2016, before AMD released Ryzen. That's pretty much the point the poster you quoted was making. The AMD fanbase likes to claim AMD was first to market with everything, and the truth is, they rarely are. Not never, but quite rarely.
I would argue that a silicon interposer is the competitor/alternative to EMIB, or at least the original competitor, rather than Infinity Fabric.

"Intel has been shipping its EMIB (Embedded Multi-die Interconnect Bridge), a low-cost alternative to interposers, since 2017, and it also plans to bring that chiplet strategy to its mainstream chips."
Link

I believe the first commercial/consumer product to use an interposer to connect two dies on the same package/substrate was the R9 Fury (X) (2015). Starting with 1st-gen Threadripper/EPYC (2017), AMD started routing the die-to-die interconnects on the substrate itself, without any additional silicon (interposer/EMIB). I'm not sure what exactly allowed this, whether it was improvements to substrate/packaging technology since the Fury came out, or maybe it's just a matter of how many connections are required, i.e. HBM requires too many to feasibly implement directly on the substrate.
 
  • Like
Reactions: alextheblue

spongiemaster

Admirable
Dec 12, 2019
2,345
1,323
7,560
I would argue that a silicon interposer is the competitor/alternative to EMIB, or at least the original competitor, rather than Infinity Fabric.

"Intel has been shipping its EMIB (Embedded Multi-die Interconnect Bridge), a low-cost alternative to interposers, since 2017, and it also plans to bring that chiplet strategy to its mainstream chips."
Link

Intel released the Stratix 10 FPGA mentioned in your link, which used EMIB, at the end of 2016.
I believe the first commercial/consumer product to use an interposer to connect two dies on the same package/substrate was the R9 Fury (X) (2015). Starting with 1st-gen Threadripper/EPYC (2017), AMD started routing the die-to-die interconnects on the substrate itself, without any additional silicon (interposer/EMIB). I'm not sure what exactly allowed this, whether it was improvements to substrate/packaging technology since the Fury came out, or maybe it's just a matter of how many connections are required, i.e. HBM requires too many to feasibly implement directly on the substrate.
Sure, even if we go with the interposer, AMD wasn't the first to use those either, so it doesn't really change the point.
 
  • Like
Reactions: TJ Hooker

PCWarrior

Distinguished
May 20, 2013
216
101
18,770
There's a huge difference between MCM packages and the use of interposers, which AMD was the first to use. They aren't even in the same ballpark.
First of all, the use of silicon interposers (instead of the flip-chip method, for which Intel holds several patents) in multi-die packaging is an evolution, not a revolution, and it certainly doesn't give anyone a claim over the entire MCM concept, as AMD's fanboys have asserted (some even say that anyone who uses MCM is copying AMD).

And here is an AnandTech article from back in 2006 saying this about Intel's flip-chip MCM approach:
"We've shown in the past that there's no real-world performance penalty to this approach to manufacturing, and there are numerous benefits from Intel's perspective. Yields are improved by producing a two die quad-core processor rather than a single die. The approach also improves manufacturing flexibility since Intel can decide at a very late stage whether to produce a dual or quad core processor after a die is fabbed."

And here is an article from before that:
“While having two independent dual-core die isn't as fast as a single integrated quad-core die, we expect the performance penalty to be minimal in most applications (just as it was with the first dual core Intel chips). There are some benefits to using two independent dual-core die on a single package, which are highlighted above. They mostly relate to manufacturing and optimizing costs/production and thus won't be directly visible to the end user. If history repeats itself, we should expect Penryn (45nm) to be Intel's first single die quad-core desktop processor.”

So apparently, at the time, flip-chip was more than enough for MCM, so why increase the cost with a silicon interposer?

Besides, it was not AMD that was first to use silicon interposers anyway. The concept has existed since the 1990s. Micro Module Systems (a DEC spin-off) was among the first to use it commercially. Among the major silicon providers, Xilinx was the first to use it commercially, in 2011 (see here), in its FPGAs. More importantly, in the same year (2011), Intel was already publishing about EMIB (see here), which is a superior implementation of the 2.5D packaging concept as it avoids the use of TSVs (which are associated with several issues and limitations). In 2015, EMIB was incorporated into their FPGAs (Stratix 10, see here); it was one of the reasons Altera partnered with Intel as their foundry before finally being acquired by Intel.

Then Intel attacked AMD claiming they were gluing chips together when the 3000 series was introduced.
No, that happened with the first-gen Ryzen/Epyc, and it was in an internal presentation. See here. In terms of performance, monolithic will always be the gold standard. MCM is merely a clever engineering trick for more cost-efficient production and for extending the capabilities beyond what your current process node allows. But the fact of the matter is that when you can efficiently produce dies with a large number of cores (like Intel can do), going MCM to extend the core count versus going multi-socket is an interesting debate. There is only so much heat that you can dissipate from a single chip (regardless of whether the chip is an MCM one) in a single socket. Having multiple sockets allows you to provide more power overall and clock the cores higher compared to when they are all in the same socket. Sure, inter-socket latency is higher than inter-die latency, but when you already have 18-28 cores in one monolithic die in one socket, virtually all latency-sensitive workloads are taken care of by that single CPU. Those workloads that need/scale to even more cores are most likely not latency-sensitive, so going multi-socket doesn't really affect them.
 

Because flip chip uses a standard substrate with standard conductors. This means communications are significantly slowed down. One of the performance limiters was the fact that the silicon had to be connected by wires, which created speed/power/heat issues. Technically speaking, you could call the Slot 2 chips MCM.

Silicon interposers expose silicon pads on a silicon base die. They're inherently faster because there are no intervening wires.

And this is why they stopped using the old flip-chip design. And the irony is Intel really poked fun at AMD for the MCM implementation, calling it "glued" chips. Yet here they are doing the same thing. It's all games and marketing and a lot of hubris and pride.

I couldn't care less how it's spun, as long as I get the best value. And you can't deny Intel has been milking us for a while, while internally gutting themselves. Is it a surprise they are a mess right now?

She (Intel) is no longer the prom queen. She's aged, overweight, lacking substance, and worst of all, a snob. She needs to get over that and learn some humility. (And they are getting it now with their market losses, as people have spoken with their wallets and market share.)

People would respect Intel more if they changed their outward tone and image.
 

Yet you realize that back in 2006 AMD made fun of Intel in the same way and touted the monolithic design of K10.
 
Fair point, and the statement remains true: "It's all games and marketing and a lot of hubris and pride." Either side is capable of it. 3dfx was particularly nasty with NVIDIA.

I typically prefer it when companies don't do that and instead let their product speak for itself, but I have yet to see one that doesn't trash talk. Intel used to be better about it, but times change, I guess.
 
  • Like
Reactions: digitalgriffin