News PCBye: Researchers Want to Ditch the Motherboard

As for how I arrived at my reduction, simple: once CPUs have enough on-package RAM for baseline use, enough on-package GPU for baseline use, then the minimum viable system size will be reduced to little more than CPU package size. Same for servers.
The most in-package HBM2 of which I'm aware is currently 32 GiB. How are you going to get from that to the 2-3 TiB that's currently supported by server CPUs? You're off by at least a factor of 64, and Intel is currently working to push the software limits on physical address space from 64 TiB to 4 PiB.

https://www.phoronix.com/scan.php?page=news_item&px=Linux-Default-5-LVL-Paging-Def
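
For what it's worth, a quick back-of-envelope check of both gaps, as a minimal Python sketch. The 32 GiB and 2 TiB figures are the ones above, and the 46- and 52-bit widths are the physical address sizes before and after 5-level paging:

# Capacity gap: largest in-package HBM2 I'm aware of vs. the low end of current server limits.
GiB, TiB, PiB = 2**30, 2**40, 2**50

print((2 * TiB) / (32 * GiB))                        # 64.0 -> off by a factor of ~64 at minimum

# 5-level paging (see the Phoronix link above): physical addresses grow from 46 to 52 bits.
print(2**46 // TiB, "TiB ->", 2**52 // PiB, "PiB")   # 64 TiB -> 4 PiB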

Moreover, I'm skeptical that a majority of server operators even want to buy machines without the ability to do memory upgrades or reconfiguration. Even if you don't normally do it, there's a big gulf between that and foreclosing the option.
 
The most in-package HBM2 of which I'm aware is currently 32 GiB. How are you going to get from that to the 2-3 TiB that's currently supported by server CPUs?
First, that 32 GiB "limit" only applies if you restrict yourself to four 8 GiB stacks per package, built from 8 Gbit dies with a maximum stack height of eight DRAM layers. If you 3D-stack RAM on or under the CPU/GPU dies, then you can have as many HBM stacks as you have CPUs/GPUs in the package. A different memory stack design could accommodate more than eight layers. Future DRAM will have higher density, with 16 Gbit being the current highest and 32+ Gbit likely in the future. Using slightly modified current tech, it should be possible to cram 100+ GB of RAM under the IHS if you really wanted to.
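
Rough capacity arithmetic behind those numbers, as a small Python sketch; the 8-stack / 16 Gbit configuration is just my illustrative stand-in for "slightly modified current tech", not a real product:

# 8 Gbit of die density = 1 GiB, so capacity in GiB = stacks * dies per stack * Gbit per die / 8.
def package_gib(stacks, dies_per_stack, die_gbit):
    return stacks * dies_per_stack * die_gbit / 8

print(package_gib(4, 8, 8))     # 32.0  -> today's nominal four-stack HBM2 ceiling
print(package_gib(8, 8, 16))    # 128.0 -> comfortably past the 100+ GB under-the-IHS figure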

Second, typical servers don't need full-speed access to the whole multi-TB address space; they only need high-speed access to the current working sets, no different from how only a small subset of all data fits in the L1/L2/L3 caches. Intel's PB-scale address space is primarily intended for NVDIMM-backed persistent in-memory databases, to cut out HDD/SSD access time, and in that case only a very tiny fraction is ever in use at any given time. You don't need an on-package-speed direct interconnect for those, only links fast enough to keep up with the peak transaction volume. I doubt many people or companies would be interested in having a maintenance/consumable item such as storage welded to their super-expensive CPU.
 
If you 3D-stack RAM on or under the CPU/GPU dies, then you can have as many HBM stacks as you have CPUs/GPUs in the package.
If that were practical, then why didn't they do it? What about the overhead (in power, DRAM area, and yield) of tunneling all of the CPU's other I/Os and power through that stack? Cooling also seems like a potential issue with that approach. In my mind, it's far from a given that this will work for server CPUs.

A different memory stack design could accommodate more than eight layers. Future DRAM will have higher density, with 16 Gbit being the current highest and 32+ Gbit likely in the future. Using slightly modified current tech, it should be possible to cram 100+ GB of RAM under the IHS if you really wanted to.
Okay, let's say you get 8 stacks x 8-high x 32 Gbit dies. That's still only 256 GiB, or a quarter of Intel's limit for the 8280. However, enough people go beyond that that Intel can charge $3k more for the 8280M, which raises the limit to 2 TiB, or $7k more for the 8280L, which goes up to 4.5 TiB.
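
Putting those numbers side by side in a quick Python sketch; the per-SKU memory limits are the ones discussed here (the 8280's 1 TiB is implied by the "quarter" figure) and should be treated as approximate:

GiB, TiB = 2**30, 2**40

stacked = 8 * 8 * 32 // 8 * GiB                # 8 stacks x 8 dies x 32 Gbit = 256 GiB
limits = {"8280": 1 * TiB, "8280M": 2 * TiB, "8280L": 4.5 * TiB}

for sku, cap in limits.items():
    print(sku, round(stacked / cap, 3))        # 0.25, 0.125, 0.056 of each SKU's ceiling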

And again, there's no reason this couldn't apply to the Si-IF approach.

But here's the other thing - we don't even know if they were counting RAM. Maybe they were still planning on RAM being slotted into a daughter card or something. If so, then any wins you get by stacking RAM next to the CPU dies would be independent of, and complementary to, whatever they were talking about.

Lastly, your argument is entirely focused on area & cost, leaving aside their claimed bandwidth, latency, energy, and cooling benefits. By sticking with a traditional package, you only get those benefits for whatever RAM you can squeeze in there.
 
If that were practical, then why didn't they do it?
Simple: being able to directly stack stuff requires coordinated die sizes and pad-outs. The market isn't desperate enough for the extra performance that stacking RAM could bring to make those commitments yet; it can still make do with off-the-shelf parts (ex.: HBM) tied together with interposers. Once silicon hits the ~5nm brick wall, even tighter integration will become necessary to drive performance any further.