It still limits you to accessing one row at a time, for instance. Could there be multiple row buffers, to enable greater concurrency? Does only one bank have to be active at a time?
That's what I imagined first too, and discussed a bit with colleagues from Bull and the CEA before the AI boom hit.
And it's what I tried to describe with that super-wide ALU doing row-by-row computations.
What you could do there is things like weight accumulations, but you have to go toward something much more fine-grained to be more generic. That's what Micron's Automata Processor wanted to do, which I believe I've mentioned before.
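To make the row-wide ALU idea above concrete: the point is to compute across an entire activated DRAM row at once, instead of reading it out a few bytes per column access. Here is a minimal toy simulation of that (all class and method names are hypothetical, chosen just for illustration; real row widths and lane layouts vary by device):

```python
import numpy as np

ROW_BITS = 8192          # assume a row buffer on the order of 1 KB = 8192 bits
LANES = ROW_BITS // 8    # model it as 1024 byte-wide ALU lanes

class PimBank:
    """Toy model of one DRAM bank with a hypothetical row-wide accumulate.

    This only illustrates the concept of operating on a whole open row
    at once; it is not modeled on any shipping PIM product.
    """
    def __init__(self, rows: int):
        self.cells = np.zeros((rows, LANES), dtype=np.uint8)
        self.row_buffer = np.zeros(LANES, dtype=np.uint8)

    def activate(self, row: int):
        # ACT: sense amplifiers latch the entire row into the row buffer
        self.row_buffer[:] = self.cells[row]

    def row_accumulate(self, acc: np.ndarray):
        # hypothetical row-wide ALU: one add per lane, all lanes in parallel
        acc += self.row_buffer.astype(np.int32)

# accumulate "weights" stored in two rows without shipping them off-chip
bank = PimBank(rows=16)
bank.cells[0, :4] = [1, 2, 3, 4]
bank.cells[1, :4] = [10, 20, 30, 40]
acc = np.zeros(LANES, dtype=np.int32)
for r in (0, 1):
    bank.activate(r)
    bank.row_accumulate(acc)
print(acc[:4])  # [11 22 33 44]
```

The win, if the hardware existed, is bandwidth: each ACT already latches thousands of bits, so a row-wide ALU gets them essentially for free, whereas the normal path narrows everything down to the bus width.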
Upmem has been at it too, perhaps for 20 years by now, but unless I go looking for them I never see them topping headlines, even though it always looks as if they got a product ready to use just last month.
That's PIM, though. Not what Marvell is talking about.
Yes, Marvell is talking but not saying anything concrete yet, so it's hard to compare. For me it really comes down to the fact that they are trying to drill into that HBM fortress, and that they are doing it with the perspective of a company that designs AI ASICs for Google and others.
That's a lot like Nvidia starting on a new memory technology instead of trying to break into Windows laptops.
Just the announcement in itself is remarkable because of who is involved, not what they say.
And you have to remember that people could not buy Blackwells for nearly all of 2024, because they had all been allocated to OpenAI and a few others, who had already signed contracts with NVidia.
Yet the bottleneck wasn't TSMC or the NVidia chips themselves; it was the HBM memory chips, or the limited capacity for assembling them, that restricted supply and drove the numbers.
So why was NVidia the only one making a killing if they weren't even producing the most precious resource?
That's what Marvell is targeting, and it really only needs to be as good as HBM and more freely available to burst some bubbles. If it turns out to be better, too, the market might shift significantly.
I think a better move is to stack DRAM on more general-purpose compute dies, giving you PIM-like efficiency with almost the same level of generality we enjoy today.
Any way you break the memory wall without breaking the bank is a good one.