It still limits you to accessing one row at a time, for instance. Could there be multiple row buffers, to enable greater concurrency? Does only one bank have to be active at a time?
That's what I imagined first too, and discussed a bit with colleagues from Bull and the CEA before the AI boom hit.
And it's what I tried to describe with that super-wide ALU doing row-by-row computations.
What you could do there is things like weight accumulations, but you have to go toward something much more fine-grained to be more generic. That's what Micron's Automata Processor wanted to do, which I believe I've mentioned before.
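To make the row-wide ALU idea above concrete: the point is to compute across an entire activated DRAM row at once, instead of reading it out a few bytes per column access. Here is a minimal toy simulation of that (all class and method names are hypothetical, chosen just for illustration; real row widths and lane layouts vary by device):

```python
import numpy as np

ROW_BITS = 8192          # assume a row buffer on the order of 1 KB = 8192 bits
LANES = ROW_BITS // 8    # model it as 1024 byte-wide ALU lanes

class PimBank:
    """Toy model of one DRAM bank with a hypothetical row-wide accumulate.

    This only illustrates the concept of operating on a whole open row
    at once; it is not modeled on any shipping PIM product.
    """
    def __init__(self, rows: int):
        self.cells = np.zeros((rows, LANES), dtype=np.uint8)
        self.row_buffer = np.zeros(LANES, dtype=np.uint8)

    def activate(self, row: int):
        # ACT: sense amplifiers latch the entire row into the row buffer
        self.row_buffer[:] = self.cells[row]

    def row_accumulate(self, acc: np.ndarray):
        # hypothetical row-wide ALU: one add per lane, all lanes in parallel
        acc += self.row_buffer.astype(np.int32)

# accumulate "weights" stored in two rows without shipping them off-chip
bank = PimBank(rows=16)
bank.cells[0, :4] = [1, 2, 3, 4]
bank.cells[1, :4] = [10, 20, 30, 40]
acc = np.zeros(LANES, dtype=np.int32)
for r in (0, 1):
    bank.activate(r)
    bank.row_accumulate(acc)
print(acc[:4])  # [11 22 33 44]
```

The win, if the hardware existed, is bandwidth: each ACT already latches thousands of bits, so a row-wide ALU gets them essentially for free, whereas the normal path narrows everything down to the bus width.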
Upmem has been at it too, perhaps for 20 years by now, but unless I go looking for them I never see them topping headlines, even though it always looks as if they got a product ready to use just last month.
That's PIM, though. Not what Marvell is talking about.
Yes, Marvell is talking but not saying anything concrete yet, so it's hard to compare. For me it really comes down to the fact that they are trying to drill into that HBM fortress, and that they are doing it with the perspective of a company that designs AI ASICs for Google and others.
That's a lot like Nvidia starting on a new memory technology instead of trying to break into Windows laptops.
Just the announcement in itself is remarkable because of who is involved, not what they say.
And you have to remember that people could not buy Blackwells for nearly all of 2024, because they had all been allocated to OpenAI and a few others, who had already signed contracts with NVidia.
Yet the bottleneck wasn't TSMC or the NVidia chips themselves; it was the HBM memory chips, or the limited capacity for assembling them, that restricted supply and drove the numbers.
So why was NVidia the only one making a killing if they weren't even producing the most precious resource?
That's what Marvell is targeting, and it really only needs to be as good as HBM and more freely available to burst some bubbles. If it turns out to be better, too, the market might shift significantly.
I think a better move is to stack DRAM on more general-purpose compute dies, giving you PIM-like efficiency with almost the same level of generality we enjoy today.
Any way you break the memory wall without breaking the bank is a good one.