News GPUs can now use PCIe-attached memory or SSDs to boost VRAM capacity - Panmnesia's CXL IP claims double-digit nanosecond latency

The article said:
Although CXL is a protocol that formally works on top of a PCIe link, thus enabling users to connect more memory to a system via the PCIe bus, the technology has to be recognized by an ASIC and its subsystem
This is confusing and wrong.

CXL and PCIe share the same PHY specification. Where they diverge is at the protocol layer. CXL is not simply a layer atop PCIe. The slot might be the same, but you have to configure the CPU to treat it as a CXL slot instead of a PCIe slot. That obviously requires the CPU to have CXL support, which doesn't exist in consumer CPUs. Not sure if the current Xeon W or Threadrippers support it, actually, but they could.
 
I don't see the article talking about bandwidth. Does it not matter for the expected AI workload?
(I assume no one would use this to game on, except youtubers)
 
While this would help with Nvidia's stinginess on VRAM (especially at the lower end), factoring in the cost, it would likely be cheaper to just get the next SKU up that comes with more VRAM...
This is definitely aimed at very expensive pro models with maxed out VRAM already. The cost of another expansion card would definitely not be cost effective versus the next sku with more vram.
 
I don't see the article talking about bandwidth.
Because the PHY spec is the same as PCIe, the bandwidth calculations should be roughly the same. CXL 1.x and 2.x are both based on the PCIe 5.0 PHY, meaning ~4 GB/s per lane (per direction). So, a x4 memory expansion would have an upper limit of ~16 GB/s in each direction.
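As a rough check on those numbers, here is a minimal sketch of the per-direction link bandwidth derived from the PHY signalling rate. It assumes 128b/130b line encoding and ignores protocol (flit/packet) overhead, so real-world throughput would be somewhat lower. The function name and table are just illustrative.

```python
# Rough per-direction bandwidth of a PCIe/CXL link, from the PHY signalling rate.
# Assumption: 128b/130b line encoding (PCIe 3.0 through 5.0); protocol overhead is ignored.

GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}  # giga-transfers per second, per lane

def link_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Return the approximate one-direction bandwidth in GB/s."""
    raw_gbit = GT_PER_S[gen] * lanes       # raw Gbit/s across all lanes
    payload_gbit = raw_gbit * 128 / 130    # remove line-encoding overhead
    return payload_gbit / 8                # bits -> bytes

for lanes in (4, 8, 16):
    print(f"PCIe 5.0 x{lanes}: ~{link_bandwidth_gb_s(5, lanes):.1f} GB/s per direction")
```

With these assumptions a x4 link lands just under 16 GB/s and a x16 link at about 63 GB/s, in each direction.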

Does it not matter for the expected AI workload?
Depends on which part. If you look at the dependence of high-end AI training GPUs on HBM, bandwidth is obviously an issue. That's not to say that you need uniformly fast access, globally. There are techniques for processing chunks of data which might be applicable for offloading some of it to a slower memory, such as the way Nvidia uses the Grace-attached LPDDR5 memory in their Grace-Hopper configuration. You could also just use the memory expansion for holding training data, which is far lower bandwidth than access to the weights.

(I assume no one would use this to game on, except youtubers)
Consumer GPUs (and by this I mean anything with a display connector on it - even the workstation branded stuff) don't support CXL, so it's not even an option. Even if it were, you'd still be better off just using system memory. Where this sort of memory expansion starts to make sense is at scale.
 
This is definitely aimed at very expensive pro models with maxed out VRAM already. The cost of another expansion card would definitely not be cost effective versus the next sku with more vram.
We don't know if they will actually have more VRAM, and even so, knowing Nvidia, they will add something like 2GB, which is quite @Nal.
Nevertheless, if the GPU can only access the factory-designated amount due to bandwidth limitations, then we are cooked anyway.
One way or another, it's a nice approach to remedy the shortage, but still very theoretical and subject to many dubious factors.
 
I wish video card makers would just add a second pool of RAM that would use SODIMM modules. For example, imagine if a card like the RTX 4080 had 2 SODIMM slots on the back of the card for extra RAM. While it would be a slower pool of RAM of around 80-90GB/s compared to the 736GB/s of the VRAM, it would still be useful.

Video card makers already have experience with using and prioritizing 2 separate memory pools on the same card, for example the GTX 970 would have a 3.5GB pool at 225-256GB/s, and a second 512MB pool at around 25-27GB/s depending on clock speed. If a game used that 512MB pool, the performance hit was not much, as the card and drivers at least knew enough to not shove throughput intensive data/ workloads into that second pool.

If they could do the same but with 2 DDR5 SODIMM slots, then users could do things like have a second pool of up to about 96GB, and it would have far fewer performance hits than using shared system memory which tops out at a real world throughput of around 24-25GB/s on a PCIe 4.0 X16 connection that also has to share bandwidth with other GPU tasks, thus not well suited for pulling double duty.
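For anyone wondering where a figure like 80-90GB/s could come from, here is a quick sketch of theoretical peak DDR5 bandwidth for two SODIMMs. The speed grades are my assumption (the post doesn't name one), each module is treated as a 64-bit data bus, and real sustained throughput would be lower than the peak.

```python
# Theoretical peak bandwidth of a DDR memory pool.
# Assumptions: 64 data bits per module, two modules, and illustrative speed grades.

def ddr_peak_gb_s(mt_per_s: int, modules: int = 2, bus_bits: int = 64) -> float:
    """Peak bandwidth in GB/s: transfers/s * bytes per transfer * module count."""
    return mt_per_s * 1e6 * (bus_bits / 8) * modules / 1e9

VRAM_GB_S = 736.0        # VRAM figure quoted in the post
PCIE_SHARED_GB_S = 25.0  # real-world shared-system-memory figure quoted in the post

for grade in (4800, 5200, 5600):
    bw = ddr_peak_gb_s(grade)
    print(f"2x DDR5-{grade}: {bw:.1f} GB/s peak "
          f"({bw / VRAM_GB_S:.0%} of VRAM, {bw / PCIE_SHARED_GB_S:.1f}x the PCIe path)")
```

Two DDR5-5200 or DDR5-5600 modules come out at roughly 83 and 90 GB/s peak, which matches the range quoted above.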
 
BTW, I predict that once mainstream desktop CPUs switch to using on-package memory, these sorts of CXL.mem expansion modules will be the only way to add RAM to your PC (other than a CPU swap). 2026 is probably the soonest it could happen.

They could even go in M.2 slots. Funny enough, a 2280 M.2 board is a similar size & shape as a DIMM. They just have the connectors on a different edge + require a controller IC.
 
Come on Anton, this title is clickbait, because you seem to suggest that this could somehow help current owners of VRAM-starved GPUs. The subtitle should dispel that, but by then people have jumped into the article.

Let's not forget that consumer GPUs like the RTX 4090 already struggle to get more than 20% compute capacity out of their 1TB/s VRAM for LLMs, because they need a full sequential weights pass for every token they generate. That's why HBM is so much less painful at roughly 4TB/s.

PCIe v4 is 32GB/s, v5 twice that, but that is as if you were trying to extend SSD capacities with floppies or tape.
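To put those figures side by side, here is a back-of-the-envelope sketch. If each generated token needs one full read of the weights, then tokens per second can't exceed bandwidth divided by weight size. The 20 GB model size is a made-up example, and the bandwidths are the round numbers from this post, not measurements.

```python
# Upper bound on LLM decode speed when every generated token needs one full
# sequential read of the weights, so throughput is limited by memory bandwidth.
# Assumption: WEIGHTS_GB is a hypothetical model size, not a figure from the thread.

WEIGHTS_GB = 20.0

# Bandwidths in GB/s, taken as the round numbers quoted in the post.
LINKS = {
    "VRAM (RTX 4090 class)": 1000.0,
    "HBM": 4000.0,
    "PCIe 4.0 x16": 32.0,
    "PCIe 5.0 x16": 64.0,
}

def max_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens/s; compute limits are ignored."""
    return bandwidth_gb_s / weights_gb

for name, bw in LINKS.items():
    print(f"{name:24s} <= {max_tokens_per_s(bw, WEIGHTS_GB):6.1f} tokens/s")
```

Under these assumptions the PCIe paths are bounded at a few tokens per second at best, versus tens to hundreds for on-card memory.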

Feed enough tape drives in parallel and you can achieve any bandwidth for sequential feeds, so yeah, you can design your workloads completely around something like that, if the scale of your problem overcomes all other considerations and you can build your own hardware including chips.

But that doesn't include your readership.

Articles like that are wonderful, add color and expand our horizon. But they should be marked as "technology outfield" or similar, so they don't just create unrealistic expectations and then disappointment.
 
Seems like Optain would have been ideal for this use case. Like an L4 for GPU usage.
I think you mean Optane, and I disagree. Three main reasons:
  1. I'm seeing new 32 GB DDR4 DIMMs for $56, sold by Newegg. That works out to 0.571 GB/$. A 1.6 TB Optane P5800X is running about $2756, which breaks down to a cost of 0.581 GB/$. So, no major cost advantage.
  2. DRAM is about an order of magnitude faster and lower-latency than even Optane DIMMs.
  3. DRAM has far higher endurance than Optane.

So, when what you really want is DRAM, what you really need is DRAM. Optane makes a bad substitute. It's only better when you actually need persistent storage.
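For anyone who wants to redo the cost comparison with current prices, the arithmetic is sketched below. The 32 GB DIMM capacity is an assumption inferred from the 0.571 GB/$ figure. The prices are the ones quoted above and will obviously drift.

```python
# Capacity per dollar for the two parts compared in the post.
# Assumption: the $56 DIMM is a 32 GB module (inferred from the quoted 0.571 GB/$).

def gb_per_dollar(capacity_gb: float, price_usd: float) -> float:
    return capacity_gb / price_usd

parts = {
    "DDR4 DIMM (assumed 32 GB)": (32.0, 56.0),
    "Optane P5800X 1.6 TB": (1600.0, 2756.0),
}

for name, (capacity_gb, price_usd) in parts.items():
    print(f"{name:28s} {gb_per_dollar(capacity_gb, price_usd):.3f} GB/$")
```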
 
CXL and PCIe share the same PHY specification. Where they diverge is at the protocol layer. CXL is not simply a layer atop PCIe. The slot might be the same, but you have to configure the CPU to treat it as a CXL slot instead of a PCIe slot. That obviously requires the CPU to have CXL support, which doesn't exist in consumer CPUs.
Perhaps their chip allows the CPU to see the card as both CXL and PCIe (on different bus addresses)?
Not sure if the current Xeon W or Threadrippers support it, actually, but they could.
I have it in BIOS as an option. No way to test it though.
 
Perhaps their chip allows the CPU to see the card as both CXL and PCIe (on different bus addresses)?
Everything I know about these standards would suggest that even if a device supports both protocols, the choice would be mutually exclusive.

I have it in BIOS as an option. No way to test it though.
Neat! You're relatively future-proof, I'd say. I doubt CXL 3.0 will even start to roll out for servers until next year, at the very earliest. We might start to see consumer products with CXL in 2026 or later (last year, AMD confirmed they're working with partners on such a thing).
 
Not sure if the current Xeon W or Threadrippers support it, actually, but they could.
Article cites it being CXL 3.1 compliant so I'm guessing not any time soon. CXL 3.0 spec came ~14mo before 3.1 so I'm guessing it would be some time before we see any retail hardware that supports it.
BTW, I predict that once mainstream desktop CPUs switch to using on-package memory, these sorts of CXL.mem expansion modules will be the only way to add RAM to your PC (other than a CPU swap). 2026 is probably the soonest it could happen.
I wouldn't mind this so long as everything going into the system supported the specification. Everything being able to pull from that pool should be close to the revolution that HDD to SSD was responsiveness wise.

Since the early CXL.mem cards have supported DIMMs I could even see the possibility of LPCAMM if the added bandwidth was a 50%+ gap like we're seeing now.
 
Persistent storage cannot be used as mem, period. Think of all the flush-write cycles that involves; an SSD would break down in days.
 
Persistent storage cannot be used as mem, period. Think of all the flush-write cycles that involves; an SSD would break down in days.
Optane SSDs have endurance on the order of 100 drive writes per day, for 5 years. That's enough endurance that you can sort of use them as DRAM, but not blindly.

They're also bit-addressable, meaning there's potentially zero write-amplification (in practice, I'd expect the memory region to be configured as write-back cacheable, so you have write-amplification only up to multiples of 64 bytes).
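To give a feel for what that endurance rating means, here is a rough sketch. The 1.6 TB capacity is an assumption (borrowed from the P5800X model mentioned earlier in the thread), and the write-amplification helper assumes writes start on a 64-byte cache-line boundary.

```python
import math

CACHE_LINE_BYTES = 64  # assumed write-back granularity

def lifetime_writes_pb(capacity_tb: float, dwpd: float, years: float) -> float:
    """Total rated writes in petabytes for a drive-writes-per-day rating."""
    return capacity_tb * dwpd * 365 * years / 1000

def sustained_write_gb_s(capacity_tb: float, dwpd: float) -> float:
    """Average write rate in GB/s that exactly uses up the daily rating."""
    return capacity_tb * 1000 * dwpd / 86_400

def write_amplification(bytes_written: int) -> float:
    """Bytes flushed per byte written, for a line-aligned write."""
    lines = math.ceil(bytes_written / CACHE_LINE_BYTES)
    return lines * CACHE_LINE_BYTES / bytes_written

print(f"Rated lifetime writes: {lifetime_writes_pb(1.6, 100, 5):.0f} PB")
print(f"Sustained rate within rating: {sustained_write_gb_s(1.6, 100):.2f} GB/s")
print(f"Amplification for an 8-byte store: {write_amplification(8):.0f}x")
```

So the rating allows a bit under 2 GB/s of writes around the clock for 5 years. That's a lot for storage, but DRAM-style write traffic could exceed it, hence "not blindly".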

I'm not really trying to defend the practice, so much as just to explain why it's not so crazy that Intel made Optane-based PMem DIMMs.
 
Or just put (the word) next time, that would be better.
FWIW, the site does have a policy against foul language and I can't advise on exactly where moderators will draw the line.

I was merely providing advice on the mechanics of the forum software, in that if you don't want a word starting with @ to be interpreted as a user-reference, you can surround it with [icode] tags to avoid that effect. This goes for pretty much anything else, including tags.
 
I wish video card makers would just add a second pool of RAM that would use SODIMM modules. For example, imagine if a card like the RTX 4080 had 2 SODIMM slots on the back of the card for extra RAM. While it would be a slower pool of RAM of around 80-90GB/s compared to the 736GB/s of the VRAM, it would still be useful.

Video card makers already have experience with using and prioritizing 2 separate memory pools on the same card, for example the GTX 970 would have a 3.5GB pool at 225-256GB/s, and a second 512MB pool at around 25-27GB/s depending on clock speed. If a game used that 512MB pool, the performance hit was not much, as the card and drivers at least knew enough to not shove throughput intensive data/ workloads into that second pool.

If they could do the same but with 2 DDR5 SODIMM slots, then users could do things like have a second pool of up to about 96GB, and it would have far fewer performance hits than using shared system memory which tops out at a real world throughput of around 24-25GB/s on a PCIe 4.0 X16 connection that also has to share bandwidth with other GPU tasks, thus not well suited for pulling double duty.
I really wish video card manufacturers would do this, but they would be apprehensive to do anything that might discourage sales of top end video cards, so maybe only RTX 4090 and whatever the top model workstation and server products are?
 
I wish video card makers would just add a second pool of RAM that would use SODIMM modules. For example, imagine if a card like the RTX 4080 had 2 SODIMM slots on the back of the card for extra RAM. While it would be a slower pool of RAM of around 80-90GB/s compared to the 736GB/s of the VRAM, it would still be useful.

Video card makers already have experience with using and prioritizing 2 separate memory pools on the same card, for example the GTX 970 would have a 3.5GB pool at 225-256GB/s, and a second 512MB pool at around 25-27GB/s depending on clock speed. If a game used that 512MB pool, the performance hit was not much, as the card and drivers at least knew enough to not shove throughput intensive data/ workloads into that second pool.

If they could do the same but with 2 DDR5 SODIMM slots, then users could do things like have a second pool of up to about 96GB, and it would have far fewer performance hits than using shared system memory which tops out at a real world throughput of around 24-25GB/s on a PCIe 4.0 X16 connection that also has to share bandwidth with other GPU tasks, thus not well suited for pulling double duty.
That is very close to the point where those GPUs could just access CPU or CXL RAM, which they typically can already in newer CUDA variants. That would be too much effort for too little gain.

I'd just be happy if you could buy additional VRAM at linear pricing, but GPU vendors see that as a chance to segment the market between hobby and professional use and charge accordingly.

And I guess GPU vendors are under contractual obligations not to sell high VRAM capacity consumer GPUs, so we can only dream on.
 
That is very close to the point where those GPUs could just access CPU or CXL RAM, which they typically can already in newer CUDA variants. That would be too much effort for too little gain.

I'd just be happy if you could buy additional VRAM at linear pricing, but GPU vendors see that as a chance to segment the market between hobby and professional use and charge accordingly.

And I guess GPU vendors are under contractual obligations not to sell high VRAM capacity consumer GPUs, so we can only dream on.
Yeah, the industry really doesn't like it when server customers opt to use cheaper consumer-grade tech instead of the pricier stuff.