Nvidia Says Feature Similar to AMD's Smart Access Memory Tech is Coming to Ampere

Page 3 - Seeking answers? Join the Tom's Hardware community: where nearly two million members share solutions and discuss the latest tech.
You're right, I hadn't considered that. In that case, it only seems useful when you'd expect system RAM to be saturated, since the two can work in tandem.
SAM and its equivalents aren't there to let the CPU use VRAM as extra memory; they are there to reduce or remove management overhead. With flat address-space access to the entire VRAM, the CPU no longer has to map the correct VRAM region into the host memory address space before every read/write to a different VRAM region, which saves 200+ ns of latency on each operation where the address mapping would otherwise need to change. It also means that threads don't have to compete for access to limited windows into VRAM address space, which means fewer locks/mutexes between host CPU threads and GPU I/O, for greater concurrency and reduced overhead there too.

It makes CPU-GPU information exchanges more efficient.
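To make the windowing overhead concrete, here is a toy Python model (all class and function names are hypothetical; no real driver works at this level of abstraction). The windowed aperture forces every access through one shared, lock-protected window that has to be moved (the 200+ ns remap mentioned above) whenever an access falls outside it; the flat aperture maps all of VRAM, so there is nothing to move and no shared window to lock.

```python
import threading

MB = 2**20
WINDOW_SIZE = 256 * MB  # the classic 256MB BAR discussed in the thread


class WindowedAperture:
    """Toy model: the CPU reaches VRAM only through one movable window."""

    def __init__(self, window_size: int = WINDOW_SIZE) -> None:
        self.window_size = window_size
        self.base = 0        # VRAM address currently mapped at the window start
        self.remaps = 0      # how many times the window had to be moved
        self.lock = threading.Lock()

    def access(self, vram_addr: int) -> int:
        # All threads share one window, so they must serialize: moving the
        # window while another thread is mid-access would redirect its access.
        with self.lock:
            base = vram_addr - vram_addr % self.window_size
            if base != self.base:
                self.base = base  # stands in for the costly remap step
                self.remaps += 1
            return vram_addr - self.base  # offset inside the window


class FlatAperture:
    """Toy model with all of VRAM mapped: no remaps, no shared window lock."""

    def __init__(self) -> None:
        self.remaps = 0

    def access(self, vram_addr: int) -> int:
        return vram_addr


def worker(aperture, region_start: int, count: int) -> None:
    # Each thread streams through its own VRAM region in 4 KiB steps.
    for i in range(count):
        aperture.access(region_start + i * 4096)


if __name__ == "__main__":
    for aperture in (WindowedAperture(), FlatAperture()):
        threads = [
            threading.Thread(target=worker, args=(aperture, region, 10_000))
            for region in (0, 4 * 1024 * MB)  # two threads, far-apart regions
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(type(aperture).__name__, "remaps:", aperture.remaps)
```

The exact remap count in the threaded run depends on how the threads interleave, which is rather the point: with one window, every switch between far-apart regions costs a remap and every access contends for the same lock.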
 
Not one, three: the CPU firmware, the BIOS/UEFI and the VBIOS. That's three sensitive pieces of firmware that need to work together, along with the OS, to run without blue screens.
In Nvidia's case, it is just one firmware update. Responding to a question from GN, Nvidia said their implementation will work fine with the 6 boards AMD says are compatible with SAM so long as AMD doesn't lock them out. AMD certainly isn't going to be helping Nvidia with this, so all Nvidia can do is update their own VBIOS to get this working on AMD platforms.
 
True. I still want to see how the final products will perform once the hardware is out on AMD's side and NV has enabled it on their hardware.
According to AMD, SAM is typically worth 1-2%. So, Nvidia's implementation could do absolutely nothing, and they'd only be behind 1-2%. That wouldn't be very interesting from a benchmarking perspective, and I feel sorry for the reviewers who have to spend hours running tests just to isolate this difference.

Derp, never mind. The quote was about rage mode, not SAM.
 
Perhaps part of why AMD limited the BAR size adjustment to Ryzen 5000 platforms, at least initially, is that they want reviewers to test their new cards on a Ryzen platform. Most reviewers have been testing graphics cards on high-end Intel processors to minimize per-thread CPU limitations in games, and some reuse results from prior reviews, making them less willing to change platforms, even if AMD's CPUs now offer higher gaming performance in many cases. So AMD likely figures that this (along with things like PCIe 4.0 support) gives reviewers more incentive to move to a Ryzen platform, at least for some of their testing. It's quite possible they will add support for the feature on other platforms after launch.

I'm not sure where you got 1-2% from, but at least according to AMD's marketing materials, Smart Access Memory can provide "up to 11% extra performance across select titles".

https://www.amd.com/en/technologies/smart-access-memory

Granted, that's "up to", meaning typical performance gains will be lower, but they are showing 5-6% gains at 4K ultra in at least Borderlands 3, Gears 5, Hitman 2, and Wolfenstein: Young Blood, along with that 11% in Forza Horizon 4. And sure, those are probably also above-average examples, but the fact that gains like those exist in at least some titles makes the feature definitely worth testing.
 

I made a mistake. The 1 to 2% was for rage mode, not SAM.
 
And it will work on platforms where the CPU's firmware and the UEFI allow BAR address range adjustments in a bug-free manner, but right now that's limited to Ryzen 5000 CPUs on 500-series chipsets, and probably some workstation- or server-grade Intel motherboards.
It is possible that older Ryzen platforms will get it in January when a newer AGESA revision is released for them; now that Nvidia has decided to enable the feature, I wouldn't be surprised if AMD backported it to all of its hardware.
 
Nvidia has said they are working with Intel to get it working on Z490 boards. It will be interesting to see if SAM works on Intel Z490 boards with AMD GPUs before it works with AMD's own 3000-series CPUs.
 
...I wouldn't be surprised if this is as overhyped as hardware-accelerated GPU scheduling...
This feature works best on weak machines, where it frees up resources that are desperately needed elsewhere.
When you have plenty of resources, the difference is minimal.
Overall, weak systems will gain once support is added.
I guess for now it's just like RTX: support isn't very good out of the box.
 
Not true. On the contrary, it removes a bottleneck: with a fixed 256MB BAR, every time the software needs to change resources in VRAM, it has to go through the OS kernel to ask for a different BAR address, wait for that to happen, do its operation, move the BAR again... Back when GPUs only had 1GB of VRAM, you only needed to switch addresses once in a while to load resources, but nowadays, with 8-16GB frame buffers, the system wastes hundreds of cycles every time resources have to be loaded into VRAM, cycles during which the CPU and the GPU are stalled.
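A rough way to see why larger frame buffers make a fixed window worse: if VRAM is split into n window-sized chunks and each access hits a uniformly random chunk, the chance that the next access needs the BAR moved is 1 - 1/n. With a 256MB window that is 75% for 1GB of VRAM and about 98% for 16GB. The sketch that follows checks this under that deliberately simplified, assumed access pattern (real games touch VRAM far less randomly); `remap_rate` is a hypothetical helper, not part of any driver.

```python
import random

MB = 2**20
GB = 2**30
BAR_SIZE = 256 * MB


def remap_rate(vram_size: int, bar_size: int = BAR_SIZE,
               accesses: int = 100_000, seed: int = 0) -> float:
    """Fraction of accesses that force the BAR window to move.

    Assumes each access lands in a uniformly random window-sized chunk of
    VRAM, which is a simplification of real access patterns.
    """
    rng = random.Random(seed)
    windows = max(1, vram_size // bar_size)
    current = 0
    remaps = 0
    for _ in range(accesses):
        target = rng.randrange(windows)
        if target != current:
            current = target  # the window has to be moved
            remaps += 1
    return remaps / accesses


if __name__ == "__main__":
    for size in (1 * GB, 8 * GB, 16 * GB):
        rate = remap_rate(size)
        print(f"{size // GB:>2} GB VRAM: {rate:.1%} of accesses move the window")
```

Setting `bar_size` equal to the VRAM size models a full-size BAR: there is only one chunk, so the window never moves.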
 
You didn't read the whole thing. The hardware scheduler helps low-end systems, while this helps GPUs that have lots of RAM.
That comment was about hardware-accelerated GPU scheduling, not the lifted BAR limitation.
 
Yes and no: however you take it, letting a GPU manage its own scheduling will always perform better than having the CPU do it. But it's true that it has little impact when unused.
I would say it's a step further than hardware-accelerated desktop compositing (who remembers back when the Windows desktop was essentially CPU-rendered?), which didn't bring much more performance but did make the display a lot more fluid to render, with fewer artifacts and less jerkiness in window movements.
 
Is CPU FW really its own, distinct thing? In practice it's always part of the BIOS/UEFI AFAIK.
No: the CPU comes with a bit of firmware (microcode), which can be updated through the BIOS but also by the OS (Linux does it, Windows is trying to). It is also revision-specific.
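On x86 Linux, the microcode revision the kernel currently reports is typically exposed as a `microcode` field in /proc/cpuinfo, which is one way to see that this CPU-side firmware is tracked separately from the BIOS version. A minimal Python sketch for reading it; `microcode_revisions` is a hypothetical helper, and the field may be absent on other architectures or in some virtual machines.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct values of the 'microcode' field in /proc/cpuinfo text."""
    revisions: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(microcode_revisions(cpuinfo.read_text()) or "no microcode field found")
    else:
        print("/proc/cpuinfo not available on this system")
```

A set is used because /proc/cpuinfo repeats the field once per logical CPU; more than one distinct value would mean cores are running different revisions.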
Of course, current CPUs are closer to SoCs, and the majority of graphics cards will be connected to the CPU-driven PCIe lanes, but the BIOS is still the one managing the initial hardware resource allocations. One easy bug would be the BIOS allocating a fixed address range to some embedded hardware and then bugging out if the changed BAR range happened to overlap with an undeclared hardware resource it used internally. (Normally, it should allow the OS to change address ranges, but I've seen cases of hardware accepting a resource modification and still expecting the old resource to be used instead. Result: instant BSOD.)
So yeah, all 3 pieces of firmware have to be in sync for this feature to work properly.
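The overlap bug described above boils down to an address-range collision. A minimal sketch of the check, using half-open [start, end) ranges; the addresses, range names and helper functions are hypothetical. Note that such a check can only catch ranges the firmware actually declares; the failure mode in the post is exactly the case where the BIOS uses a range it never reported.

```python
from typing import NamedTuple

MB = 2**20
GB = 2**30


class AddrRange(NamedTuple):
    name: str
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive end


def overlaps(a: AddrRange, b: AddrRange) -> bool:
    # Half-open intervals overlap iff each one starts before the other ends.
    return a.start < b.end and b.start < a.end


def conflicts(proposed: AddrRange, reserved: list[AddrRange]) -> list[str]:
    """Names of declared reserved ranges that a proposed BAR placement collides with."""
    return [r.name for r in reserved if overlaps(proposed, r)]


if __name__ == "__main__":
    # Hypothetical firmware-reserved ranges; real layouts vary per board.
    reserved = [
        AddrRange("onboard controller", 0xE000_0000, 16 * MB),
        AddrRange("firmware scratch", 0xF000_0000, 64 * MB),
    ]
    candidates = [
        AddrRange("GPU BAR (low)", 0xD000_0000, 512 * MB),
        AddrRange("GPU BAR (high)", 16 * GB, 16 * GB),
    ]
    for bar in candidates:
        print(bar.name, "->", conflicts(bar, reserved) or "no conflict")
```

With half-open ranges, a BAR ending exactly where a reserved range starts does not count as a conflict, which avoids an off-by-one false positive.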
 
In principle, the BAR size is negotiable, so everything should be written on the assumption that the granted BAR size may be smaller than the requested one, and the only thing that needs to change is the part that sets the ceiling. The 256MB "limit" is a holdover from the 32-bit days, when memory-mapped I/O could end up consuming a significant share of the system's usable memory. They likely could have changed it ~15 years ago, when 64-bit OSes were well on their way, had they really wanted to.

The reason it only changed now probably has more to do with flattening the memory space for more efficient HPC applications than anything gaming-related.
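The negotiation idea can be sketched as: pick the largest BAR size the device supports that fits both the request and the platform's ceiling, and have driver code fall back to windowing whenever the grant is smaller than VRAM. The size list, ceilings and function names here are illustrative assumptions, not the PCIe Resizable BAR register encoding or any vendor's driver logic.

```python
MB = 2**20
GB = 2**30


def negotiate_bar(requested: int, supported: list[int], ceiling: int) -> int:
    """Largest BAR size the device supports that fits the request and the platform ceiling."""
    limit = min(requested, ceiling)
    fitting = [size for size in supported if size <= limit]
    if not fitting:
        raise ValueError("device supports no BAR size within the limit")
    return max(fitting)


def needs_windowing(granted: int, vram_size: int) -> bool:
    # Code written for negotiation only has to check whether it got the full aperture.
    return granted < vram_size


if __name__ == "__main__":
    # Hypothetical device: power-of-two BAR sizes from 256MB to 16GB, with 16GB of VRAM.
    supported = [256 * MB << i for i in range(7)]  # 256MB, 512MB, ..., 16GB
    vram = 16 * GB
    for ceiling in (256 * MB, 8 * GB, 64 * GB):
        granted = negotiate_bar(vram, supported, ceiling)
        print(f"ceiling {ceiling // MB} MB -> granted {granted // MB} MB, "
              f"windowing needed: {needs_windowing(granted, vram)}")
```

Under this framing, raising the ceiling is the only change needed: the same code path handles a 256MB grant and a full 16GB grant.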
 
More than likely. The fact is that consumer BIOS/UEFI firmware is often very buggy, and I wouldn't be surprised if a good number of motherboards behave badly when asked to change the BAR address range beyond what they have been tested with. Keep in mind that a BIOS is very often considered "stable" once it can install and boot Windows, and that's it.
 
The BIOS does not matter, since drivers can resize and relocate BAR blocks wherever convenient once the OS takes control of the system. GPUs can simply keep requesting 256MB by default for backward-compatibility reasons, and drivers can always do their secret BIG-BAR handshake later.
 
They can, but it wouldn't be the first time a controller bugged out because it didn't actually accept having its range modified.
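A hedged sketch of the defensive pattern implied here: boot at the default 256MB, attempt the bigger size once the OS and driver are in control, then read the effective size back and fall back if the device didn't take it. `ToyBar` and `try_big_bar` are hypothetical stand-ins, not a real driver API, and a read-back check only catches devices that visibly refuse the change; a controller that reports the new size while still decoding the old one would slip past it.

```python
MB = 2**20
GB = 2**30


class ToyBar:
    """Hypothetical device model, not a real driver API.

    honors_resize=False mimics a controller that silently keeps its old
    size after being asked to change it.
    """

    def __init__(self, size: int, honors_resize: bool = True) -> None:
        self._size = size
        self.honors_resize = honors_resize

    def request_size(self, size: int) -> None:
        if self.honors_resize:
            self._size = size

    def effective_size(self) -> int:
        return self._size


def try_big_bar(bar: ToyBar, new_size: int) -> int:
    """Attempt the resize after boot and verify it actually took effect."""
    old_size = bar.effective_size()
    bar.request_size(new_size)
    if bar.effective_size() != new_size:
        bar.request_size(old_size)  # fall back to the known-good size
        return old_size
    return new_size


if __name__ == "__main__":
    for honors in (True, False):
        bar = ToyBar(256 * MB, honors_resize=honors)  # boots at the 256MB default
        print(f"honors_resize={honors}: using {try_big_bar(bar, 16 * GB) // MB} MB")
```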