Remember how bad Cyberpunk 2077 was when it released? It seems like all games get rushed out.
View: https://youtu.be/8aB6J5xI6qo
It seems like Nvidia GPUs don't get more FPS here even if you lower the resolution.
A quick benchmark of the new Star Wars Jedi: Survivor on my gaming PC (3080 FE, 3900X) at 4K with FSR 2 and ray tracing enabled. The game is poorly optimised, and due to a big CPU bottleneck there is no improvement to my framerate from lowering the settings beyond those shown in the video (my optimised settings).
Settings recommendations are therefore impossible, really, although you can disable ray tracing to claw back some performance. Basically, Star Wars Jedi: Survivor is not utilising the hardware presented to it in any meaningful way.
In short, Star Wars Jedi: Survivor is essentially ignoring the fact that CPUs have entered the many-core era. With higher settings it is even more disastrous: with ray tracing active, more CPU cores are tasked with maintaining RT's BVH structures, yet performance drops still further, to the point where I've observed CPU-limited scenes on a 12900K that just about exceed 30fps. On a mid-range CPU like the Ryzen 5 3600 it is even more catastrophic.
What do gamers think? Well...
Ruined by AMD™
Tiers of RT hardware.
Tier 1 RT is basically using shaders/compute units to do software RT.
Tier 2 is all about adding dedicated hardware support for box and triangle intersection acceleration. I believe ray tracing on RTX cards is definitely at least Tier 2.
Tier 3 is all about BVH processing and memory management, which we all know as the memory bottleneck in recent RT games. This is the next step in RT hardware development and could prove critical in deciding which vendor (AMD, Nvidia or Intel) has better RT performance.
Tier 4 is all about making the use of RT hardware more efficient by grouping rays.
Tier 5 is all about adding coherency sorting and on-the-fly BVH generation for highly dynamic/moving scenes to the dedicated hardware block.
Nvidia's RT cores do all the BVH traversal, so they are at least Tier 3.
And do note that Turing had two distinct hardware acceleration features in its RT cores: a ray-triangle intersection accelerator (Tier 2) and a BVH traversal unit (Tier 3). By this tiered system, Turing cards are Tier 3.
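To make those tiers a bit more concrete, here is a minimal software traversal loop of the kind a Tier 1 design runs entirely on shaders/compute units. The node layout and function names are my own, purely for illustration: Tier 2 hardware accelerates the box and triangle tests inside the loop, while a Tier 3 design like Turing pulls the whole traversal loop (and its stack) into the RT core.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical data layout, purely for illustration.
struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b)     { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)       { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Ray  { Vec3 o, d; float tMax; };
struct AABB { Vec3 lo, hi; };
struct Tri  { Vec3 v0, v1, v2; };
struct Node { AABB box; int left, right, firstTri, triCount; }; // leaf if triCount > 0

// Tier 2 hardware accelerates these two tests (slab test + Moller-Trumbore).
bool rayHitsBox(const Ray& r, const AABB& b) {
    float t0 = 0.0f, t1 = r.tMax;
    const float o[3]  = {r.o.x, r.o.y, r.o.z},   d[3]  = {r.d.x, r.d.y, r.d.z};
    const float lo[3] = {b.lo.x, b.lo.y, b.lo.z}, hi[3] = {b.hi.x, b.hi.y, b.hi.z};
    for (int i = 0; i < 3; ++i) {
        float inv = 1.0f / d[i];
        float tNear = (lo[i] - o[i]) * inv, tFar = (hi[i] - o[i]) * inv;
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
    }
    return t0 <= t1;
}

bool rayHitsTriangle(const Ray& r, const Tri& t, float& hitT) {
    Vec3 e1 = t.v1 - t.v0, e2 = t.v2 - t.v0;
    Vec3 p = cross(r.d, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false;   // ray parallel to triangle plane
    float inv = 1.0f / det;
    Vec3 s = r.o - t.v0;
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(r.d, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    hitT = dot(e2, q) * inv;
    return hitT > 0.0f && hitT < r.tMax;
}

// Tier 3 hardware pulls this whole loop (and its traversal stack) into fixed function.
float traverse(const std::vector<Node>& nodes, const std::vector<Tri>& tris, const Ray& ray) {
    float closest = ray.tMax;
    int stack[64], sp = 0;
    stack[sp++] = 0;                                // start at the root node
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!rayHitsBox(ray, n.box)) continue;      // box rejection culls whole subtrees
        if (n.triCount > 0) {                       // leaf node: test its triangles
            for (int i = 0; i < n.triCount; ++i) {
                float t;
                if (rayHitsTriangle(ray, tris[n.firstTri + i], t) && t < closest)
                    closest = t;
            }
        } else {                                    // inner node: push both children
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return closest;                                 // == tMax means no hit
}
```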
Each of the following components is part of the process of ray tracing (a rough sketch of how they fit together follows the list):
- BVH creation and updating
- BVH traversal
- Ray-Gen
- Ray/Triangle Intersection
- Shading
- Denoising
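As a rough sketch of how those stages chain together per frame (every type and function name here is a placeholder of my own, not any real API), the flow looks something like this; the table below is then just asking which of these calls has a dedicated hardware unit behind it on a given GPU.

```cpp
#include <vector>

// Placeholder types, just to make the stage boundaries visible.
struct Ray   { float origin[3], dir[3]; };
struct Hit   { bool valid; float t; int triangle; };
struct Pixel { float rgb[3]; };
struct Scene { /* geometry, materials, lights */ };
struct BVH   { /* acceleration structure nodes */ };

// Each stub below corresponds to one row of the table that follows.
BVH   buildOrRefitBVH(const Scene&)                  { return {}; } // BVH creation and updating
Ray   generateCameraRay(int x, int y)                { return {}; } // Ray-Gen
Hit   traverseBVH(const BVH&, const Ray&)            { return {}; } // BVH traversal + ray/triangle intersection
Pixel shadeHit(const Scene&, const Ray&, const Hit&) { return {}; } // Shading
void  denoise(std::vector<Pixel>&)                   {}             // Denoising

// The whole per-frame flow, stage by stage.
void rayTraceFrame(const Scene& scene, std::vector<Pixel>& image, int width, int height) {
    BVH bvh = buildOrRefitBVH(scene);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            Ray ray = generateCameraRay(x, y);
            Hit hit = traverseBVH(bvh, ray);
            image[y * width + x] = shadeHit(scene, ray, hit);
        }
    denoise(image);
}
```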
And so if I were to take the current RTX cards and assess their RT capability compared to a GTX card, I would do it like this:
RT Feature | RTX GPU | GTX GPU |
--- | --- | --- |
BVH Creation | No | No |
BVH Traversal | Yes | No |
Ray-Gen | Yes | No |
Ray Batching | Yes (Partial) | No |
Ray/Triangle Intersection | Yes | No |
Shading | Yes (Legacy GPU Shading techniques) | Yes |
Denoising | No (Potential to use Tensor cores) | No |
AMD.
US20190197761 is the original AMD patent everyone was discussing back in June 2019. Its description states that the CUs are responsible for traversing the BVH.
US20200193685 is the new patent.
And the implementation of ray tracing that it describes pretty much matches Nvidia's.
[0023] The ray tracing pipeline operates in the following manner. A ray generation shader is executed. The ray generation shader sets up the data for a ray to test against and requests that the ray intersection test unit test the ray for intersection with triangles.
[0024] The ray intersection test unit traverses an acceleration structure...
...For triangles which are hit, the ray tracing pipeline triggers an execution of an any hit shader. Note that multiple triangles can be hit by a single ray.
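Paraphrased as code, the flow those paragraphs describe looks roughly like the sketch below. This is only a conceptual model under my own naming, not the patent's or DXR's actual interface, and the fixed-function intersection unit is stubbed out as an ordinary function.

```cpp
#include <functional>
#include <vector>

// Conceptual model of the flow in [0023]-[0024]; all names are illustrative only.
struct Ray         { float origin[3], dir[3], tMax; };
struct TriangleHit { int triangleId; float t; };
struct AccelStruct { /* BVH nodes + triangles */ };

// Stand-in for the fixed-function "ray intersection test unit": it walks the
// acceleration structure and reports every triangle the ray hits. Stubbed here.
std::vector<TriangleHit> rayIntersectionTestUnit(const AccelStruct&, const Ray&) {
    return {};
}

// [0023]: a ray generation shader sets up the ray and requests the test.
// [0024]: for every triangle that is hit, an any-hit shader is triggered;
//         a single ray can hit multiple triangles.
void traceOneRay(const AccelStruct& as, const Ray& ray,
                 const std::function<void(const TriangleHit&)>& anyHitShader) {
    for (const TriangleHit& hit : rayIntersectionTestUnit(as, ray))
        anyHitShader(hit);   // e.g. alpha-test the hit, then accept or ignore it
}
```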
AMD's RDNA 2 TMUs have both BVH and intersection fixed-function units.
The only thing handled by the SIMD ALUs is passing the work to the TMU to process. The LLVM commit shows how it's done: you call the RT function using a regular texture shader with the data for the BVH start.
The TMUs also have their own cache to handle that, reducing memory access that would cause massive perf hits.
What they can't do is regular texture mapping and RT at the same time.
Nvidia already explained their RT cores in their architecture whitepapers where they describe how they do the entire BVH traversal in one go.
On AMD, the initiation of the TMUs to do BVH work is through a regular texture shader. So you schedule that texture shader (with the BVH data), preferably along with other graphics shaders, and the ALUs just pass the BVH shader on to the TMUs.
The patent also says the shader units (aka SPs, cores, whatever you want to call them) decide what to do with the intersection results and how to proceed with the next traversal step. The PDF is images instead of embedded text or I'd quote it, but it's spelled out in section 0047.
Once a ray-triangle hit is confirmed, the return data goes to the shared cache within the CU and the ALUs then proceed to the next step; if there is a need for secondary or tertiary bounces, they initiate the TMU's BVH + intersection engine again.
The ALUs don't do any of the BVH traversal or intersection testing (SIMD ALUs suck at that, as we know); that's handled by the new RT fixed-function unit inside the modified TMUs.
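Put as pseudo-C++, that division of labour looks roughly like this. Again this is only a conceptual model with made-up names: `tmuBvhIntersect` stands in for the texture-path BVH/intersection request described above, and the shared-cache handoff is reduced to a plain return value.

```cpp
// Conceptual model of the RDNA 2 split described above; all names are mine.
struct Ray     { float origin[3], dir[3], tMax; };
struct HitInfo { bool hit; float t; int triangle; };

// Stand-in for the TMU's fixed-function BVH + intersection engine. On real
// hardware the request goes out through the texture path and the result lands
// in the CU's shared cache; here it is just a stub returning "no hit".
HitInfo tmuBvhIntersect(const Ray& r) { return {false, r.tMax, -1}; }

// Placeholder for building a secondary (bounce) ray from a hit.
Ray bounceRay(const Ray& r, const HitInfo&) { return r; }

// The SIMD ALUs only look at each result and decide whether another bounce is
// needed; per the description above, they never walk the BVH themselves.
void shadeWithBounces(Ray primary, int maxBounces) {
    Ray ray = primary;
    for (int bounce = 0; bounce < maxBounces; ++bounce) {
        HitInfo hit = tmuBvhIntersect(ray);   // hand the work to the TMU
        if (!hit.hit)
            break;                            // ray escaped the scene
        // ...accumulate shading for this hit on the ALUs...
        ray = bounceRay(ray, hit);            // then re-issue the TMU engine
    }
}
```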
One of the XSX engineers described the RT hardware on its 52 CU RDNA 2 GPU as being capable of >13 TF for ray tracing, and >25 TF working in tandem with the shaders. For comparison, Turing RT cores are rated at 34 "RT TFLOPS" and the 3080 at 58 "RT TFLOPS".
Lisa Su said at the launch of RDNA1 that they would not want Ray Tracing on their cards until the performance hit was mitigated enough to make it viable.
AMD only had a 256-bit memory bus, with 128 MB of cache to help alleviate the memory bottleneck.
If someone tells you AMD and Nvidia are fundamentally different for RT, or that Nvidia RT titles harm AMD performance, note US20200193685, which implies AMD uses more or less the same implementation as Nvidia. The real issue is hardware performance on AMD GPUs: Nvidia should always be ahead in RT because their hardware is more capable.
Over on YouTube, PureDark has uploaded a video showcasing a modded version of the game running at much better framerates. They've implemented a DLSS Frame Generation mod, which according to the video evidence brought their game up from 45 to 90 actual fps. That's a marked improvement over what many are seeing, especially in Steam's ticked-off comment section.
"I had a breakthrough and [am] now trying to replace FSR2 with DLSS, that would make the image look much better."
So DLSS 3 and DLSS 2-style mods are on the way, bringing the very thing AMD paid to keep out.