Discussion: CPU instruction set explanation thread

I mentioned in another thread that I suspect that future generations of cards are going to be less about adding more raw processing power, and more about improving the AI side of things - why add a few thousand more CUDA cores to scrape out an extra 5% performance when a few hundred Tensor cores can effectively double it? I know some people complain that they're not "real" pixels or frames and that it's a cheat, but pretty much every aspect of a modern rasteriser is a cheat already - parallax occlusion mapping, tessellation, screen space effects, they're all just as "fake" as DLSS!

Eh, it depends. Using advanced pattern detection (that's all "AI" is) to interpolate additional rendering data is fine. Using that same technique to artificially advance a frame counter to market "performance" is definitely a cheat. My qualm has been marketing leaning on hype and the general ignorance of the public to sell products. Frame generation will never be more than a gimmick because you are trying to render the future and that's not possible (with modern quantum physics), so you will always have weird artifacts and latency issues.

Pattern detection upscaling, on the other hand, is an extremely useful tool, especially since display resolutions are going up much faster than graphics processing power. Doubling the screen resolution quadruples the processing requirements. If x is the required performance for 1920x1080 (2,073,600 pixels), you need 4x for 3840x2160 (8,294,400 pixels) and 16x for 7680x4320 (33,177,600 pixels). This means maintaining decent frame rates is going to become an absolute nightmare if not outright impossible without some sort of upscaling technology. An advanced pattern based upscaling algorithm isn't trying to guess the future, it's trying to guess what a 1080p rendered image would look like at 2160p, or a 2160p rendered image at 4320p.
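To put rough numbers on that scaling (just a back-of-the-envelope sketch, using 1080p as the baseline x):

Code:
# Relative shading cost of common resolutions, taking 1920x1080 as the baseline "x"
resolutions = {
    "1080p": (1920, 1080),
    "2160p (4K)": (3840, 2160),
    "4320p (8K)": (7680, 4320),
}
base = 1920 * 1080
for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels:,} pixels -> {pixels // base}x the work of 1080p")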
 
  • Like
Reactions: Order 66
I would agree, I think it's funny, but I really do hate vague answers, because they lead me to ask a million more questions.
Okay, but I mean if you really want to dig into how 3D graphics rendering works, the math is a fundamental part of it. Not only are there entire books on it, but every intro book on game development and 3D graphics APIs will tend to devote at least a chapter or appendix to reviewing it. Heck, a quick web search just turned up this application note from Freescale - a semiconductor company - so engineers trying to use its chips would have the refresher they need:

Keep in mind that people started doing computer graphics before specialized hardware existed for it. The first example of a ray-traced image was done on a VAX, like 45 years ago.
 
  • Like
Reactions: Order 66
Using advanced pattern detection (that's all "AI" is) to interpolate additional rendering data is fine. Using that same technique to artificially advance a frame counter to market "performance" is definitely a cheat.
I don't see how you can say spatial interpolation is okay but temporal interpolation is a "cheat". I agree with @NorbertPlays - the only thing that matters is the end result. In that regard, the frames only need to look good enough, sustain a high enough rate, and arrive at sufficiently low latency to qualify. Since 3D graphics is already a giant approximation, I don't buy into the idea of trying to draw a line in the sand at one form of interpolation and not others.

Now, someone might say the latency of frame gen is too high, when the framerate of the input stream drops too low, and I would consider that a legitimate complaint. IMO, the main problem of frame gen is that it works best when you need it the least (and vice versa)!

Frame generation will never be more than a gimmick because you are trying to render the future and that's not possible
My understanding of DLSS3 and FSR3 is that they're temporal interpolation - not extrapolation. Hence, the latency penalty.

I can imagine future iterations of frame gen technology trying to feed an extrapolator with cheap-to-compute hints, so that it can do more accurate extrapolation and thereby avoid any latency penalty. Or, even if you still use frame gen for interpolated frames, if you have early hints, maybe you can start generating them before the subsequent frame is finished rendering.
 
We're still really in the baby stages of frame generation, but moving forward I can maybe see a hybrid approach being used - "important" parts (characters, enemies, etc.) get rendered normally each frame, and motion estimation and optical flow are used for everything else. Kind of like VRS, but instead of lowering the spatial resolution of parts of a scene you lower the temporal resolution. As for visual artifacts, we've already seen a massive improvement in upscaling quality since the PS4 Pro checkerboard days, and the pace of development of assorted image (re)generation techniques is astonishing!
 
I don't see how you can say spatial interpolation is okay but temporal interpolation is a "cheat". I agree with @NorbertPlays - the only thing that matters is the end result. In that regard, the frames only need to look good enough, sustain a high enough rate, and arrive at sufficiently low latency to qualify. Since 3D graphics is already a giant approximation, I don't buy into the idea of trying to draw a line in the sand at one form of interpolation and not others.

Now, someone might say the latency of frame gen is too high, when the framerate of the input stream drops too low, and I would consider that a legitimate complaint. IMO, the main problem of frame gen is that it works best when you need it the least (and vice versa)!


My understanding of DLSS3 and FSR3 is that they're temporal interpolation - not extrapolation. Hence, the latency penalty.

I can imagine future iterations of frame gen technology trying to feed an extrapolator with cheap-to-compute hints, so that it can do more accurate extrapolation and thereby avoid any latency penalty. Or, even if you still use frame gen for interpolated frames, if you have early hints, maybe you can start generating them before the subsequent frame is finished rendering.
I feel like the latency penalty for frame gen isn't that bad, especially if you're playing games with a controller.
 
The first example of a ray-traced image was done on a VAX, like 45 years ago
My mind is still boggled by the CGI in TRON - not only did they not have specialised hardware for 3D, they didn't even have a graphical display - the entire thing was done by laying things out on graph paper, entering a bunch of coordinates into some custom written software as raw numbers, and hoping it looked like what they wanted when it was eventually rendered incredibly slowly directly onto a negative!
 
My mind is still boggled by the CGI in TRON - not only did they not have specialised hardware for 3D, they didn't even have a graphical display - the entire thing was done by laying things out on graph paper, entering a bunch of coordinates into some custom written software as raw numbers, and hoping it looked like what they wanted when it was eventually rendered incredibly slowly directly onto a negative!
What?! I never knew that.
 
My mind is still boggled by the CGI in TRON
Yes, agree 100%.

Another interesting factoid about Tron is that it did poorly at the box office. It was probably the first example in history of a film where cutting-edge CGI couldn't compensate for its other flaws.

I remember being somewhat in awe of its visual effects, before I had the slightest clue how they were made. I had never seen anything remotely like it.

- not only did they not have specialised hardware for 3D, they didn't even have a graphical display
I'm not sure I heard about the lack of a display. Any idea how they generated the film prints?

the entire thing was done by laying things out on graph paper, entering a bunch of coordinates into some custom written software as raw numbers, and hoping it looked like what they wanted
I actually did something similar the first time I ever used POV-Ray. I drew out the scene on graph paper and entered the geometry into the text files it used as input. I'd usually draw the scenes during the day, at school, then make the text files when I got home and let the renders run overnight.

One of the first things I tried was to put a light source in front of the camera, as I was really curious to know what they looked like! Imagine my surprise when I saw nothing!
:D
 
  • Like
Reactions: Order 66
So we can view CPUs as having 6~14 cores that are good at doing lots of scalar instructions, and GPUs as having a thousand cores good at doing massive amounts of vector instructions.
You're being inconsistent about the notion of what constitutes a GPU "core". If you take Nvidia's view, what they talk about as a "core" is each SIMD lane (i.e. scalar processor). As I previously said, if you use this definition, then each Golden Cove or Zen 4 CPU core would be equivalent to about 48 of Nvidia's "cores".

Using a more classical CPU definition of a core, a GPU like the RTX 4090 only has about 512 blocks that are comparable to a CPU core. That's because Nvidia uses a construct called a Streaming Multiprocessor (SM), each of which contains 4 partitions, and the RTX 4090 has 128 of these. Each partition has the full contingent of execution units and logic needed to execute independently of the others:

[Image: Nvidia Ada Lovelace GPU architecture - Streaming Multiprocessor block diagram]


Source: https://www.nvidia.com/en-us/geforce/news/rtx-40-series-vram-video-memory-explained/

512 is still an awful lot of cores, but these are much simpler, in-order cores designed not only to be area-efficient but also energy-efficient. That's the only way they can pack so many onto a single die and find enough power to crank them & their SIMD units all up to 2.2 GHz.
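Just to put numbers on the two ways of counting (a quick sketch using Nvidia's published Ada figures; the variable names are mine):

Code:
sms = 128                  # Streaming Multiprocessors on an RTX 4090
partitions_per_sm = 4      # independently scheduled partitions per SM
lanes_per_partition = 32   # FP32 SIMD lanes per partition

cpu_style_cores = sms * partitions_per_sm                   # 512 "cores" by a classical CPU definition
nvidia_style_cores = cpu_style_cores * lanes_per_partition  # 16384 "CUDA cores" (one per SIMD lane)
print(cpu_style_cores, nvidia_style_cores)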
 
  • Like
Reactions: Order 66
A game like Quake managed to run perfectly well on a CPU that's several thousand times slower than anything from the last few years!
Quake needed a 486DX2-66 to be remotely playable, but really wanted a Pentium. So, a 66 MHz CPU with a single pipeline vs. modern CPUs like an i3-12100, which boosts to 4.3 GHz and has a 6-way decoder (but real IPC is a bit lower). If we take an average IPC of about 4 and assume the IPC of the i486 is about 0.333, that gives you another 12x speedup. So, before we consider the increased core count or any SIMD extensions, we're at a performance ratio of about 782:1.

So, the only way I think you get to "several thousand" is by factoring in multi-core and SIMD extensions. Also, while my IPC figure for the i486 might've been low for integer performance, it was probably too high for basic FPU instructions.
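If anyone wants to check that figure, the arithmetic is just this (a quick sketch using the estimates above):

Code:
clock_ratio = 4300 / 66      # 4.3 GHz boost vs. 66 MHz -> ~65x
ipc_ratio = 4 / (1 / 3)      # average IPC of ~4 vs. ~0.333 on the i486 -> ~12x
print(clock_ratio * ipc_ratio)   # ~782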

BTW, Descent was the first game I saw which had perspective-correct texture mapping and was playable on a 486 (at 320x200 resolution). It ran a bit faster than Quake, but then it had simpler models and lighting.

[Image: Descent (1995 video game) screenshot]


One thing that made Quake so neat is it had pre-baked ambient lighting, which they computed using radiosity, on a big workstation. That also meant that the GPU-accelerated version required GPUs & API implementations capable of multi-texturing. Quake also used Z-buffering for character rendering, which I'm pretty sure Descent did not.
 
  • Like
Reactions: Order 66
Quake needed a 486DX2-66 to be remotely playable, but really wanted a Pentium. So, a 66 MHz CPU with a single pipeline vs. modern CPUs like an i3-12100, which boosts to 4.3 GHz and has a 6-way decoder (but real IPC is a bit lower). If we take an average IPC of about 4 and assume the IPC of the i486 is about 0.333, that gives you another 12x speedup. So, before we consider the increased core count or any SIMD extensions, we're at a performance ratio of about 782:1.

So, the only way I think you get to "several thousand" is by factoring in multi-core and SIMD extensions. Also, while my IPC figure for the i486 might've been low for integer performance, it was probably too high for basic FPU instructions.
FPU, floating point unit?
 
  • Like
Reactions: bit_user
FPU, floating point unit?
Floating point is a computer number format that works like scientific notation. For instance, the way you might write 4.396 x 10^-3 instead of 0.004396.

The benefits of floating-point over integers are that it has a greater range and can represent non-integral values. The downside is that it's inexact and has less precision, except around zero. It also has weird properties like addition and multiplication not being associative!
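Here's a quick way to see the non-associativity for yourself (ordinary IEEE 754 doubles in Python, nothing exotic):

Code:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False - the grouping changed the result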

Because it's more complex, it requires more circuitry to execute. That means it has a larger silicon footprint and uses more energy. In fact, these scale roughly as the square of the mantissa width in modern implementations. This helps explain why AI-optimized processors tend to prefer low-precision floating-point number formats.

Sometimes really weird maths!
LOL, I've been there & done that!

About 20 years ago, I wrote lots of routines where I bit-hacked the IEEE754 fp32 format. These days, I honestly wonder how it compares with optimized CPU implementations.

BTW, I think GPUs tend to implement transcendental operations in a way that allows you to tradeoff execution time vs. accuracy. IIRC, AMD's 3DNow had some instructions like that.
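I can't say what those routines were, but for anyone wondering what "bit-hacking the IEEE754 fp32 format" even looks like, the classic published example is the fast inverse square root trick (sketched here in Python with struct doing the reinterpretation; the original was C with pointer casts):

Code:
import struct

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) by poking at the raw IEEE754 fp32 bits."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]   # reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1)                          # the famous magic-constant hack
    y = struct.unpack('<f', struct.pack('<I', i))[0]   # reinterpret back to a float
    return y * (1.5 - 0.5 * x * y * y)                 # one Newton-Raphson step to refine the guess

print(fast_inv_sqrt(4.0))   # ~0.499, vs. the exact answer of 0.5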
 
  • Like
Reactions: Order 66
RT cores do one thing, and they do it fast: check if a line intersects a box or triangle. It takes a couple of dozen floating point calculations to do "normally", but because RT cores don't (and can't) do anything else they can be engineered to do it really really quickly. Without the RT cores the GPU has to perform all those calculations one at a time which a) is slower, and b) takes resources that could be used for something else.
RT cores do something else, too: BVH traversal. BVH is short for Bounding Volume Hierarchy. This involves testing lots of bounding volumes (boxes, like you said, or sometimes spheres) and then conditionally fetching & testing the next set. This probably involves enough conditional control-flow that it's not easy to do efficiently via SIMD.

[Image: Example of a bounding volume hierarchy]


The point of BVH is to drastically cut down on the number of intersection tests you need to do, in order to determine which object a given ray intersects first.

Another thing I believe newer GPU hardware does is to accelerate building or modifying a BVH. Because, if a scene is dynamic, then the BVH needs to be updated or rebuilt, as well. And most games don't occur in a static world!
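To make the traversal idea concrete, here's a toy software sketch (illustrative Python only, axis-aligned boxes; the node layout and names are made up for the example, not how any real GPU stores its BVH):

Code:
import math
from dataclasses import dataclass, field

@dataclass
class BVHNode:
    bmin: tuple                    # min corner of the axis-aligned bounding box
    bmax: tuple                    # max corner
    left: "BVHNode" = None
    right: "BVHNode" = None
    triangles: list = field(default_factory=list)   # only populated at leaf nodes

    def is_leaf(self):
        return self.left is None and self.right is None

def ray_hits_box(origin, inv_dir, bmin, bmax):
    """Slab test: does the ray origin + t*dir (t >= 0) pass through the box?"""
    tmin, tmax = 0.0, math.inf
    for axis in range(3):
        t1 = (bmin[axis] - origin[axis]) * inv_dir[axis]
        t2 = (bmax[axis] - origin[axis]) * inv_dir[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def candidate_triangles(node, origin, inv_dir):
    """Walk the hierarchy, skipping every subtree whose box the ray misses."""
    if node is None or not ray_hits_box(origin, inv_dir, node.bmin, node.bmax):
        return []
    if node.is_leaf():
        return node.triangles      # exact ray-triangle tests only happen on these
    return (candidate_triangles(node.left, origin, inv_dir) +
            candidate_triangles(node.right, origin, inv_dir))

# Tiny made-up scene: one leaf box sitting in front of the ray
direction = (0.0, 0.0, 1.0)
inv_dir = tuple(1.0 / d if abs(d) > 1e-12 else 1e12 for d in direction)  # avoid divide-by-zero
leaf = BVHNode(bmin=(-1, -1, 5), bmax=(1, 1, 6), triangles=["tri0"])
root = BVHNode(bmin=(-1, -1, 5), bmax=(1, 1, 6), left=leaf)
print(candidate_triangles(root, (0.0, 0.0, 0.0), inv_dir))   # ['tri0']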
 
  • Like
Reactions: Order 66
Remind me what associative means in this context? I really should know this considering I’m still in school, but it’s been a few years since I’ve learned it.
The associative property is the rule you learned in algebra that says:

(x + y) + z = x + (y + z)

In other words, how you group the operations doesn't affect the result. Algebra depends on that.

I’m so confused by this.
The mantissa is the "4.396" part in the number 4.396 x 10^-3. The point is that the more precision you have in that part of a floating-point number, the larger the circuitry gets. It doesn't just increase linearly, but quadratically.

It's basically a way of saying that higher-precision arithmetic is disproportionately more expensive. You might otherwise think a 64-bit number should be twice as expensive to multiply as 32-bit, but it actually takes about four times as much.
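As a rough worked example (assuming a schoolbook-style multiplier, whose partial-product array grows with the square of the significand width):

Code:
# IEEE 754 significand widths, counting the implicit leading 1 bit
fp32_significand = 24
fp64_significand = 53
print((fp64_significand / fp32_significand) ** 2)   # ~4.9, i.e. roughly 5x the multiplier cost, not 2x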
 
  • Like
Reactions: Order 66
So, the only way I think you get to "several thousand" is by factoring in multi-core and SIMD extensions.
Given the focus of this thread on multicore workloads, multicore performance was definitely included in my accounting!

It's fun to learn stuff and gratifying to actually build something of your own!

I think it was all a lot simpler to learn when I started back in the 80s; once I'd figured out how to make a cube spin I had basically covered everything there was to know - how to move points in 3D space, and how to convert 3D space to screen space - and everything else was just an extension or refinement. These days everything is hidden behind API layers and gets a lot more involved, and all the tutorials I can find seem to gloss over the basic "how 3D actually works" and jump straight into OpenGL or DirectX which are a lot to take in if you're a complete noob...
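For what it's worth, that "move points in 3D space, convert 3D space to screen space" core really is still just a few lines once you strip the APIs away. Here's a bare-bones sketch (illustrative Python; the focal length, screen centre, and camera distance are made-up numbers):

Code:
import math

def rotate_y(point, angle):
    """Spin a point around the vertical axis - the 'make the cube spin' bit."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, focal=300.0, centre=(160.0, 100.0), cam_dist=4.0):
    """Perspective projection: 3D space to 2D screen space by dividing by depth."""
    x, y, z = point
    z += cam_dist                  # push the object out in front of the camera
    return (centre[0] + focal * x / z, centre[1] - focal * y / z)

# The 8 corners of a unit cube, rotated and projected for one frame of animation
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
for corner in cube:
    print(project(rotate_y(corner, math.radians(30))))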
 
  • Like
Reactions: Order 66
This means maintaining decent frame rates is going to become an absolute nightmare if not outright impossible without some sort of upscaling technology. An advanced pattern based upscaling algorithm isn't trying to guess the future, it's trying to guess what a 1080p rendered image would look like at 2160p, or a 2160p rendered image at 4320p.
The fault in that is that 4K is already too much resolution for a comfortable desktop experience, so going above that will not be met with a lot of acceptance. At least, that's what I think; I have a hard enough time looking even at 1080p without reading glasses.
1440p is the sweet spot for most people, I believe, and GPUs are already capable of pushing enough pixels at that resolution without AI. Sure, it's a good thing for cheaper cards and people who want to save money, but for the high end it's not going to add anything.

Now, couch gaming is a different thing but that's way more console territory than it is PC.
 
  • Like
Reactions: Order 66
BTW, @Order 66 if you're that interested in this stuff, maybe consider trying to find a good "Intro to 3D Game Programming" course, video series, or (gasp!) book.
Thanks, I am fascinated by anything and everything with technology, which is part of the reason why I have started so many discussion threads.
 
@bit_user, I've kinda always been in awe of the amount of knowledge that you have. How long did it take you to learn all of this? I've been into computers (mainly strictly hardware) for about 5 years, and while I've learned a lot, I realize just how much I still have to learn.
 
BTW, @Order 66 if you're that interested in this stuff, maybe consider trying to find a good "Intro to 3D Game Programming" course, video series, or (gasp!) book.

It's fun to learn stuff and gratifying to actually build something of your own!
I've thought about it, and actually tried to build simple games (mainly in Roblox), but my problem is that I get frustrated when I can't figure out how to do something, and when I try researching the problem, if I don't find anything, I kinda just give up. I would love to do it, but I have issues with staying with it. I realize that it's a personal problem, but I still get very frustrated with it. Also, I realize that you're talking about something different, but I just thought I would highlight my limited experience with game development so far.
 
The fault in that is that 4K is already too much resolution for a comfortable desktop experience, so going above that will not be met with a lot of acceptance. At least, that's what I think; I have a hard enough time looking even at 1080p without reading glasses.
1440p is the sweet spot for most people, I believe, and GPUs are already capable of pushing enough pixels at that resolution without AI. Sure, it's a good thing for cheaper cards and people who want to save money, but for the high end it's not going to add anything.

Now, couch gaming is a different thing but that's way more console territory than it is PC.
I realize that 4K is too much for most users, but I have terrible eyesight, so as a result I have to sit closer to the screen or zoom things in, but the problem I have is that seeing pixels drives me nuts. So I kinda need higher resolutions than most. I'm currently using a 1080p 22-inch monitor; the pixel density is decent, but it's only 1080p and 22 inches.