GPU Usage vs Clock Speed

Deus Gladiorum

Distinguished
I asked this question before and no one responded, so I'll ask again.

What's the difference between the two? When I play a game with MSI Afterburner, I have both on in the OSD but I'm not quite sure what the difference is. They're clearly independent of one another: even though my max clock is 1254 MHz, I can have my GPU operating at 1137 MHz but only at 10% usage, for example, even though 1137 is about 90% of 1254. So what does the "usage" in GPU usage refer to? What is being "used"?
 
Clock speed and usage are not directly connected. Every part of the chip runs at the same clock speed: in a GPU that's all of your shader cores, and in a CPU it's every stage of the pipeline in each core. Usage is determined by how much work the chip is actually doing. If you only have enough workload to keep 10% of your shader cores busy and the other 90% are sitting there doing nothing, the busy 10% will still be running at full clock speed because they need it to get their results out quickly.
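To put a number on that idea, here's a rough sketch in C of what a usage percentage means; the figures are made up for illustration, and real monitoring tools sample hardware counters rather than doing anything like this loop:

/* "Usage" is roughly the fraction of samples where the execution units had
   work to do. The clock speed the chip runs at never enters the calculation. */
#include <stdio.h>

int main(void) {
    int total_samples = 1000;   /* times we checked the chip (example number)        */
    int busy_samples  = 100;    /* times at least some units actually had work to do */

    double usage_percent = 100.0 * busy_samples / total_samples;
    printf("Usage: %.0f%%\n", usage_percent);   /* prints 10%, whatever the clock speed is */
    return 0;
}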
 
Alright, so in a GPU I assume this refers to the number of CUDA cores and shader cores being used (assuming those are two different things; I'm not sure if they are, but if not would you mind explaining the difference?), but what do you mean by "in a CPU this is every stage of the pipeline in each core"? For example, I started playing The Elder Scrolls: Morrowind recently and get the worst CPU bottleneck I've ever seen in a game. Even unmodded, I have lows where my frame rate gets down to around 65 fps. Of course I can't see beyond 60 fps unless I OC my monitor, but the fact that a game that old can even get that low blows my mind.

However, during these instances of 65 fps, I look at my CPU clock speed through CPU-Z and see that I'm still at 4.5 GHz without fluctuation, and in my task manager the CPU usage on one of my cores (I assume it's the one Morrowind is making use of, since it's an old game that probably only uses one core) is only around 50%-60%, yet I know it's the cause of the bottleneck. So what exactly is going on in this pipeline that translates to it using such a small amount?
 
A CUDA core and a shader core are the same thing: they are the small processing units in a GPU. AMD may have a different name for theirs, but any small core in a GPU is a shader core; nVidia just calls theirs CUDA cores because they can do more than just shade, and they like to advertise CUDA as much as they can.


Each core of a CPU has a pipeline that it is shoving instructions through; occasionally there are hold-ups that leave certain parts unused, or there are simply no instructions that want to use them. Basic components are registers, the ALU, the FPU, the memory unit, and peripheral control. If you have a piece of code like this:
A=2
B=4
C=B/A
D=C+B
You obviously cannot solve for D until you have found C, but division is hard and usually takes ~10 cycles compared to addition, which takes only 1. So the adding portion of the ALU has nothing to do until the division completes 10 cycles later, and meanwhile the floating-point unit is doing nothing at all since you are only working with integers, so the usage percentage it contributes is zero.
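Here's the same example written out in C, just to make the dependency chain concrete; the cycle counts in the comments are only ballpark figures for illustration, since real latencies vary from CPU to CPU:

#include <stdio.h>

int main(void) {
    int a = 2;
    int b = 4;
    int c = b / a;   /* integer divide: roughly ~10+ cycles on many CPUs           */
    int d = c + b;   /* 1-cycle add, but it depends on c, so the adder sits idle   */
                     /* while the divider is still working                         */
    printf("%d\n", d);   /* the FPU is never touched: integer-only code leaves it idle */
    return 0;
}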



65 FPS isn't "low", and many open-world games suffer from being open. The biggest burden on the CPU in open-world games is often determining what should be in your view. When you get these lows, is your hard drive reading data? The game may be going slower because it is waiting on data from the hard drive before it can do its thing. One setting that greatly reduces CPU and memory load is reducing the view/draw distance; this means it doesn't have to process nearly as much stuff.
 
Solution
Short answer: GPU usage is like CPU usage, and core clock is like CPU clock.
Long answer: the clock speed is the speed at which the GPU (graphics processing unit), which is the heart of the card, handles operations. It's measured in MHz or GHz; a higher frequency allows faster processing, so tasks finish more quickly. So faster is better, but not in all cases, because there are other things to take into account, like the memory type (GDDR3 or GDDR5), the memory clock and memory bandwidth (a very important element of the GPU), and the memory bus width (higher is better), which connects the memory to the GPU.
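To show why the memory clock and bus width matter alongside the core clock, here's a quick back-of-the-envelope bandwidth calculation in C; the card specs below are made-up example numbers, not any particular GPU:

#include <stdio.h>

int main(void) {
    double effective_mem_clock_mhz = 6000.0; /* effective GDDR5 data rate in MT/s (example) */
    double bus_width_bits = 256.0;           /* memory bus width (example)                  */

    /* bytes per second = transfers/s * bits per transfer / 8 bits per byte */
    double bandwidth_gb_s = effective_mem_clock_mhz * 1e6 * bus_width_bits / 8.0 / 1e9;

    printf("Theoretical memory bandwidth: %.1f GB/s\n", bandwidth_gb_s);   /* 192 GB/s here */
    return 0;
}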
 


Ah, absolutely superb explanation! I hadn't considered things like the FPU, certain registers, and probably a cache or even two not being used. If I'm following this right, that also explains why a CPU overclock usually does comparatively little for a CPU-bound game. If the CPU is bottlenecking in a game due to, let's say, the game only telling the CPU to utilize a single ALU to perform a few integer-based operations, then sure, increasing the clock speed might help, but nowhere near as much as the game actually telling the CPU to utilize its second ALU. Am I right on that? Meanwhile, a well-optimized game actually receives a bigger boost from the overclock because now the entire CPU pipeline works faster.

For a GPU it's probably much easier to max out its usage, since I assume the CUDA cores and VRAM comprise most of the GPU, right? The CUDA cores probably each receive the instruction to work on a certain portion of an image all at once, probably due to an easier time with an API and such I'd assume, and since VRAM is being used as a buffer it's undoubtedly going to fill up at high resolutions. So if my reasoning holds, it would make sense that GPUs are a lot easier to fill up than CPUs. It almost seems like CPUs are more complex than GPUs, or at least more difficult for programmers to utilize.

Also, I know 65 fps isn't low, but it's surprising because it's comparatively very low, considering that my fps in this game is usually between 300 and 500 fps when I'm indoors, and anywhere from 65 to 120 fps when I'm outside. Since it's a game from 2002, I was incredibly surprised to take such a huge hit to my frame rate, even if I can't see that difference without looking at a frame counter. Also, it's very unlikely my HDD was the root of the problem. This occurs even at the start of the game, in the first cell block the game loads me into, when I'm at the block's dead center. There are at least a couple more surrounding cell blocks loaded into RAM anyway, so my HDD can't possibly be getting accessed. Also, in the first place I saw this happening I was standing still, and it occurred when I was just looking around: in one direction I had about 120 fps, and when I looked in the opposite direction without moving I dropped to 65. Additionally, Morrowind's draw distance is CRAP. If you've never played it before, here's its maximum vanilla draw distance, which I was playing at:

[Screenshot: Elder Scrolls: Morrowind (PC) at maximum vanilla draw distance]


As you can see, that's quite low, but unfortunately I didn't get a chance to test out lowering this draw distance before I modded the crap out of the game. Though I'm sure you're correct that it is due to the draw distance and terribly crappy optimization. I'm sure the reason the draw distance was so short is that they didn't know how to optimize their code at all and decided that even at that meager distance the best PCs at the time would be chugging along.

One last thing. You said division takes a while to calculate compared to addition. I'm a freshman computer science major, so in the future, when I'm inevitably tested on program optimization, would you recommend that I keep division (and probably multiplication) to a minimum if possible?
 
Graphics cards were explicitly meant for parallel operations, so they are more able to fill up all their cores: when you need to do the same set of operations on every polygon in a million-polygon mesh, it's really easy to split that across a thousand shader cores.
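This is the kind of loop a GPU eats for breakfast; the vertex count and the operation are just example values, but the point is that no iteration depends on any other, so the work splits cleanly across however many shader cores you have:

#include <stdlib.h>

#define NUM_VERTICES 1000000

int main(void) {
    float *x = malloc(NUM_VERTICES * sizeof *x);
    if (!x) return 1;

    /* every iteration is independent, so on a GPU each one could be handed
       to a different shader core and they would all run at once */
    for (int i = 0; i < NUM_VERTICES; i++) {
        x[i] = (float)i * 0.5f;   /* same operation applied to every element */
    }

    free(x);
    return 0;
}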

CPUs, unfortunately, were meant to be good at single-threaded tasks. It is only recently that we have started having multiple cores that programmers need to learn to split their tasks across, and it is rather difficult to code in such a way as to have infinitely parallelizable code; your performance gains will always be limited by the truly serial tasks that you cannot parallelize (see Amdahl's law).
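Here's a tiny sketch of Amdahl's law in C to show what that limit looks like; the 90% parallel fraction is just an example figure:

#include <stdio.h>

int main(void) {
    double p = 0.90;                      /* fraction of the work that can be parallelized (example) */
    int cores[] = { 1, 2, 4, 8, 1024 };

    for (int i = 0; i < 5; i++) {
        int n = cores[i];
        double speedup = 1.0 / ((1.0 - p) + p / n);   /* Amdahl's law */
        printf("%4d cores -> %.2fx speedup\n", n, speedup);
    }
    return 0;   /* with p = 0.90 the ceiling is 10x, no matter how many cores you add */
}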

As for your final bit: if you are working with a microcontroller, it is important to know which operations it can do quickly and which ones take a while, but when you are coding for a computer the compiler will shuffle things about and change instructions to make the code run better. In some cases, if you are dividing by a constant, it will change the division to multiplication by a precomputed constant because a multiply is much faster than a divide. In my example above, every compiler would have seen that division by 2 and changed it to a bit shift right, which divides by powers of 2 in just a single cycle. Don't worry about the intricacies of what your CPU is good and bad at for a few years; your compiler will take care of it for you until you start making massive programs.
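As a small demonstration of the "let the compiler handle it" point, here's a C snippet; the claim in the comments is about typical behavior of mainstream compilers with optimizations on, not any one specific compiler:

#include <stdio.h>

unsigned int half(unsigned int b) {
    return b / 2;   /* typically compiled to a single right shift (b >> 1) for unsigned values;  */
                    /* signed division by 2 needs a small extra fix-up, which the compiler also  */
                    /* generates for you                                                          */
}

int main(void) {
    printf("%u\n", half(4));   /* prints 2; writing "b >> 1" by hand gains you nothing here */
    return 0;
}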