FP32/64 Cores in GPUs, what are they?

Boom_4

Prominent
Jun 24, 2017
I know that FP cores are meant for floating-point calculations, and I know that they consist of an FPU, registers, and so on, but there is one thing about them I don't understand.

So a simple question: what part of a GPU are the FP32/64 cores? Are they part of the shader cores? Or are they separate, and if so, why are they clocked at the core clock speed while the shader cores (CUDA cores/stream processors) are clocked differently, at a ratio to the core clock speed?
 
Solution
I guess you got confused by the terminology. FP32/64 cores are single-precision (FP32) and double-precision (FP64) shader processors.

NVIDIA calls its shader processors CUDA cores. Do you see it now? FP32/64 cores = CUDA cores = shader processors. So on NVIDIA cards it is the CUDA cores that do the floating-point calculations. "Stream processor" is AMD's term for the same thing: shader processors.

CUDA or stream processing is a concept that (via software) turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, as opposed to being hardwired solely for graphics operations.

What is a shader processing unit? Also known as a stream processor (AMD) or CUDA core (NVIDIA), the shader processing unit became the most important component of a graphics card with the release of unified shader architectures in 2006.

Before unified shaders, GPUs had two types of shader units: pixel shaders and vertex shaders. Pixel shaders were in charge of computing color, while vertex shaders controlled movement, lighting, position, and color in any scene involving 3D models.

Both are important, and GPUs always had a major problem: an imbalance between pixel and vertex shaders. Example: Radeon X1900, with 36 pixel shaders and 8 vertex shaders. The problem? Depending on what the 3D application needed, either the pixel or the vertex shaders would often sit idle, much like a computer waiting for its printer to finish a print job. Another term for it would be a bottleneck.

Unified shaders switch between the two kinds of work depending on what needs to be done. In other words, you ideally don't have one group sitting idle waiting for the other to finish its work, which allows higher efficiency.
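To make the idle-unit problem concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions: every unit finishes one work item per cycle, the two fixed pools run in parallel, and the workload numbers are made up for illustration. The 36/8 split is the example above; the function names are hypothetical.

```python
def fixed_time(pixel_work, vertex_work, pixel_units=36, vertex_units=8):
    """Cycles needed when each pool can only run its own kind of work."""
    pixel_cycles = -(-pixel_work // pixel_units)    # ceiling division
    vertex_cycles = -(-vertex_work // vertex_units)
    # Both pools run side by side, so the slower pool decides the total.
    return max(pixel_cycles, vertex_cycles)


def unified_time(pixel_work, vertex_work, units=36 + 8):
    """Cycles needed when any unit can take either kind of work."""
    return -(-(pixel_work + vertex_work) // units)


# Made-up workloads: (name, pixel work items, vertex work items)
for name, pixel, vertex in [("pixel-heavy", 3600, 80),
                            ("vertex-heavy", 360, 1600)]:
    print(f"{name}: fixed={fixed_time(pixel, vertex)} cycles, "
          f"unified={unified_time(pixel, vertex)} cycles")
```

In this toy model the vertex-heavy case is where the fixed design hurts most: the 36 pixel units finish early and then wait on the 8 vertex units, while the unified pool keeps all 44 units busy.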

FP32/64 = CUDA cores = shader units = stream processors. They are like small processors inside the video card, which is why they can have their own clocks.

A GPU contains shader units, TMUs (texture mapping units), ROPs (render output units), and compute units.

AGAIN: floating point is calculated by the shader units = shader processors = CUDA cores = stream processors.
 


As I thought, but all the marketing nonsense...
Sorry, but I'm obsessively curious. :)
Next question: how come CUDA cores and stream processors have the same compute (FLOPs) performance per core, per clock, but different gaming/shader performance? I know that shaders are clocked differently from the rest of the core, at a certain ratio; NVIDIA used to allow the shader clock to be adjusted manually without changing the core clock (the clock of the TMUs, ROPs, etc.).
And since the TMUs, ROPs, and pretty much everything else are much the same, apart from cache sizes, OoO, schedulers, and such, that shouldn't be the case; otherwise the compute performance would be different too.

This question is the reason I asked the previous one. :)
 


I don't think that they do. The architectures are different, and so is the software and every other aspect of how the card's components work together. That is too deep for me. That's why AMD cards usually have more GFLOPS but may still have lower FPS in games, for example. The Titan X does about 2500 GFLOPS less than AMD's Fury X, yet the Titan is better in most games.
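For reference, the headline GFLOPS figure is just arithmetic on the spec sheet, which is why it says so little about game performance. A minimal sketch in Python, assuming the usual convention of 2 FP32 operations per shader unit per clock (one fused multiply-add). The unit counts and clocks below are my assumptions taken from public spec sheets (Fury X: 4096 stream processors at 1050 MHz; Maxwell Titan X: 3072 CUDA cores at a 1000 MHz base clock), not numbers from this thread.

```python
def peak_gflops(shader_units, clock_mhz, flops_per_unit_per_clock=2):
    """Theoretical peak FP32 throughput in GFLOPS.

    flops_per_unit_per_clock=2 assumes one fused multiply-add per clock.
    """
    return shader_units * clock_mhz * flops_per_unit_per_clock / 1000.0


# Assumed spec-sheet values, see the note above.
fury_x = peak_gflops(4096, 1050)
titan_x = peak_gflops(3072, 1000)

print(f"Fury X:  {fury_x:.1f} GFLOPS")
print(f"Titan X: {titan_x:.1f} GFLOPS")
print(f"Gap:     {fury_x - titan_x:.1f} GFLOPS")
```

With those assumed specs the gap comes out at roughly 2500 GFLOPS. Nothing in the formula accounts for TMUs, ROPs, caches, schedulers, or drivers, so it is only an upper bound on shader math, not a prediction of FPS.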
 