How To Calculate Theoretical GPU FLOPS?

Icaraeus · Oct 4, 2014

How do I calculate the theoretical performance of my Sapphire R9 270X? The on-paper figure for the standard 270X is 2.69 TFLOPS, but I don't know how AMD arrived at that number. I've overclocked mine as far as it will go, so I'm wondering what it could theoretically achieve, just for the fun of it.

Default GPU Clock - 1070 MHz
Modified GPU Clock - 1190 MHz (+120 MHz)
Default VRAM Clock - 1400 MHz
Modified VRAM Clock - 1510 MHz (+110 MHz)

Before overclock:

Pixel Fillrate - 34.2 GPixel/s
Texture Fillrate - 85.6 GTexel/s
Bandwidth - 179.2 GB/s

After overclock:

Pixel Fillrate - 38.1 GPixel/s (+3.9 GPixel/s)
Texture Fillrate - 95.2 GTexel/s (+9.6 GTexel/s)
Bandwidth - 193.3 GB/s (+14.1 GB/s)

Maximum OC temp on stress-test: 70 degrees
Maximum voltage: 1.295V
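
For anyone wanting to reproduce the fillrate and bandwidth figures above from the clocks, here is a minimal Python sketch. It assumes the usual rules of thumb (pixel fillrate = ROPs x core clock, texture fillrate = TMUs x core clock, GDDR5 moving 4 transfers per memory clock) and a 256-bit memory bus, which is an assumption not stated in this post. The function names are made up for illustration.

```python
def pixel_fillrate_gpixels(rops: int, core_mhz: float) -> float:
    """Pixel fillrate in GPixel/s: one pixel per ROP per core clock."""
    return rops * core_mhz / 1000.0


def texture_fillrate_gtexels(tmus: int, core_mhz: float) -> float:
    """Texture fillrate in GTexel/s: one texel per TMU per core clock."""
    return tmus * core_mhz / 1000.0


def gddr5_bandwidth_gbs(mem_mhz: float, bus_bits: int = 256,
                        transfers_per_clock: int = 4) -> float:
    """Memory bandwidth in GB/s.

    Assumes GDDR5 (4 transfers per memory clock) and a 256-bit bus;
    both defaults are assumptions about the card, not taken from the post.
    """
    return mem_mhz * transfers_per_clock * bus_bits / 8 / 1000.0


if __name__ == "__main__":
    # 32 ROPs and 80 TMUs are the counts quoted for the 270X in this thread.
    for label, core, mem in [("stock", 1070, 1400), ("overclocked", 1190, 1510)]:
        print(label,
              round(pixel_fillrate_gpixels(32, core), 1), "GPixel/s,",
              round(texture_fillrate_gtexels(80, core), 1), "GTexel/s,",
              round(gddr5_bandwidth_gbs(mem), 1), "GB/s")
```

Note that none of these three figures is a FLOPS number; they describe the raster, texture, and memory paths rather than the shaders.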

chris987 · Oct 5, 2014

Texture Units * Raster Operators * (core clock) = GFLOPS

core clock = 1 GHz = 1000 MHz

80 * 32 * 1 = 2560 GFLOPS or 2.56 TFLOPS

Icaraeus · Oct 5, 2014

Okay so considering my 270X at the moment has:

TMUs - 80
ROPS (Raster Operations) - 32
Core Clock - 1180 MHz
Mem Clock - 1570 MHz
VDDC - 1.257V

80 x 32 x 1180 = 3,020,800, or 3.02 TFLOPS, if I calculated correctly.

Does the memory clock have any effect on the theoretical performance?

chris987 · Oct 6, 2014

It doesn't enter the FLOPS formula, but it does affect performance. The faster the VRAM, the faster data can be brought in for processing! That's why GPUs with narrow memory bandwidth have limited performance!

Icaraeus · Oct 6, 2014

Oh right, my mind completely blanked on the whole bandwidth part. I remember now! It puzzles me, however, why the GTX 970 and 980 have 256-bit VRAM buses. That seems quite narrow and wouldn't be very good for anything over 2GB of VRAM (and they pack 4GB of GDDR5), unless Nvidia did something to the architecture that I haven't looked into.

Pinhedd · Oct 6, 2014

chris987 :

Sorry, wrong.

The maximum theoretical floating point throughput is a function of the gross number of shaders.

I'll use the HD 7970 clocked at 1 GHz as an example because it's what I have.

The HD 7970 is constructed from 32 compute units.

Each compute unit has 4 SIMD execution units.

Each SIMD unit is 16 ALUs wide.

32 compute units * 4 SIMDs per compute unit * 16 ALUs per SIMD = 2048 total shaders

In each cycle, each shader can perform one multiply operation and one accumulate operation (called a MAC, or Multiply-Accumulate). Although this is executed as a single instruction, it is counted as two separate floating point operations when computing FLOPS.

2048 shaders * 2 floating point operations per cycle * 1 billion cycles per second = 4096 gigaflops

The same logic holds true for the R9 270X.
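
The HD 7970 walk-through above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names and default arguments are hypothetical, and the defaults (4 SIMDs per compute unit, 16 ALUs per SIMD, 2 operations per cycle) are simply the figures given in this post.

```python
def gcn_shader_count(compute_units: int, simds_per_cu: int = 4,
                     alus_per_simd: int = 16) -> int:
    """Total shaders = compute units * SIMDs per CU * ALUs per SIMD."""
    return compute_units * simds_per_cu * alus_per_simd


def peak_gflops(shaders: int, clock_ghz: float,
                flops_per_cycle: int = 2) -> float:
    """Peak throughput in GFLOPS; a MAC counts as 2 operations per cycle."""
    return shaders * flops_per_cycle * clock_ghz


if __name__ == "__main__":
    shaders = gcn_shader_count(32)          # HD 7970: 32 compute units
    print(shaders)                          # 2048
    print(peak_gflops(shaders, 1.0))        # 4096.0 (GFLOPS at 1 GHz)
```

Swapping in another card only requires its compute unit count and clock, provided it uses the same per-unit layout.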

Icaraeus · Oct 6, 2014

So if it isn't TMUs x ROPs x GPU Core Clock, then what would it be exactly? I don't really understand the whole SIMD and ALU part.

Pinhedd · Oct 6, 2014

Icaraeus :

peak floating point throughput = shaders * 2 * clock frequency
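
As a rough sketch, here is that formula applied to the R9 270X in Python. It assumes the card has 1280 shaders (20 compute units of 64 shaders each) and that AMD's 2.69 TFLOPS figure is quoted at a 1050 MHz reference clock; both are assumptions rather than numbers from this thread. The other clocks are the ones posted above, and the names are hypothetical.

```python
R9_270X_SHADERS = 1280  # assumed: 20 compute units * 64 shaders each


def peak_gflops(shaders: int, clock_mhz: float) -> float:
    """peak throughput = shaders * 2 * clock frequency, in GFLOPS."""
    return shaders * 2 * clock_mhz / 1000.0


if __name__ == "__main__":
    clocks_mhz = [
        ("AMD reference (assumed)", 1050),
        ("Sapphire default", 1070),
        ("overclock, second post", 1180),
        ("overclock, first post", 1190),
    ]
    for label, mhz in clocks_mhz:
        gflops = peak_gflops(R9_270X_SHADERS, mhz)
        print(f"{label:>24}: {gflops:.1f} GFLOPS")
```

Under those assumptions the reference clock gives 2688 GFLOPS, which rounds to the advertised 2.69 TFLOPS, and the 1190 MHz overclock gives about 3046 GFLOPS.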

In reality, hitting peak throughput is damned near impossible. AMD realized this when their older Radeon HD architectures (HD 6000 series and prior) had a theoretical computational edge over their competition at Nvidia, yet generally underperformed in comparison. They radically changed the architecture to create GCN, which enables them to do a much better job of keeping the shaders busy.

Whereas the HD 6000 series and prior were based on a VLIW MIMD design, GCN is based on a pure SIMD design.

Icaraeus · Oct 6, 2014

Thanks for the explanations Pinhedd and Chris987.

I did a quick check and TMUs x ROPS x Core Clock seems to be the same as Shaders (x2) x Core Clock, unless I'm missing something else.

Pinhedd · Oct 6, 2014

Icaraeus :

That's purely a result of the way AMD has the core laid out. It's just a coincidence.
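
To illustrate, here is a small Python check of when TMUs x ROPs happens to equal shaders x 2. The Shaders:TMUs:ROPs configurations below are quoted from memory of the public spec sheets, so treat them as assumptions; the helper name is made up. On GCN each compute unit carries 64 shaders and 4 TMUs, so the two products only agree when the card has exactly 32 ROPs.

```python
# (shaders, TMUs, ROPs); assumed values from public spec sheets
CARDS = {
    "R9 270X": (1280, 80, 32),
    "HD 7970": (2048, 128, 32),
    "R9 290X": (2816, 176, 64),
    "HD 7770": (640, 40, 16),
}


def shortcut_matches(shaders: int, tmus: int, rops: int) -> bool:
    """True when TMUs * ROPs gives the same number as shaders * 2."""
    return tmus * rops == 2 * shaders


if __name__ == "__main__":
    for name, (shaders, tmus, rops) in CARDS.items():
        print(f"{name}: TMUs*ROPs = {tmus * rops}, "
              f"shaders*2 = {2 * shaders}, "
              f"match = {shortcut_matches(shaders, tmus, rops)}")
```

With these figures the shortcut agrees for the two 32-ROP cards and is off by a factor of two in either direction for the other two.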

chris987 · Oct 6, 2014

Pinhedd :

Thanks for the info @Pinhedd. It seemed rather convenient though, I must say! I haven't really dug into that stuff. Thanks again!

Pinhedd · Oct 6, 2014

chris987 :

In the classic fixed-function graphics pipeline, the numbers of TMUs, ROPs, vertex shaders, and pixel shaders were typically the same. However, over time the complexity of shader programs has grown far faster than the complexity of texture manipulation and rasterization. Thus, manufacturers have unified the shaders and decoupled them from the rest of the pipeline. Core configuration is usually expressed in the form Shaders:TMUs:ROPs.

chris987 · Oct 6, 2014

Well, considering that the 3dfx Voodoo 2 was the first to feature TMUs, things have gotten rather complex now. Hard to keep up!
