gfaiii :

k1114 :

FLOPS is short for floating-point operations per second, so instructions per clock isn't the number you want. Also check whether a figure is for single or double precision, and remember these are theoretical peaks; real-world performance will fall short of them. You do have the formula right: clock speed x cores x FLOPs per cycle for the architecture.

Intel Haswell, Intel Broadwell and Intel Skylake: 16 DP FLOPs/cycle, 32 SP FLOPs/cycle (per core)
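To make the formula concrete, here is a minimal Python sketch. The 3.5 GHz clock and 4-core count are made-up example values, not anyone's actual CPU in this thread; plug in your own. The 16 and 32 FLOPs/cycle figures are the ones quoted above.

```python
def peak_gflops(clock_ghz, cores, flops_per_cycle):
    """Theoretical peak in GFLOPS: clock (GHz) x cores x FLOPs per cycle per core."""
    return clock_ghz * cores * flops_per_cycle

# Hypothetical example chip: 4 cores at 3.5 GHz (assumed values, not from the thread).
CLOCK_GHZ = 3.5
CORES = 4

dp_peak = peak_gflops(CLOCK_GHZ, CORES, 16)  # double precision
sp_peak = peak_gflops(CLOCK_GHZ, CORES, 32)  # single precision

print(f"DP peak: {dp_peak:.1f} GFLOPS")
print(f"SP peak: {sp_peak:.1f} GFLOPS")
```

Note that the clock under a sustained all-core vector load is often lower than the advertised turbo figure, so treat the result as an upper bound.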

Why do you want GFLOPS anyway?

I'm trying to calculate how long it should theoretically take to compute the product of two n x n matrices and then compare that with the experimental value I'm getting from MATLAB. I know that the number of operations will be 2n^3, but I can't find a reliable GFLOPS number to work from. It's for a school project.
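The comparison described here boils down to dividing 2n^3 by the peak rate. A rough Python sketch follows; the peak of 224 GFLOPS and the measured time of 1.2 s are placeholder values for illustration only, not real measurements. Since MATLAB uses double precision by default, the DP peak is the one to use.

```python
def matmul_flops(n):
    """Operation count for a dense n x n matrix product: 2n^3."""
    return 2 * n**3

def theoretical_seconds(n, peak_gflops):
    """Lower bound on run time if the CPU sustained its peak rate."""
    return matmul_flops(n) / (peak_gflops * 1e9)

def efficiency(n, measured_seconds, peak_gflops):
    """Fraction of theoretical peak actually achieved."""
    return theoretical_seconds(n, peak_gflops) / measured_seconds

n = 4000
peak = 224.0      # assumed DP peak in GFLOPS (hypothetical machine)
measured = 1.2    # placeholder for the time you measure in MATLAB

print(f"theoretical: {theoretical_seconds(n, peak):.3f} s")
print(f"efficiency:  {efficiency(n, measured, peak):.1%}")
```

An efficiency well below 100% is expected; the theoretical figure ignores memory traffic and anything else that keeps the floating-point units idle.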

This is a tough question to answer because we don't know whether the software you're using for the matrix multiplication will vectorize it.

Skylake has one FPADD vector unit and one FPMUL vector unit; both units are also used for scalar arithmetic.

In theory you should be able to get one 256-bit vector FPMUL per core per clock cycle, assuming perfect pipeline efficiency. However, the software needs to be compiled with AVX support; if it is compiled only for SSE2, it will use 128-bit vector FPMULs instead.
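To see how vector width feeds into FLOPs per cycle, here is a small Python sketch. The function and its parameters are just an illustration; the unit count and ops-per-instruction values in the last two calls are assumptions chosen to show how the 16 DP / 32 SP figures quoted earlier can be reached (two 256-bit units, each issuing a fused multiply-add counted as 2 operations).

```python
def flops_per_cycle(vector_bits, element_bits, units=1, ops_per_instruction=1):
    """FLOPs per cycle per core = lanes x units x operations per instruction."""
    lanes = vector_bits // element_bits  # elements that fit in one vector register
    return lanes * units * ops_per_instruction

# One multiply unit, double precision (64-bit elements):
avx_dp = flops_per_cycle(256, 64)   # AVX, 256-bit vectors
sse2_dp = flops_per_cycle(128, 64)  # SSE2, 128-bit vectors
print(f"AVX:  {avx_dp} DP multiplies per cycle")
print(f"SSE2: {sse2_dp} DP multiplies per cycle")

# Assumption: two units, each doing a fused multiply-add (2 ops per lane).
print(flops_per_cycle(256, 64, units=2, ops_per_instruction=2), "DP FLOPs/cycle")
print(flops_per_cycle(256, 32, units=2, ops_per_instruction=2), "SP FLOPs/cycle")
```

So code built only for SSE2 halves the per-cycle throughput relative to AVX before any other inefficiency is counted.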