This deserves an answer
juanrga :
hcl123 :
juanrga :
No. 4SP can be obtained with either FMA4 or SSE2. The difference is that one set provides 2 ops and the other provides only 1 op. It is evident which set the above computation is using, unless you confound the two.
You are again confounding modules with cores. The Kaveri APU comes in 2C and 4C configurations. It is the possibility of 6C which was dropped.
No. SSE, like AVX1, has up to 2 operands in the same vector instruction, AVX2 has up to 3 (FMA3) and XOP has up to 4 (FMA4)... The "numbers" those operations work on, in the context of SIMD, depend on the width of the vector, but be assured the number of "numbers" is not the most useful thing (SSE instructions can have up to 16 byte-sized numbers or 8 16-bit short integer "numbers" per vector); what matters is what those operations can do... wikipedia it...
The one confounding modules with cores is you...
No. "op" and "ops" do not mean operands, but operations. Learn the terminology before posting more nonsense.
You are being really helpful, lol... and I never said it does... but let me help you.
Instruction set
http://en.wikipedia.org/wiki/Instruction_set
Operand
http://en.wikipedia.org/wiki/Operand
"Operands may be complex, and may consist of expressions also made up of operators with operands. "
" In computer programming languages, the definitions of operator and operand are almost the same as in mathematics."
In computing, an operand is the part of a computer instruction which specifies what data is to be manipulated or operated on, whilst at the same time representing the data itself. A computer instruction describes an operation such as add or multiply X, while the operand (or operands, as there can be more than one) specify on which X to operate as well as the value of X.
Additionally, in assembly language, an operand is a value (an argument) on which the instruction, named by mnemonic, operates. The operand may be a processor register, a memory address, a literal constant, or a label. A simple example (in the x86 architecture) is
MOV DS, AX
where the value in register operand 'AX' is to be moved into register 'DS'. Depending on the instruction, there may be zero, one, two, or more operands
Now everybody is enlightened, LOL. And I doubt you didn't know any of this, since it was you who wrote that part of Wikipedia, lol.
juanrga :
hcl123 :
Idiot use your brain. That "CHART" is for a Llano. Integer cluster/cores on AMD BD don't do FLOPS, or are you not sure ?
Here is the diagram of K10
https://pt.wikipedia.org/wiki/AMD_K10
https://upload.wikimedia.org/wikipedia/commons/thumb/d/d6/AMD_K10_Arch.svg/300px-AMD_K10_Arch.svg.png
LOL. You are embarrassing yourself more and more....
Good attempt to eliminate the image tags from my message to try to hide your ignorance, but I reintroduced the image tags. The A10-5800K is not Llano, it is Trinity, which is based on Piledriver.
The integer cores don't do FLOPs; floating-point operations are done in the FPU. You were corrected about this before.
Trinity A10 (Piledriver) has 121.6 GFLOPS (SP) for the CPU.
Kaveri A10 (Steamroller) has 128 GFLOPS (SP) for the CPU.
Congrats!! You achieved hafidup levels of insanity. Now go away with your nonsense, trolling, and insults.
#$%% ...
I'm not registered, so when posting the system doesn't allow me to post images, only links. (Doubt it as much as you like, or complain to the admin.)
And yes, I made a mistake, which is the primary reason I'm answering: that is Trinity, not Llano, but the rationale is the same. Integer cores don't do FLOPS, so the "cores" mentioned in the chart can only be the "FMAC pipes", the only units that AFAIK can do FLOP operations... right? And there are indeed 4 of them (FMAC pipes) in Trinity, in 2 modules.
Trinity has the following in its instruction set:
http://www.cpu-world.com/CPUs/Bulldozer/AMD-A10-Series%20A10-5800K.html
But that leaves a big problem... which instructions operate on 8 vector numbers per pipe!? If we consider "operands", that is, the "chunks" (values) that are to be operated upon, only FMA4 has 4 operands, and those "chunks" are defined by registers, that is, they sit "inside" what is defined as a register.
Those are
http://en.wikipedia.org/wiki/FMA_instruction_set
Mnemonic (AT&T)   Operands                          Operation
VFMADDPDx         xmm, xmm, xmm/m128, xmm/m128      $0 = $1×$2 + $3
VFMADDPDy         ymm, ymm, ymm/m256, ymm/m256
VFMADDPSx         xmm, xmm, xmm/m128, xmm/m128
VFMADDPSy         ymm, ymm, ymm/m256, ymm/m256
VFMADDSD          xmm, xmm, xmm/m64, xmm/m64
VFMADDSS          xmm, xmm, xmm/m32, xmm/m32
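To make the "Operation" column concrete, here is a rough Python sketch of what the packed single-precision form on xmm registers computes per lane. The function name `vfmaddps_xmm` is made up for illustration, and the model only covers the arithmetic (real FMA hardware rounds once, which this does not reproduce).

```python
# Hypothetical helper, not a real library call: models the arithmetic of the
# packed single-precision FMA4 form on 128-bit xmm registers,
# i.e. $0 = $1 * $2 + $3 applied to each of the 4 x 32-bit lanes.
# Assumption: plain Python floats stand in for 32-bit values, and the
# single-rounding behaviour of a fused multiply-add is not modelled.
def vfmaddps_xmm(src1, src2, src3):
    # A 128-bit register holds exactly 4 single-precision lanes.
    assert len(src1) == len(src2) == len(src3) == 4
    return [a * b + c for a, b, c in zip(src1, src2, src3)]


dest = vfmaddps_xmm(
    [1.0, 2.0, 3.0, 4.0],      # $1
    [10.0, 10.0, 10.0, 10.0],  # $2
    [0.5, 0.5, 0.5, 0.5],      # $3
)
print(dest)  # [10.5, 20.5, 30.5, 40.5]
```

So one instruction names 4 registers, but each register carries several values, and every lane gets both a multiply and an add.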
I looked around and I must confess I have not found anything clearly stated. A 128-bit vector can only hold 8 "values"/"chunks" if they are 16-bit "numbers" (8×16=128), and that leads to SSE2 16-bit FP or XOP 16-bit CVT16 FP... that is, vectors of 8× 16-bit chunks (pardon the language, it is so everybody understands)... The problem now is the number of "operands/registers": if the maximum an operation (by mnemonic) can operate upon is 4 registers, how are 8 "chunks" going to be operated on at the same time, in the same cycle?
Can you answer, please?
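For anyone following along, the lane arithmetic behind that question can be sketched in a few lines of Python. The helper name `lanes` is made up for illustration; it only divides the register width by the element width.

```python
# Hypothetical helper: how many elements ("chunks") of a given width
# fit into one vector register of a given width.
def lanes(register_bits, element_bits):
    return register_bits // element_bits


# 128-bit xmm register with 16-, 32- and 64-bit elements.
for element_bits in (16, 32, 64):
    print(f"{element_bits}-bit elements in 128 bits: {lanes(128, element_bits)}")
# 16-bit elements in 128 bits: 8
# 32-bit elements in 128 bits: 4
# 64-bit elements in 128 bits: 2
```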
And I think that in modern "processors" the only real logic operations that exist "hard wired" are ADD, *SHIFT*, MULTIPLY and DIVIDE
(oops! an omission, lol... nobody helps? (EDITED))..
Please correct me... copy, move, shuffle, permute, pardon my presumption, are about moving values/numbers around inside registers and from register to register, and are not what I would call real logic operations. (edt)
I used Wikipedia, but if you have anything that can enlighten us, please share.
Streaming SIMD Extensions
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
SSE2
http://en.wikipedia.org/wiki/SSE2
Advanced Vector Extensions
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions
SSE5 (led to XOP)
http://en.wikipedia.org/wiki/SSE5
XOP
http://en.wikipedia.org/wiki/XOP_instruction_set
Making a long story short:
8 FLOPS/core, assuming here that "core" means the FMAC pipes, because no other processing element in Trinity is capable of FLOP operations (AFAIK)... and the FlexFPU in BD uarches is a co-processor (like a processor inside a processor)... and using a very loose definition with a lot of allowances, considering that 8 FLOPS (meaning 8 floating-point numbers) are in fact operated upon in each cycle by each FMAC pipe...
then it can only be possible with 16-bit floating-point numbers... no!??
See the dilemma?...
Do you understand?... And is this correct when those 8 "numbers" must reside inside 4 registers, which is the maximum stipulated per operation "mnemonic", and that only for FMA4?
But if it is, and I'm bad and ignorant (the opposite of you, lol), then the only way I see it working is with vectors of 8× 16-bit "chunks" (whatever... so people understand... and FP "chunks", since we are talking FLOPs)...
If there is something escaping me, please correct me.
Now that isn't terribly useful, is it?... unless we were using 16-bit DOS operating systems and applications, which hardly deserves to be mentioned... even if it is not fundamentally wrong, because in a few small cases 16-bit FP operations are still used to perform some complementary calculations in programs (they are used in games).
There is more than one article where one vendor raised suspicion about another vendor's FLOP calculations... if this gets worse we will have a "fake FLOP war"... and in light of what I have laid out, and the complexity of the matter, I find it no surprise that some vendors use slightly different methods, which might lead to some awkward charts even if those are not fundamentally incorrect.
If you, in your infinite wisdom, can correct me, please do (boy! more insults coming, lol)... I don't feel embarrassed or find reasons to be, I don't even mind, if it's the TRUTH... I'm all for the TRUTH.
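For reference, here is a small Python sketch of one possible reading of the chart numbers. It goes by the earlier remark in the thread that FMA4 "provides 2 ops", so each 32-bit lane of a 128-bit FMAC pipe counts as 2 FLOPs per cycle. The function name `peak_gflops` is made up, and the clock speeds (3.8 GHz for Trinity, 4.0 GHz for Kaveri) are assumptions picked because they reproduce the quoted 121.6 and 128 figures; they are not stated anywhere in this thread.

```python
# Sketch of a peak-FLOPS estimate under stated assumptions, not a vendor formula.
def peak_gflops(cores, flops_per_cycle, clock_ghz):
    # cores * FLOPs per cycle per core * cycles per second (in GHz) -> GFLOPS
    return cores * flops_per_cycle * clock_ghz


sp_lanes = 128 // 32   # 4 single-precision values per 128-bit register
ops_per_fma = 2        # one multiply plus one add per lane (assumed counting)
flops_per_cycle = sp_lanes * ops_per_fma  # 8, with 32-bit numbers

# Clock speeds below are assumptions chosen to match the quoted chart values.
print(peak_gflops(4, flops_per_cycle, 3.8))  # Trinity A10
print(peak_gflops(4, flops_per_cycle, 4.0))  # Kaveri A10
```

Under that counting, 8 FLOPs per cycle would come out of 4 single-precision lanes rather than 8 half-precision ones, but that depends entirely on whether the chart counts an FMA as two operations.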