AMD CPU speculation... and expert conjecture



I mentioned the L2 because some folks like to say that because it's shared at the module level, it's therefore not a "true core". Also, RISC-based systems don't require nearly as complex a front end as x86 does; I was referencing some really old supercomputer CPUs from the 80s and early 90s that had shared MMUs and schedulers but separate processing units.

The concept of "core" goes back to some old engineering terms that have since been twisted into marketing terms, similar to the term "IPC". Core = "core processing engine / element", referring to the central part of the "CPU" back when CPUs consisted of many separate elements on different chips / cards. There had to be a way to differentiate between them all when programming for mainframes. Since the advent of superscalar computing there needs to be some definition that engineers can use to prevent miscommunication, and that's what I used above: an independent, externally addressable general processing element. It's the "externally" part that gets people; you can have as many internal units as you want, but it doesn't mean anything until you get outside the CPU, from the software's PoV.
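The "software's PoV" point is easy to check for yourself: the OS simply reports however many logical processors it can schedule threads onto, with no hint of what is shared inside them (a minimal Python sketch, not tied to any particular CPU):

```python
import os

# From the software's point of view, a "core" is just whatever the OS
# exposes as a schedulable logical processor; internal sharing of
# decoders, FPUs, caches, etc. is invisible at this level.
logical = os.cpu_count()
print(f"logical processors visible to software: {logical}")
```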

Also, a note on instructions: we use the term "integer", but that doesn't mean the same thing for humans as it does for machines. "Integer" is any instruction whose operand is a single number, either whole or with a fixed decimal place. That is because CPUs only know binary math and everything is just a combination of adding, subtracting or comparing 1s and 0s; there is an assumption at the hardware level about how long each operand is. So integer = vector = general purpose computing instructions. Floating Point Coprocessors were special dedicated chips that could take variable-length operands at the hardware level, up to 80 bits long in the case of the 8087. At that time there was very little need (even now there is less need) for floating point operations, only mathematics that require extreme precision. Eventually the role of the FP Coprocessor was assumed by a vector coprocessor; we kept the name but the mechanics have dramatically changed.
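The "fixed decimal place" idea can be sketched in a few lines: a fractional value is stored as a plain integer with an agreed-upon scale factor, so ordinary integer hardware does all the math (an illustrative example, not tied to any particular CPU):

```python
SCALE = 100  # fixed-point with two implied decimal places

def to_fixed(x: float) -> int:
    # Store 3.25 as the integer 325; the decimal point is implicit.
    return round(x * SCALE)

# Plain integer adds do all the work; only the programmer knows
# where the decimal point sits.
price = to_fixed(3.25)   # stored as 325
tax   = to_fixed(0.75)   # stored as 75
total = price + tax      # 400, i.e. 4.00
print(total / SCALE)
```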

That long bit is required to understand why a shared scalar SIMD coprocessor (FPU) isn't a big deal when defining whether something is a "core" or not. It's a coprocessor, not a general purpose processor. So it comes down to the number of externally addressable independent integer units, which is four for Intel i5/i7 and eight for FX-8xxx. HTT doesn't qualify, as it's not an independent integer unit but a set of three integer units that share two register stacks with a single scheduler / decoder unit.
 
@palladin9479: i'm always second guessing myself when i try to address fx cpus with 'cores', so i changed to thread executing capability (from software pov...ish). if the module can execute 2 threads (broad definition) at a time, it's just that. i am still not sure that i should continue using that... is it correct?
 

os2wiz

Distinguished
Sep 17, 2012
115
0
18,680

The part was not "widened"; an additional decoder was added per module. That is what AMD stated from the get-go for the Steamroller design.
 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
With AMD's modules, the question is how much of a core is actually there, as well as how it performs.

One module gives 80% of the performance of a dual-core CPU. That's with both halves of the module active, and it has been tested fairly consistently at 80-85%. Apply the usual rounding method: 1.6-1.7 rounds up to 2.

Intel's HT is 1.3x at best, so it's rounded down.
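Spelled out as arithmetic (taking the 80% and 1.3x scaling figures above as givens):

```python
# A CMT module with both halves active delivers ~80% of a true dual core,
# i.e. about 1.6 "cores" worth of throughput -> rounds up to 2.
cmt_module = 2 * 0.80

# An HT-enabled core delivers ~1.3x a single core at best -> rounds down to 1.
ht_core = 1 * 1.30

print(round(cmt_module), round(ht_core))  # 2 1
```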
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


it should actually do better than this, but the front ends in Zambezi and Piledriver are starved for input; the severely restrictive two-lane-per-core decode hampers most of their performance in both multi-threaded and single-core workloads. and a bad memory controller doesn't help either.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Despite Palladin's beliefs, there is no universal definition of "core", as I said before.

The lack of agreement is evident in software. Some software reports that both the i7-4770K and the FX-8350 have 4 cores and 8 threads; we saw an example recently with the Geekbench scores. Other software reports that both the i7-4770K and the FX-8350 have 8 cores and 8 threads. The software that I used for my Kaveri article does exactly the latter.

Therefore, everyone can use whichever definition they prefer. What is unacceptable is Intel fanboys using one definition for AMD and another for Intel.
 
An Interview with Amit Mookerjee on AMD’s Media SDK
Making heterogeneous compute easy to use...
http://semiaccurate.com/2013/12/23/interview-amit-mookerjee-amds-media-sdk/

sharkoon launches Bulldozer case! and promptly gets an award from eteknix just like the other one!
http://www.eteknix.com/sharkoon-bulldozer-atx-chassis-review/
i like the case better. :)
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


^^^ Fixed it for you. A Steamroller module eliminates the ~20% bottleneck introduced by the shared decoder and may provide ~95% of the performance of a dual-core CMP CPU.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I have reviewed my copy of the settlement and it clearly says that the cross-licensing agreement ends on 12 November 2014.
 


Well, user software doesn't run on "cores"; it just creates threads of stuff to do. It's the OS that actually puts code onto CPU targets, in a process called scheduling. Processors simply present themselves as a set of registers, and it used to be that every register stack was a separate processor. Modern systems have blurred the lines quite a bit with SMT, and the OS needs to do a bit more work to optimize workloads.

Intel's technology is very simple: it just exposes a second register stack to the OS and has an instruction scheduler that can keep track of two separate instruction streams. From the CPU's point of view, code is just a long stream of binary instructions segmented into 16, 32, 64, 128 or even 256-bit chunks, so having two separate register stacks allows the processor to track two separate streams of code.

AMD went a more complicated way and actually duplicated processor resources, namely the ALUs, which do nearly all the work. In doing so they used one less ALU per core than previously, though most every uArch in the industry uses 2 ALUs per core (Power and SPARC are both 2 integer units per core, though they have additional dedicated units for other types of instructions); the savings left enough die space to fit 16 ALUs' worth of hardware on a die at $200 or less. Intel uses a more robust 3-ALU-per-core design that takes up more space but results in faster single-thread performance. HT allows Intel to squeeze additional performance out of underutilized processor resources and, in essence, lets each of its cores act as two 1.5-ALU cores.
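The ALU arithmetic in that last sentence works out like this (illustrative numbers only, following the per-thread resource split described above):

```python
def alus_per_thread(alus: int, hw_threads: int) -> float:
    # Crude model: execution resources divided evenly across hardware
    # threads when both are busy. Real contention is far messier.
    return alus / hw_threads

intel_ht = alus_per_thread(3, 2)  # 3 ALUs shared by 2 HT threads -> 1.5 each
amd_cmt  = alus_per_thread(2, 1)  # 2 dedicated ALUs per CMT core -> 2.0 each
print(intel_ht, amd_cmt)
```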

Hell, Intel and AMD aren't even really doing extreme SMT. POWER7 does 4 threads per core; SPARC T4/T5 does 8 threads per core, along with POWER8. A POWER8 CPU is 650 mm² in size, POWER7 is 567 mm², SPARC T4 is 403 mm² and SPARC T5 is 478 mm², compared to an FX-8350 at 319 mm² and an Intel Haswell desktop i7 at 177 mm².

Consumer CPUs are tiny compared to the big-iron boys. That is what surprised me the most about AMD's design: they really did try to implement a big-iron-like uArch in the consumer space. It didn't go over well, for obvious reasons.

Anyhow, the whole idea behind what I originally wrote is to understand what a performance profile is and how it affects overall system capacity. Since we're talking games, we've got to see what their profile is like: typically two threads at full or near-full utilization, then dozens of small threads with minor utilization. All those minor threads quickly add up and can consume an entire core or sometimes more (Frostbite engine), depending. Knowing the profile lets you pick hardware that matches it best with the least amount of underutilized capacity. That's actually AMD's biggest problem: their design simply has too much underutilized capacity.
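That kind of profile is easy to put numbers to (hypothetical utilization figures, just to show how the small threads add up):

```python
# Hypothetical game profile: two heavy threads plus dozens of helpers.
heavy   = [0.95, 0.90]      # near-full utilization
helpers = [0.08] * 14       # many lightly loaded worker threads

total = sum(heavy) + sum(helpers)
# The helpers alone already exceed a full core's worth of work.
print(f"heavy: {sum(heavy):.2f} cores, helpers: {sum(helpers):.2f} cores, "
      f"total: ~{total:.2f} cores")
```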
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680
I have never been a fan of bench testing: the compiler is always a factor, and it's the only thing running besides the OS. Real world says I have more than 3-4 programs running while playing a game or another program; too many things rule out bench testing as a real test of CPU power. I would like to see results that resemble more realistic use.
yeah, it's a baseline, but it doesn't really show me what's gonna happen when I actually start using it.
and from my experience, AMD has been more capable of running well with multiple programs open, whereas with Intel I notice massive lags.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
What do you folks think the feasibility is of AMD bluffing and saying they can go ARM only to gain leverage when negotiating renewing x86 licensing?

Imagine when Intel and AMD lawyers and techs sit down to talk about licensing, and AMD comes out and goes "well, we'd really like access to these patents that are hampering our x86 performance because we have to work around them, but if you don't give them to us, it's fine, we can just go pure ARM and you can either pay us a ton of money for amd64 instruction set or you can just go back to making 32-bit CPUs, or you can make your own 64-bit system and break compatibility with every piece of x86 amd64 software out there."

Also, if anyone wants to show me a roadmap that's more than 6 months old that shows the Hawaii GPU and Mantle, please share it with me. I'd expect those of you taking these roadmaps at face value to be able to prove to me that AMD roadmaps have a strong enough track record of accuracy to deserve that.
 
What do you folks think the feasibility is of AMD bluffing and saying they can go ARM only to gain leverage when negotiating renewing x86 licensing?

AMD isn't ever going to lose its x86 license. They gained an indefinite license to x86 from a big court case back in the early 90s that raged for a few years. It was during the 386 era, when AMD was producing Intel x86 clones. Intel argued that the original IBM-forced x86 license applied only to the 8086 processor; AMD argued that since it was an entire instruction set, they could use it to produce other CPUs. The courts sided with AMD and granted them an indefinite license to the x86 ISA, though they also said that AMD could not use Intel brand names for their products, and thus the Am386 was the last CPU named in a similar way to Intel's.

That cross-licensing was about the various extensions that have been added to the ISA: MMX, SSE, AVX and AMD's own x86-64. They had been in a fight over whether those extensions were covered by the original 90s court agreement, since the settlement stipulated the x86 specifications and never mentioned those extensions. They just decided to drop the fight entirely, as it wasn't worth the effort to either side. Any wins Intel might have made restricting access to SSE / AVX would have cost them EM64T; any wins AMD might have made kicking Intel out of the x64 game would have cost them access to SSE / AVX. So ultimately they just agreed to allow each other access to those extensions and to drop the court battle. Nothing has changed since then; they both still stand to lose far more than they could gain, and the most beneficial outcome is to maintain the status quo.
 

Ags1

Honorable
Apr 26, 2012
255
0
10,790

I thought that "core" really emerged as a term when Intel and AMD started putting 2 CPUs on one die. Back when all consumer chips were single-core, no-one talked about cores...

 


The modern version of the word only came about when Intel and AMD decided to bolt two CPUs side by side on a single die. It was a marketing term used to state that it was the same as having two CPU sockets, which was a common setup for workstations and higher-end servers. I was referring to its original meaning in the engineering world, which predates the marketing term. It denotes a processing unit that has the basic requirements to process data: a control unit, a memory storage unit and some sort of math / logic execution unit (typically integer). Other components like external I/O buses and cache didn't always exist so close to the CPU.

http://en.wikipedia.org/wiki/File:Z80_arch.svg

Those three things on the Z80 CPU.

The first microprocessor was the MP944, which was a bunch of different chips and components working together. It was referred to as the "core CPU" of the F-14, though its specs weren't published until the late 90s. That terminology ("core processor", "core CPU", "core microprocessor") was used extensively when designing the old mainframes / supercomputers. Intel and AMD just reused the word with a marketing spin on it.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I believe that both AMD and Intel will extend the current x86 cross-licensing agreement another five years. Things will be different around 2020... AMD has already stated that ARM will win and that x86 CPUs like their new Warsaw are only aimed at customers slow to migrate to ARM. The move is on Intel's side: (i) remain in x86 as a niche market, (ii) develop a new x86, (iii) get an ARM license.

Option (i) is unlikely, because Intel couldn't feed their foundries and maintain their giant investment. In fact, the current demand for x86 is so low that Intel has been forced to open its foundries to ARM chips from the competition. Intel is going to fabricate ARM64 heterogeneous chips for Altera on the new 14nm FinFET process.

http://semiaccurate.com/2013/11/05/intel-fabs-alteras-stratix-10-fpga-four-arm-a53-cores/

Option (ii) is unlikely. Intel and HP tried that with Itanium and it didn't work. Why would it work now?

Option (iii) seems the only option.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


A lot of technologies are patented and AMD and Intel already share a lot of them. Take a gander at the USPTO website.

I don't see it often discussed, but designing an architecture is a lot of dancing around patents. I recall reading a website somewhere stating that a lot of AMD's memory controller issues stem from Intel patenting the only good way of doing certain things in a memory controller, and AMD having to do the less-than-optimal thing in order to avoid patent issues.

Those are the kinds of things I would think AMD would be going for by arguing that they don't need to negotiate with Intel over prolonged access to the amd64 instruction set and that they'd be fine with ARM. But knowing that AMD has a perpetual x86 license is great news, which gives them even more bargaining power. I'm not a lawyer or anything, but I kind of understand the situation. I am also somewhat aware of the fact that designing an architecture isn't just about making the fastest thing you can; it's about not having to pay ridiculous fees to license things from your competitors.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Intel also owns some relevant cache patents :(



Which is untrue; the x86 license ends on 12 November 2014.
 
Not true, juanrga. As Palladin correctly stated earlier, AMD was granted a perpetual license to x86 as a result of a 1995 settlement.

The expiration you're talking about likely covers subsequent related technologies that Intel has as much interest in keeping as AMD does.
 

juggernautxtr

Honorable
Dec 21, 2013
101
0
10,680


this program is for optimizing both CPU and GPU usage in games; it's an API like DirectX.
DirectX is extremely top-heavy and ends up needlessly using up CPU processing power along with GPU processing; Mantle gives game developers more control of the system.

 