AMD CPU speculation... and expert conjecture

jdwii · Aug 6, 2014

palladin9479 :

Cazalan :

Ok something you gotta realize is that all FX chips are the exact same die, same with all APU's. A 7850K costs the exact same as a 860K or an A6-7400K to make. AMD then sorts the chips by quality (binning) and sells them accordingly. Because of this method the margin on one chip (fx6300) will be different then another (fx8350) even though they cost the exact same to make. This is also why they went with the modular approach, it dramatically cut the costs of production.

Well i'm not to sure that is why they went with the modular approach but yeah the rest is true to bad we couldn't unlock these chips like in the phenom days

Cazalan · Aug 6, 2014

palladin9479 :

I'm aware, but they still lump that into one average number in the earnings call.

gamerk316 · Aug 7, 2014

palladin9479 :

I just know you could do it, but forget how it actually worked. I think the deal was only one could be the primary at a time, kind of like audio devices are now. But you could physically install two display drivers at the same time, which is why cross vendor PhysX worked [NVIDIA card as the PhysX device, AMD as the renderer] before NVIDIA disallowed that via drivers.

That being said, there's really no reason both AMD and NVIDIA can't release a non-display driver suite to allow their cards to be used as gpGPUs without needing to be the main renderer. But we're right back to needing a way to point the software to which card(s) to use for what tasks, so I can understand the reluctance. For example, maybe you want to use a secondary card for OpenCL tasks for non-gaming, but for games, for stability, you need the primary card to do it. How the heck do you pull that off easily?

Really comes down to a lot of this stuff not being modular on the OS side. And no, Linux isn't much better here either.

de5_Roy · Aug 7, 2014

asus crossblade formula: r.o.g. comes to socket fm2+
http://www.tomshardware.com/news/asus-rog-crossblade-formula-amd,27385.html

jacobian · Aug 7, 2014

What's up with AMD's Kaveri rollout? Let's admit one painful truth. Not only AMD failed to deliver on the performance goals that were promised for its new Kaveri APUs, but AMD also somehow failed to deliver the actual chips.

In January, the tech news sites and web forum arm chair analysts sort of drooled over the new A8-7600 APU for its low TDP and decent overall performance. Yet, eight months later, we still don't see those sold anywhere.

(story is here: http://www.extremetech.com/computing/180207-perils-of-a-paper-launch-amds-a8-7600-pushed-back-to-late-2014)

To add insult to injury, Intel taunts AMD by releasing the unlocked dual-core Pentium G3258, which is designed specially to appeal to the enthusiasts. Instead of buying a Kaveri A10, get the Pentium and a cheap GPU for the same money, and you got a monster budget alternative relative to the A10.

(story here: http://techreport.com/review/26735/overclocking-intel-pentium-g3258-anniversary-edition-processor/4)

de5_Roy · Aug 7, 2014

jacobian :

amd did deliver what they hyped in their roadmap. steamroller was about higher parallelism. as a result, kaveri introduced hsa and 12 compute cores (12 is more than 8 cores in fx cpus). excavator is supposed to deliver more performance. so far, the only leaks are about mobile apu/socs. amd will have no problem delivering higher performance or efficiency on mobile thanks to the way kaveri turned out. performance is once again the low hanging fruit.

i've always suspected glofo's issues with kaveri even though the 28nm shp process delivered impressive transistor density. i wonder if this led to higher leakage and caused most apus underperform in perf/watt binning.

szatkus · Aug 7, 2014

de5_Roy :

Please, don't count compute cores. It's some kind of marketing buzzword.

Also, I don't remember AMD promising anything. There's always annotation in the end of every presentation that details may change in future.

jacobian · Aug 7, 2014

^

Well de5_roy, that's what I am writing in my post. After all that hype, AMD is almost saying: "Sorry folks. A real improvement in performance actually comes a year later". And those, the extra compute cores won't do jack "it" in most applications. Clearly, the whole Bulldozer architecture was designed to maximize the number of cores rather than per core performance. Maybe this would work well for server applications, but it's an ongoing challenge to actually make use of those numerous cores on a desktop or laptop. Personally, I have lost hope for Bulldozer-based architectures to produce something interesting. I will be waiting for the K12, or maybe not.

http://techreport.com/news/26417/amd-is-working-on-k12-brand-new-x86-and-arm-cores

cdrkf · Aug 7, 2014

jacobian :

Well the K12 is an ARM core aimed at servers. There is an x86 core (as yet un-named) that is being developed 'in parallel' so that is what will interest us. In fairness they've got Keller heading up the R&D on this one so I think there's a good chance it will be a big improvement over AMDs current offerings

How it will compare to other offerings when it's released who knows.

blackkstar · Aug 7, 2014

jacobian :

You are going to be a lot happier when you realize that we've hit the end of massive single thread performance improvements on x86. We don't have anywhere else to go in frequency. Intel adding entire math units to Haswell barely increased performance.

The only way for x86 to grow is through more cores. Bulldozer does have far less single thread than it should have, but if you think we're going to get some sort of single thread beast that blows out what we have now, you're insane. Those days are long gone and everyone knows it. It's why we have quad core CPUs right now and Intel is actually going as far as sliding down the price of a 6 core CPU while coming out with an 8 core Intel CPU for consumers.

If you're expecting K12 x86 core to be some sort of super fast quad core monster you're going to be disappointed. I'm expect at best around Haswell level of single thread with 16 cores.

People spend so much time running around the internet going "LOL MOAR COARS HAHAHAHA" when that's all we have left. ARM is doing it too.

The challenge will always be there. It was there when we went from single core to dual core, dual to quad, and now quad to octo. It will be there when we move from octo to 16 core. We aren't getting significant single thread performance increases ever again so you might as well give up on that dream. All we're going to get are more cores and HSA/GPGPU and developers being dragged along going "but but I don't know what I'm doing with all these cores!"

It's really easy to sit on a forum and go "wow all AMD needs to do is improve single thread!" but Intel has been spending tons of money on it and they haven't gotten anywhere significant since Nehalem. So they chase power efficiency because they have no way to increase single thread by a decent amount.

AMD's next x86 architecture will be focused around coming close to Intel single thread while cramming as many cores and threads as possible onto a CPU by keeping die size down so AMD can undercut Intel in situations where multiple cores are appreciated. There are people who do need lots of threads and while you obviously don't need to use them, you will have budget options available in the future. Nothing irks me more than someone who doesn't need a lot of cores running around going "more cores are pointless! When I play Skyrim only 2 of my cores are used!" Some markets really, really need those cores and some people have 5ghz 8 core AMD CPUs that do nothing but sit around and wait for it to compute things all day long at 100% CPU usage.

"I don't need multiple cores and even though AMD, Intel, and every ARM vendor do things that indicate the only way they can scale performance is to add cores I am going to be an armchair analyst and proclaim that all everyone needs to do is increase single thread performance! It's that easy! We only get more cores and focuses on efficiency over raw performance because everyone CHOOSES to do it! Forced to do it because they can't increase single thread performance? HAHAHA YEAH RIGHT!!! LOL" That's what some of you sound like. If you think that increasing single thread performance is so easy, you can go work for Intel, AMD, ARM, MediaTek, Qualcomm, Apple, Samsung, Nvidia and blow everyone's minds by how easy it is to increase single thread performance and how they have all been missing such easy, low hanging fruit!

szatkus · Aug 7, 2014

blackkstar :

IPC progress was more less the same past 10 years. Also just think that it would be completely possible to make Haswell on 32nm (with smaller GPU). We have a lot of transistors which could be used to bump single thread performance. They are simply more focused on GPU and energy efficency.

blackkstar :

What? Even Jaguar doesn't need 16 cores to beat one Intel.

blackkstar :

That's why A57 is more less two times bigger than A15?

gamerk316 · Aug 7, 2014

blackkstar :

jacobian :

You are going to be a lot happier when you realize that we've hit the end of massive single thread performance improvements on x86. We don't have anywhere else to go in frequency. Intel adding entire math units to Haswell barely increased performance.

The only way for x86 to grow is through more cores. Bulldozer does have far less single thread than it should have, but if you think we're going to get some sort of single thread beast that blows out what we have now, you're insane. Those days are long gone and everyone knows it. It's why we have quad core CPUs right now and Intel is actually going as far as sliding down the price of a 6 core CPU while coming out with an 8 core Intel CPU for consumers.

If you're expecting K12 x86 core to be some sort of super fast quad core monster you're going to be disappointed. I'm expect at best around Haswell level of single thread with 16 cores.

People spend so much time running around the internet going "LOL MOAR COARS HAHAHAHA" when that's all we have left. ARM is doing it too.

The challenge will always be there. It was there when we went from single core to dual core, dual to quad, and now quad to octo. It will be there when we move from octo to 16 core. We aren't getting significant single thread performance increases ever again so you might as well give up on that dream. All we're going to get are more cores and HSA/GPGPU and developers being dragged along going "but but I don't know what I'm doing with all these cores!"

It's really easy to sit on a forum and go "wow all AMD needs to do is improve single thread!" but Intel has been spending tons of money on it and they haven't gotten anywhere significant since Nehalem. So they chase power efficiency because they have no way to increase single thread by a decent amount.

AMD's next x86 architecture will be focused around coming close to Intel single thread while cramming as many cores and threads as possible onto a CPU by keeping die size down so AMD can undercut Intel in situations where multiple cores are appreciated. There are people who do need lots of threads and while you obviously don't need to use them, you will have budget options available in the future. Nothing irks me more than someone who doesn't need a lot of cores running around going "more cores are pointless! When I play Skyrim only 2 of my cores are used!" Some markets really, really need those cores and some people have 5ghz 8 core AMD CPUs that do nothing but sit around and wait for it to compute things all day long at 100% CPU usage.

"I don't need multiple cores and even though AMD, Intel, and every ARM vendor do things that indicate the only way they can scale performance is to add cores I am going to be an armchair analyst and proclaim that all everyone needs to do is increase single thread performance! It's that easy! We only get more cores and focuses on efficiency over raw performance because everyone CHOOSES to do it! Forced to do it because they can't increase single thread performance? HAHAHA YEAH RIGHT!!! LOL" That's what some of you sound like. If you think that increasing single thread performance is so easy, you can go work for Intel, AMD, ARM, MediaTek, Qualcomm, Apple, Samsung, Nvidia and blow everyone's minds by how easy it is to increase single thread performance and how they have all been missing such easy, low hanging fruit!

Except, again, as was discovered in the late 70s, you can't use more then a handful of cores to work on general computing tasks. You simply won't see the performance gain for the vast majority of tasks out there. So adding cores is also a dead end.

Hence why I suspect we're going to hit "peak computing" in 3-4 years, where we essentially run out of ways to improve performance through HW. At that point, we're all going to be waiting form Quantum computing to become viable.

szatkus · Aug 7, 2014

gamerk316 :

Oh, don't forget about Graphene/Silicene/CNT/Whatever.

jdwii · Aug 7, 2014

szatkus :

Lol no more or less the same is like -5%-+5%. I7 920 vs I7 3770K which haswell is even faster in dolphin emulator over IPC improvements, I7 920 is 15% less efficient in IPC compared to Ivy and probably closer to 20% slower per clock compared to Haswell. That is from 2008-2014 only 6 years would you like me to pull out some horrible Pentium 4 IPC benchmarks(from 04) i hope not. Amd on the other hand has been around the same.
http://alienbabeltech.com/main/platform-upgrade-core-i7-920-vs-i7-3770-at-4-2ghz-featuring-ecs-golden-series-motherboard-and-kingston/7/
http://www.anandtech.com/show/2658/16
I7 965 was around 10%-15%(some cases 30%) faster ipc IPC compared to the Intel core 2 extreme Qx 93770 at the same clock rates

Then we can see the improvements on the quad core extreme vs Pentium dual core from 05 sure its around 10-15% as well. Overall from 04-14 we saw a solid 40-50% boost in IPC compared to a pentium 4 extreme and a I7 4770K.
We can also see sandy to haswell is around 10-15% faster in IPC

http://www.hardocp.com/article/2013/06/01/intel_haswell_i74770k_ipc_overclocking_review/1#.U-O-2fldXJY

I'm not to sure how much longer these core improvements will continue.

Cazalan · Aug 7, 2014

gamerk316 :

Nah there are plenty of engineering opportunities to increase compute power. Finding ways to get people to buy into it will be the hard part.

gamerk316 · Aug 7, 2014

szatkus :

Have to turn it into a product. Graphine's lack of a natural bandgap is a major problem, for instance. Other materials are expensive to produce. And so on.

gamerk316 · Aug 7, 2014

What would we ever do without IBM?

http://arstechnica.com/science/2014/08/ibm-researchers-make-a-chip-full-of-artificial-neurons/
http://www.extremetech.com/extreme/187612-ibm-cracks-open-a-new-era-of-computing-with-brain-like-chip-4096-cores-1-million-neurons-5-4-billion-transistors

Zircoben · Aug 7, 2014

blackkstar :

jacobian :

You are going to be a lot happier when you realize that we've hit the end of massive single thread performance improvements on x86. We don't have anywhere else to go in frequency. Intel adding entire math units to Haswell barely increased performance.

The only way for x86 to grow is through more cores. Bulldozer does have far less single thread than it should have, but if you think we're going to get some sort of single thread beast that blows out what we have now, you're insane. Those days are long gone and everyone knows it. It's why we have quad core CPUs right now and Intel is actually going as far as sliding down the price of a 6 core CPU while coming out with an 8 core Intel CPU for consumers.

If you're expecting K12 x86 core to be some sort of super fast quad core monster you're going to be disappointed. I'm expect at best around Haswell level of single thread with 16 cores.

People spend so much time running around the internet going "LOL MOAR COARS HAHAHAHA" when that's all we have left. ARM is doing it too.

The challenge will always be there. It was there when we went from single core to dual core, dual to quad, and now quad to octo. It will be there when we move from octo to 16 core. We aren't getting significant single thread performance increases ever again so you might as well give up on that dream. All we're going to get are more cores and HSA/GPGPU and developers being dragged along going "but but I don't know what I'm doing with all these cores!"

It's really easy to sit on a forum and go "wow all AMD needs to do is improve single thread!" but Intel has been spending tons of money on it and they haven't gotten anywhere significant since Nehalem. So they chase power efficiency because they have no way to increase single thread by a decent amount.

AMD's next x86 architecture will be focused around coming close to Intel single thread while cramming as many cores and threads as possible onto a CPU by keeping die size down so AMD can undercut Intel in situations where multiple cores are appreciated. There are people who do need lots of threads and while you obviously don't need to use them, you will have budget options available in the future. Nothing irks me more than someone who doesn't need a lot of cores running around going "more cores are pointless! When I play Skyrim only 2 of my cores are used!" Some markets really, really need those cores and some people have 5ghz 8 core AMD CPUs that do nothing but sit around and wait for it to compute things all day long at 100% CPU usage.

"I don't need multiple cores and even though AMD, Intel, and every ARM vendor do things that indicate the only way they can scale performance is to add cores I am going to be an armchair analyst and proclaim that all everyone needs to do is increase single thread performance! It's that easy! We only get more cores and focuses on efficiency over raw performance because everyone CHOOSES to do it! Forced to do it because they can't increase single thread performance? HAHAHA YEAH RIGHT!!! LOL" That's what some of you sound like. If you think that increasing single thread performance is so easy, you can go work for Intel, AMD, ARM, MediaTek, Qualcomm, Apple, Samsung, Nvidia and blow everyone's minds by how easy it is to increase single thread performance and how they have all been missing such easy, low hanging fruit!

That was fascinating.

But, really, do you believe that? I don't see any reason single-thread performance is at a standstill, because it has steadily gone upward since the beginning of CPUs.

Haswell did improve single thread performance, and as the size of the process technology drops lower and lower (14nm, 10 nm, etc.) we will only be able to fit more processing components and transistors, really.

I don't think it will stop, and even as the industry leans toward more and more cores, the single thread performance will only improve as the techology improves.

szatkus · Aug 7, 2014

jdwii :

szatkus :

Lol no more or less the same is like -5%-+5%. I7 920 vs I7 3770K which haswell is even faster in dolphin emulator over IPC improvements, I7 920 is 15% less efficient in IPC compared to Ivy and probably closer to 20% slower per clock compared to Haswell. That is from 2008-2014 only 6 years would you like me to pull out some horrible Pentium 4 IPC benchmarks(from 04) i hope not. Amd on the other hand has been around the same.
http://alienbabeltech.com/main/platform-upgrade-core-i7-920-vs-i7-3770-at-4-2ghz-featuring-ecs-golden-series-motherboard-and-kingston/7/
http://www.anandtech.com/show/2658/16
I7 965 was around 10%-15%(some cases 30%) faster ipc IPC compared to the Intel core 2 extreme Qx 93770 at the same clock rates

Then we can see the improvements on the quad core extreme vs Pentium dual core from 05 sure its around 10-15% as well. Overall from 04-14 we saw a solid 40-50% boost in IPC compared to a pentium 4 extreme and a I7 4770K.
We can also see sandy to haswell is around 10-15% faster in IPC

http://www.hardocp.com/article/2013/06/01/intel_haswell_i74770k_ipc_overclocking_review/1#.U-O-2fldXJY

I'm not to sure how much longer these core improvements will continue.

You should compare it with Pentium M. Pentium 4 was completely different thing.

szatkus · Aug 7, 2014

gamerk316 :

szatkus :

Have to turn it into a product. Graphine's lack of a natural bandgap is a major problem, for instance. Other materials are expensive to produce. And so on.

That's why I gave few options.
Also there was news some time ago that they resolved that problem with bandgap.

gamerk316 · Aug 7, 2014

szatkus :

Kind of:

http://spectrum.ieee.org/nanoclast/semiconductors/nanotechnology/a-simple-twist-changes-graphenes-fate

Now, researchers at the Lawrence Berkeley National Laboratory in California and the Fritz Haber Institute in Berlin have discovered why these engineered band gaps in graphene don’t measure up to expectations. It turns out that when monolayers of graphene are stacked to create bilayers, they are slightly misaligned, resulting in a twist that changes the bilayer graphene’s electronic properties.

“The introduction of the twist generates a completely new electronic structure in the bilayer graphene that produces massive and massless Dirac fermions,” Aaron Bostwick, a scientist at Berkeley Lab’s Advanced Light Source (ALS), said in a press release. “The massless Dirac fermion branch produced by this new structure prevents bilayer graphene from becoming fully insulating even under a very strong electric field. This explains why bilayer graphene has not lived up to theoretical predictions in actual devices that were based on perfect or untwisted bilayer graphene.”

Massless Dirac fermions are essentially electrons that act as if they were photons. As a result, they are not restricted by the same band gap constraints as conventional electrons.

So Graphene is back on the drawing board, since the proposed way to create the bandgap doesn't work.

palladin9479 · Aug 7, 2014

Zircoben :

Depends on who you ask. Industry absolutely loves "moar cores" because 99% of what we do is parallel work. You expand capacity by adding more nodes and having more cores per node allows each node to have more capacity. We're already talking about using ESXi clusters that contain four to ten nodes with each node having two six Intel core HT CPU's. Our SPARC systems running specialty software has 128 threads per physical server with multiple servers working together. So all this wide expansion is perfect for industry which is where the real money is. Consumer computing is shrinking due to mobile taking on many of the common functions like communication and media consumption. That leaves office automation and interactive entertainment (gaming) for the consumer markets. Office stuff doesn't require much CPU wise and is more concerned about storage speed and memory size. So gaming is your only real demanding application on home computing which is why you see all the enthusiasts focus exclusively on that. Making interactive games that utilize parallel logic is very difficult because everything is waiting on what you, the player, do. That being said, it can be done and has been done. It requires going back to the drawing board and redefining the problem in such a way that it can be done in parallel. This is neither cheap nor easy, so don't exact huge leaps all of a sudden. Over time various libraries and methods will be created that make this whole process standard.

Cazalan · Aug 7, 2014

palladin9479 :

I thought 8P systems were on the way down but saw this Oracle announcement recently. Big databases love lots of cores. At first I thought it was a new Sparc system but turns out it's Xeon.

120 cores/ 240 thread - 8-way, 6 terabytes (TB) of DRAM

http://www.oracle.com/us/corporate/press/2244663

palladin9479 · Aug 7, 2014

Cazalan :

Microservers my friend. Small units that each contain one or two sockets with 64 to 256GB of memory. You bundle ten twelve or more of them per chasis and you can stack three to four chasis's per rack, and well lots and lots of racks. You can then cluster them together to form a virtual datacenter and run hundreds of virtual servers inside them. This allows you to centrally control hardware resources and shift them to whichever systems need them. This is why extremely wide systems yet cheap systems are being favored, you can easily expand your capacity without incurring a sh!t ton of overhead costs.

SPARC / Power are used for very specific scenarios requiring an insane amount of I/O. SPARC and Power are both more expensive per unit of performance then x86, assuming your applications can run on either. What the other two RISC uarchs have is their ability to expand to insane levels of parallel I/O inside a single system which favors business level HPC design's (financial modeling / reporting and simulations).

Oracle M6-32

It's 32 M6 CPU's bolted into a single system. Each CPU has twelve cores with eight threads each and can access 1TB of system memory. Total system capacity is 384 cores, 3072 threads and 32TB of system memory in a single rack. It's price is astronomical and that's before the licensing and software / support costs. It's a specialty system for a niche market.

:Edit

I'm adding this rather then making a new post. One of the primary differences between x86 and Power / SPARC is how cache coherency is handled. Both popular x86 manufacturers use snooping based cache coherency protocols, essentially every request needs to be advertised to all interconnected chips to maintain coherency in their cache's. The positive to this is that it's very fast and time efficient, the down side is scalability is severely limited. Two sockets broadcasting to each other isn't a big deal, four sockets broadcasting to each other is manageable with enough interconnect bandwidth, eight sockets and suddenly everything starts getting slow as interconnect bandwidth is saturated or you spend a ridiculous amount on interconnects. SPARC and Power use a directory based protocol where each chip is responsible for a region of memory and they keep a directory of what each other does. The upside is that you can expand near infinitely as each request is point to point instead of broadcasts, the downside is that asking another chip about it's cache adds latency to the whole process. AMD use's a snooping protocol just like Intel and thus has the same four socket practical / economical limit.

cdrkf · Aug 8, 2014

palladin9479 :

Cazalan :

Microservers my friend. Small units that each contain one or two sockets with 64 to 256GB of memory. You bundle ten twelve or more of them per chasis and you can stack three to four chasis's per rack, and well lots and lots of racks. You can then cluster them together to form a virtual datacenter and run hundreds of virtual servers inside them. This allows you to centrally control hardware resources and shift them to whichever systems need them. This is why extremely wide systems yet cheap systems are being favored, you can easily expand your capacity without incurring a sh!t ton of overhead costs.

SPARC / Power are used for very specific scenarios requiring an insane amount of I/O. SPARC and Power are both more expensive per unit of performance then x86, assuming your applications can run on either. What the other two RISC uarchs have is their ability to expand to insane levels of parallel I/O inside a single system which favors business level HPC design's (financial modeling / reporting and simulations).

Oracle M6-32

It's 32 M6 CPU's bolted into a single system. Each CPU has twelve cores with eight threads each and can access 1TB of system memory. Total system capacity is 384 cores, 3072 threads and 32TB of system memory in a single rack. It's price is astronomical and that's before the licensing and software / support costs. It's a specialty system for a niche market.

:Edit

I'm adding this rather then making a new post. One of the primary differences between x86 and Power / SPARC is how cache coherency is handled. Both popular x86 manufacturers use snooping based cache coherency protocols, essentially every request needs to be advertised to all interconnected chips to maintain coherency in their cache's. The positive to this is that it's very fast and time efficient, the down side is scalability is severely limited. Two sockets broadcasting to each other isn't a big deal, four sockets broadcasting to each other is manageable with enough interconnect bandwidth, eight sockets and suddenly everything starts getting slow as interconnect bandwidth is saturated or you spend a ridiculous amount on interconnects. SPARC and Power use a directory based protocol where each chip is responsible for a region of memory and they keep a directory of what each other does. The upside is that you can expand near infinitely as each request is point to point instead of broadcasts, the downside is that asking another chip about it's cache adds latency to the whole process. AMD use's a snooping protocol just like Intel and thus has the same four socket practical / economical limit.

Which is where their very high core count (but small core) processors (based on either x86 or ARM) come into their own I guess- if you can't put more physical processors in a machine, put more cores on each processor instead...

AMD CPU speculation... and expert conjecture

Splendid

Distinguished

Glorious

Splendid

Honorable

Splendid

Honorable

Honorable

Judicious

Honorable

Honorable

Glorious

Honorable

Splendid

Distinguished

Glorious

Glorious

Distinguished

Honorable

Honorable

Glorious

Splendid

Distinguished

Splendid

Judicious

Share this page