AMD Reveals High-Performance Zen CPU, GPUs With HBM, And FinFET

maktovic · May 8, 2015

2016 is too late

By that time Intel's Skylake CPU's will be out and performance gap between AMD's Zen CPU's and Skylake CPU's will approximately be same as between Haswell and FX series CPU's

Dylan Richards · May 8, 2015

I'm all excited but...a 40% performance increase isn't that much, is it, people?
I mean...that's 40% better performing than...FX-8150...which would translate into 25% higher performance than the FX-8350.

How wrong am I?

Quite wrong. Firstly, it's not a 40% straight performance increase, it's a 40% instructions per clock cycle increase (don't confuse this with floating point operations per second). Secondly, this 40% increase is against the newest and last iteration of the Bulldozer core, Excavator. This means that it's around 75% IPC increase over Piledriver (the FX-8350, for example), and 85% IPC increase over Bulldozer (the FX-8150).

This is per core, it doesn't include floating point operations per second improvements, the high-bandwidth memory, or the fact that it's now using the 14-nanometer FinFET node, which is a huge leap from the 32-nanometer of Piledriver.

Comparing Piledriver to Zen, Zen features four times as much bandwidth in the floating point unit because one core gets two 256-bit FMAC units, instead of a single 128-bit unit found inside Piledriver. They can fuse to accept 512-bit wide instructions (AVX-512, etc.). Each core also gets its own level 1 and level 2 cache.
http://cpugrade.com/a/i/amd-bulldozer-fx-8150.png
http://cpugrade.com/a/i/amd-piledriver-fx-8370.png
http://cpugrade.com/a/i/amd-steamroller-a10-7850k.png
http://cpugrade.com/a/i/amd-jaguar-ps4-xbo.png
http://cpugrade.com/a/i/amd-zen-octa-core.png

Factoring in a 75% IPC improvement over Piledriver, a quad-core theoretically gains 300% IPC improvement over a Piledriver quad-core (e.g. FX-4350), and 600% IPC improvement over an octa-core (e.g. FX-9590). Zen isn't to be taken lightly.

MyDocuments · May 8, 2015

Well, I sincereley hope it packs enough punch to reduce the Intel prices, then and only then can I make a real "value-informed" choice(!) for my next PC build.

Aspiring techie · May 8, 2015

maktovic :

If Intel's track record is of any consideration, Skylake won't be much of an improvement over Broadwell or Haswell.

jimmysmitty · May 8, 2015

Dylan Richards :

I take Zen lightly. Only because every other time AMD has dropped anything after the Athlon 64 series it hasn't quite been a performer. Then again when you fail to deliver multiple times people tend to not hold their breath.

I highly doubt we will see 75% over PD. That is a massive jump and I don't even think Prescott to Conroe was that big even.

That said, paper always makes things look great. Hell Intel has paper that makes Haswell out to be a much better performer than Ivy Bridge. Real world is where it counts.

Aspiring techie :

In gaming, probably not. Then again except for AMD, gaming hasn't been held back by CPUs much, high end that is. Sure a lower end dual core benefits but quad cores and higher don't much.

However, in a lot of other areas Haswell is a much better performer than Ivy Bridge. Just not things normally used by our kind.

Of course we will have to wait and see but I think Skylake will be a better improvement over Haswell than Haswell was over Ivy Bridge.

sonnyrobbins101 · May 9, 2015

I've been building machines for only the past five years, but I can boast that I've never built an Intel machine. AMD produces solid CPUs and GPUs. I think their biggest downfall is marketing. Gaming = fun. Color, sound and imagery in marketing makes a difference! If AMD would start to market like Intel and Nvidea, I think their market share would climb significantly.

The FX 8-core I have in my CAD station just destroys simultaneous instances of Civil 3D, hydraulics and hydrology software, google chrome, music/streaming movies in the background and gaming when waiting for work to finish rendering.

AMD Radeon memory is also really sweet with RAMDisk!

I have only run into one issue and it was just last week. The FirePro W7000 graphics card that I was using burned out. Not sure why. If AMD customer support is as good as the other portions of my builds, this will be a lifelong customer!

dgingeri · May 11, 2015

Looking back, AMD seems to be pulling a Microsoft in some ways.

Their 486 was competitive with Intel, but the K5 was horrible competition for the Pentium.
The K6 turned things around, but K6-2 and K6-3 weren't so great compared to Intel's PII.
Then they came out with the Athlon, and were competitive again against the P3, but the Athlon XP wasn't so competitive against the P4 2.0GHz and higher.
Then the Athlon 64 was quite competitive with the P4-3.2GHz and higher, and the later versions weren't so much against the Core 2.

It's almost like ever other generation, they catch up, and then Intel responds with a faster chip that leaves AMD in the dust. Maybe they might have something here. It could be their "catch up" time. They just need this to make some profit and survive a bit longer. Granted, they haven't ever been as behind as they are right now. It would be some feat to get out of this hole.

Mike D7 · Jun 2, 2015

If it crunches numbers as fast as they claim on benchmarks performance scores, I'm going to get one always a fun experience building a new system.

chrissy4605 · Jun 2, 2015

I think I see more possibilities in the new Zen chips rather highly over the current batch of CPUs. I currently have the AM3+ FX8350. a relatively decent chip. But if they even come close to where intel currently treads then More power to AMD!

Naomi Kitsune · Jun 8, 2015

I really hope these destroy intel's next line of cpu's, I'd like to see AMD make a huge return. If AMD can get back up to something like a 40-50% market share, it would put more competition into cpu making and we could see our cpu power making larger leaps every generation

akamateau · Jun 8, 2015

AMD is planning on replacing system memory with stacked HBM. Here are the Patents. They are all published last year and this year with the same inventor; Gabriel H. Loh and the assignee is of course AMD.

Stacked memory device with metadata management
WO 2014025676 A1

"Memory bandwidth and latency are significant performance bottlenecks in many processing systems. These performance factors may be improved to a degree through the use of stacked, or three-dimensional (3D), memory, which provides increased bandwidth and reduced intra-device latency through the use of through-silicon vias (TSVs) to interconnect multiple stacked layers of memory. However, system memory and other large-scale memory typically are implemented as separate from the other components of the system. A system implementing 3D stacked memory therefore can continue to be bandwidth-limited due to the bandwidth of the interconnect connecting the 3D stacked memory to the other components and latency-limited due to the propagation delay of the signaling traversing the relatively-long interconnect and the handshaking process needed to conduct such signaling. The inter-device bandwidth and inter-device latency have a particular impact on processing
efficiency and power consumption of the system when a performed task requires multiple accesses to the 3D stacked memory as each access requires a back-and-forth communication between the 3D stacked memory and thus the inter-device bandwidth and latency penalties are incurred twice for each access."

Interposer having embedded memory controller circuitry
US 20140089609 A1

" For high-performance computing systems, it is desirable for the processor and memory modules to be located within close proximity for faster communication (high bandwidth). Packaging chips in closer proximity not only improves performance, but can also reduce the energy expended when communicating between the processor and memory. It would be desirable to utilize the large amount of "empty" silicon that is available in an interposer. "

Die-stacked memory device with reconfigurable logic
US 8922243 B2

"Memory system performance enhancements conventionally are implemented in hard-coded silicon in system components separate from the memory, such as in processor dies and chipset dies. This hard-coded approach limits system flexibility as the implementation of additional or different memory performance features requires redesigning the logic, which design costs and production costs, as well as limits the broad mass-market appeal of the resulting component. Some system designers attempt to introduce flexibility into processing systems by incorporating a separate reconfigurable chip (e.g., a commercially-available FPGA) in the system design. However, this approach increases the cost, complexity, and size of the system as the system-level design must accommodate for the additional chip. Moreover, this approach relies on the board-level or system-level links to the memory, and thus the separate reconfigurable chip's access to the memory may be limited by the
bandwidth available on these links."

Hybrid cache
US 20140181387 A1

"Die-stacking technology enables multiple layers of Dynamic Random Access Memory (DRAM) to be integrated with single or multicore processors. Die-stacking technologies provide a way to tightly integrate multiple disparate silicon die with high-bandwidth, low-latency interconnects. The implementation could involve vertical stacking as illustrated in FIG. 1A, in which a plurality of DRAM layers 100 are stacked above a multicore processor 102. Alternately, as illustrated in FIG. 1B, a horizontal stacking of the DRAM 100 and the processor 102 can be achieved on an interposer 104. In either case the processor 102 (or each core thereof) is provided with a high bandwidth, low-latency path to the stacked memory 100.

Computer systems typically include a processing unit, a main memory and one or more cache memories. A cache memory is a high-speed memory that acts as a buffer between the processor and the main memory. Although smaller than the main memory, the cache memory typically has appreciably faster access time than the main memory. Memory subsystem performance can be increased by storing the most commonly used data in smaller but faster cache memories."

Partitionable data bus
US 20150026511 A1

"Die-stacked memory devices can be combined with one or more processing units (e.g., Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Accelerated Processing Units (APUs)) in the same electronics package. A characteristic of this type of package is that it can include, for example, over 1000 data connections (e.g., pins) between the one or more processing units and the die-stacked memory device. This high number of data connections is significantly greater than data connections associated with off-chip memory devices, which typically have 32 or 64 data connections."

Non-uniform memory-aware cache management
US 20120311269 A1

"Computer systems may include different instances and/or kinds of main memory storage with different performance characteristics. For example, a given microprocessor may be able to access memory that is integrated directly on top of the processor (e.g., 3D stacked memory integration), interposer-based integrated memory, multi-chip module (MCM) memory, conventional main memory on a motherboard, and/or other types of memory. In different systems, such system memories may be connected directly to a processing chip, associated with other chips in a multi-socket system, and/or coupled to the processor in other configurations.

Because different memories may be implemented with different technologies and/or in different places in the system, a given processor may experience different performance characteristics (e.g., latency, bandwidth, power consumption, etc.) when accessing different memories. For example, a processor may be able to access a portion of memory that is integrated onto that processor using stacked dynamic random access memory (DRAM) technology with less latency and/or more bandwidth than it may a different portion of memory that is located off-chip (e.g., on the motherboard). As used herein, a performance characteristic refers to any observable performance measure of executing a memory access operation."

“NoC Architectures for Silicon Interposer Systems Why pay for more wires when you can get them (from your interposer) for free?” Natalie Enright Jerger, Ajaykumar Kannan, Zimo Li Edward S. Rogers Department of Electrical and Computer Engineering University of Toronto Gabriel H. Loh AMD Research Advanced Micro Devices, Inc”
http://www.eecg.toronto.edu/~enright/micro14-interposer.pdf

“3D-Stacked Memory Architectures for Multi-Core Processors” Gabriel H. Loh Georgia Institute of Technology College of Computing”
http://ag-rs-www.informatik.uni-kl.de/publications/data/Loh08.pdf

“Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches” Gabriel H. Loh⋆ Mark D. Hill†⋆ ⋆ AMD Research † Department of Computer Sciences Advanced Micro Devices, Inc. University of Wisconsin – Madison”
http://research.cs.wisc.edu/multifacet/papers/micro11_missmap.pdf

All of this adds up to HBM being placed on-die as a replacement of or maybe supplement to system memory.

Replacing system memory with on-die HBM has the same benefits for the performance and energy demand of the system as it has for GPU's. Also it makes for smaller motherboards, no memory sockets and no memory packaging.

dgingeri · Jun 8, 2015

Naomi Kitsune :

I seriously doubt AMD will be able to pull a 40-50% market share for decades. They couldn't pull a 20% market share with a superior chip than Intel, twice, the original Athlon series or the Athlon64 series. They just didn't have the "in" with manufacturers that Intel does, and their marketing is extremely limited, giving them far lower brand recognition among average computer buyers.

Simply put, it's not us they need to impress. It's the masses and the manufacturers they need to impress. That will take a while, and a lot of marketing.

akamateau · Jun 8, 2015

dgingeri :

Actually DX12 gives AMD the best possible chance to grab market share. AMD APU's and dGPU's outperform all Intel and nVidia graphics silicon across the board with DX12.

DX11 was such a poor graphics API it virtually kneecapped the GPU.

dgingeri · Jun 8, 2015

akamateau :

As we learned from the Athlon and Athlon 64 series, better performance doesn't matter near as much as marketing. The masses don't buy something unless they're told to. Manufacturers don't make something unless the masses will buy it. HP, Dell, and Lenovo won't make AMD based systems unless they have someone who will buy a lot of them. In the Athlon 64 days, HP did make some business systems based on AMD's processors, and the company I worked for bought them, but since they didn't gain enough market share, they ended the line.

As for discrete GPUs, they're in a quickly falling minority. We gamers like them, but most people don't even look for them. They buy crappy iGPUs because the systems are cheap. Ever search for a laptop with a discrete GPU? They're in the top 10% most costly laptops. The cheaper 90% are all integrated graphics. Again, the masses control the market.

All AMD needs to do to gain market share is to advertise more aggressively. They can have crap products, and advertising would gain more sales than what they have. Intel spends almost as much on advertising as they do on engineering, and it pays off for them. When was the last time you saw an AMD commercial on TV?

jimmysmitty · Jun 8, 2015

akamateau :

Where is the information supporting this? We haven't seen any DX12 benchmarks in the wild yet.

DX11 wasn't a poorly written API, rather it introduced something that for the first time in years actually tested GPUs (Tesselation). And that is a good thing. If a GPU can just easily take anything on, what reason does the manufactures have to improve anything?

Funny thing too, when Tesselation first came out, AMD supported it first and performed better with it on than NVidia for a while. Not sure that is a kneecapping.

dgingeri · Jun 8, 2015

jimmysmitty :

DX11 has a LOT of overhead, and much of it is done on the CPU instead of the GPU, plus what processing of this overhead that is on the GPU is only single threaded, so it doesn't use the GPU effectively.

DX12 reduces the overhead, pushes more of this overhead onto the GPU, and organizes the overhead much more effectively. It is far superior.

However, I'm not sure I'd label DX11 as bad, necessarily. It's just that DX12 is quite superior.

jimmysmitty · Jun 8, 2015

dgingeri :

Of course it is. DX11 was superior to 10/9 in many ways. That's how we want it to go, progression not regression.

A lot of that overhead comes from tessellation. I just hope that people realize that a lot of the performance increases wont be on their high end system but, like with Mantle, on much lower end systems that do have a CPU that could present a bottleneck. There will be features that will allow the developers to better utilize the features of the GPU and make games that look better while keeping performance but it wont be the massive boost everyone is expecting.

Hype for things like this are bad. Mantle had it and unless you had lower than an i5 quad core it was not really anything special.

akamateau · Jun 8, 2015

jimmysmitty :

You could start by readin AnandTech from last February. They reported 3dMark API Overhead tests as well as Starswarm tests comparing DX11, Mantle and DX12.

They ran a later piece maybe in late April or May that was mainly 3dMark with AMD APU's vs Intel IGP. AMD APU's outperformed all INtel IGP i3, i5 and i7 by 100% in API Overhead. Largely because of AMD IP Asynchronous Shader Pipelines.

Also Radeon GPU on Intel platforms outperforms its nVidia counterpart quite nicely.

In short DX12 is the best friend that AMD has. Finally an API that enables full use of ALL CPU cores regardless of the game programming.

My claim of DX11 kneecapping applies to the entire dGPU APU and IGP industry. It is quite simply handicapped. DX12 allows for 400% increase in performance.

If you don't believe me the Google Brad Wardell and DX12 and read for yourself. Wardell has impeccable credentials and is a committed DX12 and early Mantle supporter.

You are also incorrect about how DX12 increases workflows. DX12 enables all cores and threads on the CPU to feed the GPU. It eliminates the CPU bottleneck. the game does not need to be written as a multithreaded game, the API manages CPU resources to USE multithreaded cpu assets to keep the CPU fed. This is Asynchronous Shader Pipelines and Asynchronous Compute Engines. Another AMD first.

Again if you don't believe me then Google and read Brad Wardell's comments. He is the CEO of StarDock.

DX12 will change gaming.

akamateau · Jun 8, 2015

dgingeri :

akamateau :

As we learned from the Athlon and Athlon 64 series, better performance doesn't matter near as much as marketing. The masses don't buy something unless they're told to. Manufacturers don't make something unless the masses will buy it. HP, Dell, and Lenovo won't make AMD based systems unless they have someone who will buy a lot of them. In the Athlon 64 days, HP did make some business systems based on AMD's processors, and the company I worked for bought them, but since they didn't gain enough market share, they ended the line.

As for discrete GPUs, they're in a quickly falling minority. We gamers like them, but most people don't even look for them. They buy crappy iGPUs because the systems are cheap. Ever search for a laptop with a discrete GPU? They're in the top 10% most costly laptops. The cheaper 90% are all integrated graphics. Again, the masses control the market.

All AMD needs to do to gain market share is to advertise more aggressively. They can have crap products, and advertising would gain more sales than what they have. Intel spends almost as much on advertising as they do on engineering, and it pays off for them. When was the last time you saw an AMD commercial on TV?

Not all of your response is correct. Lenovo uses discrete GPU in many of it's media, graphic and workstation laptops. I just purchased a Lenovo Edge E550 with an Intel i7-5500U and AMD R7 265M discrete gpu for $699.

DX12 will allow an APU to demonstrate 35-40 fps on HD DX12 games. This opens up many more titles to players previously using $$$$$ dGPU.

jimmysmitty · Jun 8, 2015

akamateau :

I am not going to use synthetic gaming benchmarks as proof of anything. That is the same as using the raw performance of a GPU as how good it will be, which it is not. Until I see real world gaming benchmarks I will reserve my judgment. Will DX12 be better than 11? Yes. Will it be some amazing godsend to make AMD GPUs skyrocket? No it will not. Mantle was proof. Removing the CPU bottleneck, what Mantle was supposed to do, only helped people with low end CPUs, pretty much anything lower than a i5 got an increase in performance. But if you have an i5/i7 it wont because the CPU is not presenting a massive bottleneck.

Kneecapping would refer to it handicapping GPUs which it did not. Before DX11 most games were using DX9 still which did not have features that were pushing GPUs. DX11 introduced features that could stress the GPUs again and now it no longer is. Go look at any Mantle comparison with a high end system. Mantle makes no difference. DX12 will allow for companies to utilize the GPU resources better and allow for better graphics will keeping decent performance.

You are hyping it up much like people are hyping up the Radeon Fury. Hype never ends well.

BTW, Stardock is crap. I never have nor will I ever use it. Just because one person says it is great does not mean it is. I have seen other bigger game devs say DX12 is not going to be super special.

Again, I think DX12 will be good, it will benefit PC gaming. I just don't think it is going to be some insane % increase on just AMD GPUs or any GPUs for that matter, and I still have my HD7970GHz so no I am not a hater I am just a realist.

de5_Roy · Jun 9, 2015

see, this is why we need a functional (and/or tag-based) downvote button in the news comments threads.

spdragoo · Jun 9, 2015

So....I guess someone's caps lock key got stuck. Must be time for a new keyboard...

jimmysmitty · Jun 9, 2015

de5_Roy :

Nah he can just take a vacation. No need for insults. It was an exchange of opinions yet people for some reason let it infuriate them. Reminds me of Baron from the Phenom days. He would get riled up easily like this.

AMD Reveals High-Performance Zen CPU, GPUs With HBM, And FinFET

Distinguished

Distinguished

Reputable

Reputable

Champion

Reputable

Distinguished

Reputable

Honorable

Reputable

Reputable

Distinguished

Reputable

Distinguished

Champion

Distinguished

Champion

Reputable

Reputable

Champion

Splendid

Splendid

Champion

Share this page