AMD CPU speculation... and expert conjecture

logainofhades · Jun 16, 2014

kviksand81 :

Knowing the EU, they will just pad their bank accounts with the money.

esrever · Jun 16, 2014

kviksand81 :

AMD isn't getting a penny. They weren't the ones suing Intel.

jdwii · Jun 16, 2014

Doing more tests on my FX-8350, and it seems like if I had gone with a 6-core or 4-core FX, it would have bottlenecked my system too much in CPU-heavy titles that use 4 cores. I'm actually going to test FPS in BF4 and Far Cry 3 with 4 cores vs. all 8; I'm doing my testing with 1 core per module.
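For anyone wanting to repeat the "1 core per module" test: on a 4-module FX-8350 it means letting the game use only one of the two integer cores in each module. On Windows this is usually done from Task Manager's "Set affinity" dialog. Below is a minimal sketch for Linux using Python's `os.sched_setaffinity`; the helper names are hypothetical, and it assumes the OS numbers module siblings consecutively (0-1, 2-3, ...), which you should confirm against your actual CPU topology first.

```python
import os

def one_core_per_module(n_logical, cores_per_module=2):
    """Pick one logical CPU per module.

    Assumes module siblings are numbered consecutively (0-1, 2-3, ...);
    this numbering is an assumption, not guaranteed on every system.
    """
    return set(range(0, n_logical, cores_per_module))

def pin_current_process(cpus):
    """Restrict this process to `cpus` (Linux only); return the set applied, or None."""
    if hasattr(os, "sched_setaffinity"):
        # Only keep CPUs that actually exist/are allowed for this process.
        target = cpus & os.sched_getaffinity(0)
        if target:
            os.sched_setaffinity(0, target)
            return target
    return None

if __name__ == "__main__":
    print(sorted(one_core_per_module(8)))  # [0, 2, 4, 6] for an 8-thread FX
```

On Linux, processes started after pinning (for example the game launched from this process) inherit the affinity mask, so the 4-core vs. 8-core comparison only needs the launcher pinned.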

logainofhades · Jun 16, 2014

Yeah, the modular approach that AMD attempted just wasn't a very good idea. Tweaking and improving Phenom II to handle the kind of clocks that FX does would have offered far better performance, IMO. The fact that they are scrapping the faildozer approach and going with a totally new arch proves this.

kviksand81 · Jun 16, 2014

logainofhades :

esrever :

Yeah, you're absolutely right! Hadn't noticed it previously, sorry.

anxiousinfusion · Jun 16, 2014

gamerk316 :

Which also applies to high end GPUs, which is also a no go for 90% of consumers.

esrever · Jun 16, 2014

logainofhades :

The modules were decent at what they tried to do; AMD is just unlucky that the foundries can't keep up with Intel. They seem far more efficient in the mobile space than Llano was.

Cazalan · Jun 16, 2014

More PRO APUs surface, including a 3.9 / 4.2 GHz model.

http://www.cpu-world.com/news_2014/2014061601_New_AMD_A4_PRO_series_processors_surfaced.html

szatkus · Jun 16, 2014

Cazalan :

Graphics: HD 8470D

Apparently it's a rebranded Richland.

Cazalan · Jun 16, 2014

Now that's a low blow to PCs.

Watch Dogs was gimped.

http://wccftech.com/e3-2012-graphics-watch-dogs-deliberately-turned-off-pc-next-gen/

jdwii · Jun 16, 2014

esrever :

They are meant for servers first, then "good enough" for everything else. But is it really a good design for servers? With this design it takes 1 module to tie 1 Intel core with HT, it needs a 20% clock advantage to do so, and both cores in the module have to run at 100%, which requires software that uses them. CMT also takes more hardware than HT, which is honestly almost free in die space, meaning lower cost and less heat. Then again, SMT only scales about 30%, versus about 90% for the second core of a Steamroller module, or worse on my CPU: 80%, and in some tests 70%. If AMD had gone with 4 ALUs + 4 AGUs and a dedicated FPU per core, with 256K of L1 cache and 512K of L2 cache per core and maybe a shared L3 cache, they would have been set, and they could have added SMT as well. That would have worked, but as it is, at 4.3GHz my PD can barely beat my 1100T at 3.9GHz unless programs use some of the newer instruction sets this chip has. So overall, no, I really don't think even Steamroller is worth it for servers when it is at best 20% stronger than Magny-Cours, which came out over 4 years ago. That's sad given that the 12-core needs 90 watts and the new one needs more; that's not what I call performance per watt, which is the number 1 thing in servers.
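The module-vs-HT argument can be made concrete with a back-of-the-envelope model. This is a sketch, not a measurement: it assumes throughput scales linearly as IPC × clock × (1 + gain from the second thread), and plugs in the figures from the post above (second CMT core adds 70 to 90%, HT adds ~30%, module needs ~20% more clock to tie). The function name is hypothetical.

```python
def implied_ipc_ratio(cmt_gain, smt_gain, clock_ratio):
    """Per-thread IPC of one AMD core relative to one Intel core, implied
    when an AMD module running at `clock_ratio` x the Intel clock exactly
    ties one Intel core with Hyper-Threading.

    Simple linear model (an assumption, not a measurement):
        AMD module = ipc_amd   * clock_amd   * (1 + cmt_gain)
        Intel + HT = ipc_intel * clock_intel * (1 + smt_gain)
    Setting them equal and solving for ipc_amd / ipc_intel gives the result.
    """
    return (1 + smt_gain) / ((1 + cmt_gain) * clock_ratio)

# Scaling figures quoted above: ~90% (Steamroller), ~80% (Piledriver), ~70% worst case.
for gain in (0.9, 0.8, 0.7):
    ratio = implied_ipc_ratio(cmt_gain=gain, smt_gain=0.3, clock_ratio=1.2)
    print(f"CMT gain {gain:.0%}: AMD core IPC ~ {ratio:.2f} x Intel core IPC")
```

Under those assumptions each AMD core works out to roughly 0.57 to 0.64 times the per-thread IPC of an Intel core, which is why a module only keeps up when software loads both of its cores.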

jdwii · Jun 16, 2014

Cazalan :

I was reading that. Something seemed off when I was playing it; I saw the graphics and was like, WTF, some things looked as bad as Godfather 2 (the game).

juanrga · Jun 17, 2014

I know you agree. I was merely trying to add some more info to your excellent post.

What you say about the A57 is again correct. The A57 core has been designed to fit inside a phone; it is a phone-class core. Thus, although it already surpasses Jaguar/Piledriver in IPC, it is clocked relatively low to keep power consumption under control. For this reason the 8 A57 cores in AMD Seattle are rated at 25W, whereas 8 Haswell cores, on a better node, are rated at 140W. Evidently a 25W SoC will not beat a 140W CPU in raw performance.

The SoCs that will beat Intel Xeons and AMD Opterons are different ones. Those SoCs will use high-performance cores (sometimes called server-class cores) and won't be used in phones.

Cavium is designing ~90W SoCs that will surpass Intel's top Xeons in performance. I estimate the performance of their ARM SoC at 960 GFLOP/s. The FX-8350 tops out at 256 GFLOP/s. The fastest Xeon tops out at 518 GFLOP/s.
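Peak GFLOP/s figures like these come from multiplying out the floating-point hardware: units × FP pipes × SIMD lanes × FLOPs per lane per cycle × clock. The sketch below shows that the 256 and 518 figures are consistent with single-precision theoretical peaks. The FX-8350 breakdown follows its 4 modules, each with a shared FPU containing two 128-bit FMA pipes, at 4.0 GHz. Which Xeon is meant is an assumption on my part: a 12-core Ivy Bridge-EP at 2.7 GHz with one 256-bit AVX add and one AVX multiply per cycle reproduces 518. Real workloads reach only a fraction of these peaks.

```python
def peak_gflops(units, fp_pipes, simd_lanes, flops_per_lane, clock_ghz):
    """Theoretical peak = units x pipes x lanes x FLOPs/lane/cycle x GHz."""
    return units * fp_pipes * simd_lanes * flops_per_lane * clock_ghz

# FX-8350: 4 modules, each with one shared FPU holding two 128-bit FMA pipes
# (4 single-precision lanes each; one FMA counts as 2 FLOPs), at 4.0 GHz.
fx8350 = peak_gflops(units=4, fp_pipes=2, simd_lanes=4, flops_per_lane=2, clock_ghz=4.0)

# Assumed Xeon configuration: 12 Ivy Bridge-EP cores at 2.7 GHz, one 256-bit
# AVX add pipe + one 256-bit AVX multiply pipe (8 SP lanes each), no FMA.
xeon_12c = peak_gflops(units=12, fp_pipes=2, simd_lanes=8, flops_per_lane=1, clock_ghz=2.7)

print(f"FX-8350: {fx8350:.0f} GFLOP/s, 12-core Xeon: {xeon_12c:.1f} GFLOP/s")
```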

AMD K12 is also designed to compete with Opterons/Xeons. Broadcom is designing high-performance custom ARM cores (4-wide, with 4-way SMT) for servers, and rumours point to Nvidia designing ARM cores to beat Opteron/Xeon CPUs as well. Nvidia's project is named Boulder.

http://www.techpowerup.com/172683/nvidia-to-take-on-xeon-and-opteron-with-a-boulder.html

We know that the reason Intel is increasing the performance of Xeons, increasing the number of cores (15 cores or more), and lowering prices is that they are preparing for the battle with the ARMy. 🙂

Intel is following Nvidia here. The phone market is too competitive.

Several MIPS licensees have shifted from MIPS to ARM recently.

Audi and Lamborghini are already integrating ARM SoCs in their cars.

Replacing VRAM with system RAM is not a problem by itself. If the dGPU uses DDR3 and you replace that with DDR3, there is no bandwidth problem. What you are reporting is the bandwidth problem when fast memory such as GDDR5 is replaced by slow DDR3 memory.

The PS4 APU uses GDDR5 memory instead of slow DDR3.

This problem with slow DDR memory is only temporary. AMD, Nvidia, Intel, and ARM will replace slow DDR system memory with fast stacked system memory. The first will be Intel, who next year will release a Knights Landing (KL) CPU with 8/16GB of stacked RAM with a bandwidth of 500GB/s. For comparison, both the Nvidia GTX 780 Ti and the Titan Black have GDDR5 memory with only 336GB/s.
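The size of the GDDR5 vs. DDR3 gap follows from peak bandwidth = (bus width in bytes) × (transfer rate). A minimal sketch: the 780 Ti/Titan Black line reproduces the 336GB/s above, the PS4 line uses its 256-bit GDDR5 at 5.5 GT/s, and the desktop line assumes dual-channel DDR3-2133 as a representative fast APU setup. These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gtps):
    """Peak bandwidth in GB/s = (bus width in bytes) x (billions of transfers per second)."""
    return bus_width_bits / 8 * data_rate_gtps

configs = {
    "GTX 780 Ti / Titan Black (384-bit GDDR5 @ 7 GT/s)": (384, 7.0),
    "PS4 APU (256-bit GDDR5 @ 5.5 GT/s)": (256, 5.5),
    # Assumed desktop APU setup: two 64-bit DDR3-2133 channels.
    "Desktop APU (dual-channel DDR3-2133, 128-bit)": (128, 2.133),
}
for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

The desktop APU ends up with roughly a tenth of the discrete card's bandwidth and about a fifth of the PS4's, which is the gap stacked memory is meant to close.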

Next will be AMD with HBM stacked RAM

[Image: AMD Volcanic Islands 2.0 HBM memory slide]

and finally Nvidia with HMC stacked RAM.

That is a Richland model.

juanrga · Jun 17, 2014

The CMT architecture is odd. If it were good, everyone in the industry would be copying it, which is not happening. Steamroller with its double decoder is a step backward from the original CMT design toward the traditional CMP design.

AMD already fired the entire team (both engineers and managers) responsible for the CMT design. AMD has already admitted in public that the Bulldozer family was a "fiasco". AMD avoided CMT for the successful Jaguar core, and AMD is abandoning CMT for the new post-Excavator architecture.

CMT is the reason AMD lost the server market and now has ~3% share. The only reasons CMT had some presence in desktops are that (i) power consumption is irrelevant there and (ii) AMD cut prices at the expense of losing money, going into the red close to bankruptcy. Check the latest numbers: the CPU division is still losing money.

Please, no more "CMT was good enough".

cdrkf · Jun 17, 2014

juanrga :

CMT is one of a number of things that hurt AMD, but it isn't the sole culprit. I also think that the design philosophy (given what was known at the time) was sound, unfortunately (for AMD) Intel were able to achieve much better results using Hyperthreading than anyone really expected, matching a dual core CMT module with a single core + HT, but dodging the downsides of CMT. Had things played out the way AMD expected (as in considerably better MT scaling on CMT compared to HT), things might have been quite different.

The other thing to bear in mind is the delays that fab issues have caused. Bulldozer was never intended to go against Sandy; it was a generation late, which is never going to make it look good. If you compare the FX-8150 against the first-gen i7, the story is somewhat less black and white. Similarly, the FX-8350 compares reasonably well with Sandy, but does look pretty dated when you start talking Haswell kit.

Don't get me wrong, I think we all agree that Intel's approach is now proven superior; however, I can sympathise with why AMD thought Bulldozer was the way to go at the time. On paper it makes a lot of sense.

szatkus · Jun 17, 2014

cdrkf :

juanrga :

Actually, CMT works pretty well. Everybody thinks that Bulldozer's problem is CMT, but it's not. The Bulldozer family has other problems, and removing CMT won't help with those. I won't be surprised if they use modules in the next-gen x86 core.

Just look at single threaded benchmarks where CMT doesn't matter. It's still much slower than Intel.

gamerk316 · Jun 17, 2014

anxiousinfusion :

But you see, low-end GPUs support cheaper forms of VRAM. You can get away with this because it's independent of the host chipset, due to the GPU being external to the motherboard. On a CPU, however, you'd not only have to include two separate memory controllers (itself a waste of die space and cash), you'd need to design two separate chipsets, one for each type of main memory.

gamerk316 · Jun 17, 2014

Well now:

http://www.iflscience.com/technology/new-type-computer-capable-calculating-640tbs-data-one-billionth-second-could

The result is a system six times more powerful than existing servers that requires eighty times less energy. According to HP, The Machine can manage 160 petabytes of data in a mere 250 nanoseconds. And, what’s more, this isn’t just for huge supercomputers- it could be used in smaller devices such as smartphones and laptops. During a keynote speech given at Discover, chief technology officer Martin Fink explained that if the technology was scaled down, smartphones could be fabricated with 100 terabytes of memory.

Gimme benchies.
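Before the benchies arrive, a quick sanity check on the quoted numbers: the headline's "640TB in one billionth of a second" and the article's "160 petabytes in 250 nanoseconds" describe the same rate. A minimal sketch, assuming decimal (SI) units for TB and PB; this only checks the claims against each other, not against any real hardware.

```python
TB = 10**12  # decimal terabyte (assumption: SI units)
PB = 10**15  # decimal petabyte

def bytes_per_ns(total_bytes, nanoseconds):
    """Average data rate in bytes per nanosecond."""
    return total_bytes / nanoseconds

article = bytes_per_ns(160 * PB, 250)  # "160 petabytes in a mere 250 nanoseconds"
headline = bytes_per_ns(640 * TB, 1)   # "640TB of data in one billionth of a second"
print(article == headline, f"{article * 1e9:.1e} bytes/s")
```

Both work out to 640TB per nanosecond, i.e. about 6.4 × 10^23 bytes per second, so the two claims are at least consistent with each other.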

cdrkf · Jun 17, 2014

gamerk316 :

I think that's the point of the APU, however: they use a joint memory controller and a single shared memory pool. There is nothing that technically precludes CPU cores from accessing GDDR5 memory (as proven by the PS4). I understand why this hasn't happened: GDDR5 memory requires very good quality interconnects and short wiring runs, so it's only really suited to mounting directly on the PCB, and it's also relatively expensive compared to DDR3.

cdrkf · Jun 17, 2014

gamerk316 :

Now that does sound very reminiscent of what another company has been trying to do... "In order to handle this flurry of information it uses clusters of specialized cores as opposed to a small number of generalized cores."

gamerk316 · Jun 17, 2014

And not to rekindle the OGL/DX debates:

http://www.phoronix.com/scan.php?page=article&item=amd_apitest_nvidia&num=1

Which goes back to why multiplatform OGL titles target OGL3, not OGL4. AMD doesn't have a functioning OGL4 driver on Linux yet.

Angeloid · Jun 17, 2014

gamerk316 :

Sounds really inefficient to me.

etayorius · Jun 17, 2014

gamerk316 :

Can this be used for gaming?

8350rocks · Jun 17, 2014

@gamerk:

Just FYI: I cannot provide additional information; however, M$ came to AMD about DX12 optimizations and their approach with MANTLE immediately after the MANTLE press conference last year. That was when DX12 officially got under way at M$, which is why it is still ~2 years out while MANTLE is here now. Given that sequence of events, it makes sense that MANTLE came first, considering DX12 is not scheduled to launch until ~2016.

EDIT: On a side note, John Byrne is now officially the GM of compute @ AMD and reports directly to Lisa Su. I congratulated him on Twitter this morning.

Rum · Jun 17, 2014

I am surprised that no one has posted this yet: http://wccftech.com/ibm-hand-chip-business-global-foundries-game-changer/ IF (and that is a big IF) the "authentic" source is correct, then it could bode well for AMD, with future chips and die shrinks actually being released on time.
