AMD CPU speculation... and expert conjecture

Page 604
Status
Not open for further replies.

szatkus

Honorable
Jul 9, 2013


32?
http://i.imgur.com/qI0piu7.png

Find any real-world application that consists only of AVX instructions. Outside of synthetic benchmarks, there isn't one.

Edit:
Here is your "100% better FP" in the best case I could find:
http://openbenchmarking.org/embed.php?i=1308319-SO-GCC49SNAP71&sha=1593a32&p=2
 

jdwii

Splendid
Juan, I ran countless benchmarks and concluded that Piledriver is around 40-50% weaker than Haswell in single-core, per-clock performance. I tested this with turbo disabled on both chips, clocking my FX-8350 at 3.2 GHz, the same as my friend's locked Haswell i5-4460, and used around 8-9 benchmarks that stress the CPU to 100%. So unless you are clearly stating either (A) that my testing is invalid, or (B) that the next-gen design will be slower per clock per core than PD, or something even funnier, that Intel will make a CPU 30-50% faster in one generation (something we haven't seen since the '90s), you are a TROLL who is once again cherry-picking results that mean nothing to anyone.
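A minimal sketch of the per-clock comparison described above, assuming a single-threaded benchmark where a higher score is better and turbo is off, so the nominal clock is the real one. The function names and the scores in the usage line are hypothetical, not results from these tests:

```python
def per_clock_score(score: float, ghz: float) -> float:
    """Normalize a benchmark score to one GHz of core clock."""
    return score / ghz


def per_clock_deficit(score_a: float, ghz_a: float,
                      score_b: float, ghz_b: float) -> float:
    """Fraction by which chip A trails chip B per clock (0.45 means 45% weaker).

    Assumes higher scores are better and that turbo is disabled,
    so the nominal clock is the clock actually used.
    """
    return 1.0 - per_clock_score(score_a, ghz_a) / per_clock_score(score_b, ghz_b)


if __name__ == "__main__":
    # Hypothetical scores, both chips locked at 3.2 GHz.
    print(f"{per_clock_deficit(55.0, 3.2, 100.0, 3.2):.0%}")
```

Dividing by clock first means the same function still works if the two chips are not locked to identical frequencies.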
 


Is there any reason why you completely ignored my post about being civil toward other users?

 

szatkus

Honorable
Jul 9, 2013


Just read the last 100 pages. I think that's a good reason...
 

8350rocks

Distinguished
@Reynod:

I was trying to be civil. It may come off as harsh, but considering the BS we have been putting up with for 50+ pages, it is time to call a spade a spade.

Some users here have been making wild claims that fly in the face of engineering principles and the laws of physics. They have not produced any verifiable proof of those claims; they only show marketing slides, pointing to them continually as if they were the letter of the law. I actually do know people at AMD, and what I am hearing contradicts much of what he says. I would take their word over his any day. Unfortunately I cannot say more about what is coming; however, I can say that they are not pulling any punches. HEDT is very important to them for the halo effect.
 

blackkstar

Honorable
Sep 30, 2012


He's only discussing theoretical peaks. It reminds me a bit of VLIW. AMD, Nvidia, and Intel all love to throw around peak numbers and theoretical numbers derived from mathematical formulas.

But you are right, that will not translate into a raw increase in real-world performance, much like adding entire units to Haswell greatly increased theoretical output yet yielded only a small IPC increase.

The diminishing returns from throwing transistors at single-thread performance are out of hand. Once you reach Intel's level of single-thread performance, it isn't worth it on x86 anymore.

The only way forward for x86 is MOAR COARS, and software is going to struggle to scale beyond what we have now for a while. Whoever comes up with a good way to use those cores, in a way we don't use them now, is going to have a field day.
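The diminishing returns from adding cores are usually formalized with Amdahl's law; this is a standard textbook formula, not something from the post. A minimal sketch, assuming a fixed fraction `parallel_fraction` of the work scales perfectly across cores and the rest stays serial (`amdahl_speedup` is a hypothetical helper name):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # With 90% parallel code, each doubling of cores buys less.
    for n in (1, 2, 4, 8, 16, 32):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% parallel code the speedup can never exceed 10x no matter how many cores are added, which is why software, not core count, becomes the limit.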
 
HP Stream 14 inch notebook coming soon for $199
http://liliputing.com/2014/08/hp-stream-14-inch-notebook-coming-soon-199.html
Mullins is here. According to the article, this is only the second Mullins-based laptop. The price looks good for an entry-level Windows laptop, considering a Windows 8 license costs about $90-100. This means the usual cost cutting will ensue...

ASUS ROG's first AMD FM2+ motherboard ships this month
http://hexus.net/tech/news/mainboard/73413-asus-rogs-first-amd-fm2-motherboard-ships-month/
(in the U.K.)
 

juanrga

Distinguished
BANNED
Mar 19, 2013


32 is for double precision; I was giving single-precision numbers. But it is irrelevant which precision you choose: to catch Intel, AMD would have to increase the width of its vector units by 8x.

Single Precision: Piledriver/Steamroller (8 FLOP/core); Skylake (64 FLOP/core); Intel-AMD gap (8x)
Double Precision: Piledriver/Steamroller (4 FLOP/core); Skylake (32 FLOP/core); Intel-AMD gap (8x)
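A minimal sketch of where the per-core numbers above come from, assuming peak FLOP/core = FMA units per core × SIMD lanes × 2 (one FMA counts as a multiply plus an add). The unit counts and widths are the ones asserted in this thread (Piledriver/Steamroller: two 128-bit FMAC units shared by the two cores of a module; Haswell: two 256-bit units per core; Skylake: two 512-bit units per core, a forecast at the time); `flop_per_core` is a hypothetical helper:

```python
def flop_per_core(fma_units: float, vector_bits: int, element_bits: int) -> float:
    """Peak FLOP per clock per core, counting each FMA as 2 FLOPs."""
    lanes = vector_bits // element_bits  # elements processed per instruction
    return fma_units * lanes * 2


# (FMA units per core, vector width in bits), as claimed in the thread.
CORES = {
    "Piledriver/Steamroller": (1, 128),  # 2 x 128-bit FMAC shared by 2 cores
    "Haswell": (2, 256),
    "Skylake (AVX-512, forecast)": (2, 512),
}

if __name__ == "__main__":
    for precision, bits in (("SP", 32), ("DP", 64)):
        amd = flop_per_core(*CORES["Piledriver/Steamroller"], bits)
        skl = flop_per_core(*CORES["Skylake (AVX-512, forecast)"], bits)
        print(f"{precision}: AMD {amd:g}, Skylake {skl:g}, gap {skl / amd:g}x")
```

The gap is 8x at either precision because halving the element width doubles the lane count on both sides.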

My "100%" wasn't for AVX code, nor for FP code. I explained in a previous post how I got the 100% and what it refers to. Other people are getting a similar number.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


What is really noticeable is that even AMD admits the "moar cores" era has finished:

[Image: AMD slide on the eras of processing (hsa_history_large.jpg)]


We will see 8-core APUs and some 12/16-core CPUs for servers and similar, but the moar cores era is reaching its performance plateau, and future designs will pivot around HSA.
 

jdwii

Splendid
At least Juan admits that the measure of performance he uses is completely useless. If FLOPS were all that mattered, a GPU from 2006 would be superior in everything to an i7; in reality, that isn't true.
 

griptwister

Distinguished
Oct 7, 2012
I believe more cores on x86 will always be only for the enthusiast and workstation crowd. But for the general public, I think 8 cores will soon be the minimum. As single-thread performance continues to improve, more cores will be added until the CPU can be completely replaced. It's an ongoing trend that we see... the doubling of transistors, the doubling of memory, the doubling of cores. It's a cycle in the PC world.

The fallacy in Juan's argument is that he claims dGPUs and "More Cores" are going extinct, but reality is proving Juan's logic wrong. I'm not saying dGPUs and dCPUs will be around forever, but I don't think APUs will ever make it to the high end. Neither will ARM. ARM isn't posting better numbers than x86 in the high-end market...

After you hit the wall on single-thread performance, the only way forward is more cores... If x86 is hitting a wall, ARM will hit the same wall eventually... and if its performance isn't as good as x86's, what's the point of having an ARM chip in a desktop PC? It makes no sense.

I'm not the most educated person in this thread, but it's common sense to see that Juan is wrong. I'm not making a personal attack. It's just really annoying to see someone so hell-bent on saying they're right. He's the hipster of the PC world. Haha, he believed in ARM before it was cool. There's no doubt in my mind that HSA and ARM will take off... just not to the extent that Juan is claiming, especially with all these new technologies on the rise.

This is my own opinion.
 

jdwii

Splendid


I agree 100% with you except for those two bolded statements. The doubling of cores can't continue forever because of fabrication issues. Also, an 8-core minimum might be true for smartphones and tablets, but I doubt it will be for laptops or desktops for a long time, if ever, unless the cores are tiny. AMD might be going for high core counts, but Intel doesn't need to. I'm hoping for a day when Intel makes quad cores the new i3, the i5 becomes the new quad core with HT, the i7 is a 6-core, and the i7 Extreme is an 8-core.

I think the main issue with Juan is that he always looks at theoretical performance, even more so with FLOPS. Again, if software isn't programmed to take advantage of it, who cares? We all know that type of computing will be done more efficiently on the GPU, with the CPU doing mainly integer work as it does today anyway.
 


Yes and no. Taking a serial algorithm and trying to make it work in parallel will hit a wall rather quickly. Instead, you redo the entire algorithm so that it assumes parallel work from the beginning. Put another way, there is no such thing as a "serial task", only a serial implementation of a task. Any task that is done more than once can be done in parallel, but only if the structure of the implementation assumes the work is divided up and acts accordingly. I.e., you don't have two worker threads working on the same data pile; instead, you divide the data pile into two separate segments and have the worker threads work independently of each other. Also, there is rarely a situation where only a single task needs to be performed. Often there are many tasks, and the original programmers set them up serially because it's human nature to build that way. Everybody else just kept it going because it was easier and cheaper than starting from scratch and redoing the methods so they don't work in lock step.

This is a foregone conclusion: it will happen, and it is ultimately only a matter of time and money. With each successive year we see more and more software being written to work in SMT environments. The only two areas left are the main logic loop in games and the older rendering pipeline in API implementations, and both of those are evolving as we speak.
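A minimal sketch of the "divide the data pile into separate segments" approach, assuming the per-item work is independent. `split`, `process_chunk`, and `parallel_sum_of_squares` are hypothetical names, and sum-of-squares stands in for real work. In CPython the GIL keeps threads from speeding up pure-Python number crunching, so `ProcessPoolExecutor` would be the drop-in swap for actual speedups; the structure is the point here:

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: list, parts: int) -> list:
    """Divide `data` into `parts` contiguous, non-overlapping segments."""
    size, extra = divmod(len(data), parts)
    segments, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        segments.append(data[start:end])
        start = end
    return segments


def process_chunk(chunk: list) -> int:
    """Hypothetical per-segment work; touches only its own segment, so no locks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: list, workers: int = 2) -> int:
    segments = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, segments))
    return sum(partials)  # the only point where the workers' results meet


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))
```

Because each worker owns its segment, there is no shared state to lock; the only synchronization is the final combine step.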
 

genz

Distinguished


My opinion on that (which may be wrong) is that software must follow hardware. Whilst software is not yet able to take full advantage of multiprocessing, that doesn't stop the hardware world from planning to move on to the next thing. HSA isn't actually here yet, and it is definitely not at the stage where it is popular enough to become the target platform of every developer (remember how long it took them to start going 64-bit en masse), so it's expected that software is still targeting the previous platform, even if it's evident we haven't yet taken full advantage of the multiprocessing era at the consumer level.
 

Cazalan

Distinguished
Sep 4, 2011
Funny how the first person I saw use the code name Zen also said Zen was cancelled. So take anything you read here or on other LinkedIn profiles with a big dose of salt. Note also that professionals under NDA generally want to keep their integrity and wouldn't spill those details for the world to see just to gain some forum cred. And even people who have signed NDAs are not necessarily given all the details. Information is only given on a need-to-know basis.
 
Just when I thought this would be a slow AMD-news month...

FX series price drops across the board; we now have some idea of which CPUs and by how much:
http://hexus.net/tech/news/cpu/73365-amd-set-slash-fx-cpu-pricing-september-1/

A quick look at AMD's Radeon R7 SSD
Radeon solid-state drives are a thing now
http://techreport.com/review/26936/a-quick-look-at-amd-radeon-r7-ssd

JPR Reports AMD Jumps 11% in GPU Shipments in Q2, Intel up 4%, NVIDIA Slips
http://www.techpowerup.com/204282/jpr-reports-amd-jumps-11-in-gpu-shipments-in-q2-intel-up-4-nvidia-slips.html

edit:
Looks like the Athlon 860K (SR-B core) will show up soon; ShopBLT has it on its preorder list for ~$95.
http://www.cpu-world.com/news_2014/2014081801_AMD_Athlon_X4_860K_processor_available_for_pre-order.html

PS4 and Xbox One Pack 10 Times More Powerful and Balanced Hardware Compared to the Last-Gen Consoles, Says Geomerics
http://wccftech.com/ps4-and-xbox-one-gpu-cpu-8gb-balanced/

@zen: what if, instead of getting cancelled, Zen is getting a do-over? Codenames can switch to new parts (e.g. Richland), so the "cancellation" would really turn into a delayed launch. It might favor Carrizo desktop APUs.
I wonder if AMD will make an APU with 8 Zen cores (~3GHz+) + 512-1024 GCN 2.0 cores with 128MB-1GB of stacked HBM. Mmm...

edit2:
AMD A10-7800 APU Review: Kaveri Hits the Efficiency Sweet Spot
http://www.tomshardware.com/reviews/amd-a10-7800-kaveri-apu-efficiency,3899.html

edit3:
CPUFreq Scaling Tests With AMD's Kaveri On Linux 3.16
http://www.phoronix.com/scan.php?page=article&item=amd_kaveri_cpufreq&num=1
Trying The Configurable 45 Watt TDP With AMD's A10-7800 / A6-7400K
http://www.phoronix.com/scan.php?page=article&item=amd_a_45watt&num=1
I love power consumption measurements. :)
 

juanrga

Distinguished
BANNED
Mar 19, 2013


I agree with your comments about software. If software didn't follow hardware, we would still be working with 8-bit machines and 16KB programs, for instance.

However, the true advantage of approaches such as HSA is that they break the limits of earlier approaches. The slide given just above about the eras of processing shows this.

http://www.tomshardware.co.uk/forum/352312-28-steamroller-speculation-expert-conjecture/page-307#13986163

The GHz era ended due to hardware limits, and now the "moar cores" era is ending due to hardware limits. Only new approaches such as HSA can restart the performance race.



When the PS4 was presented, AMD described something very close to HSA. You can find Robinson's words here:

http://www.techradar.com/news/gaming/consoles/amd-on-the-ps4-we-gave-it-the-hardware-nvidia-couldn-t-1141607

"For us, really by looking at that APU that we designed, you can't pull out individual components off it and hold it up and say, 'Yeah, this compares to X or Y.'

"It's that integration of the two, and especially with the amount of shared memory [8GB of GDDR5, 176GB/s raw memory bandwidth] that Sony has chosen to put on that machine, then you're going to be able to do so much more moving and sharing that data that you can address by both sides.

"It's more than just a CPU doing all these amazing calculations and a GPU doing calculations. We are now going to be able to move certain tasks between the two."

Devs, he said, will be able to push the console's capabilities beyond a traditional x86 PC architecture, and multithreading - being able to take advantage of all eight cores - is going "to become a huge deal for a lot of the big blockbuster games."

This is a description of HSA-like advantages, especially the part where he says that the PS4 APU is more than a CPU and a GPU. I explained before in this thread that under HSA, the iCPU and the iGPU can work cooperatively on the same data structures at once, outperforming an outdated CPU+GPU architecture.

Some time after the presentation of the PS4, an AMD representative said that the PS4 was better than the Xbox One thanks to some HSA features like hUMA, but this was officially denied by another representative. Evidently the PS4 is better, and I think AMD retracted the comments against the Xbox One because Microsoft is an important customer.
 

szatkus

Honorable
Jul 9, 2013


By Intel's logic, PD/SR are 8/16 FLOP/core/clock (because of FMA).
OK, OK, now tell me: where do you see a problem? Implementing AVX-512 isn't difficult. The new architecture will probably include it.
In general, multiplying resources isn't hard. AMD's main problem right now is feeding them.
 


Fine... call a spade a spade and come off harsh again, or be cheeky like szatkus, and I'll two-for-one the pair of you for 14 days, right off the cuff. Be civil or enjoy the break.

We expect users to focus on the topic, not make personal attacks.


 

juanrga

Distinguished
BANNED
Mar 19, 2013


No. PD/SR include two 128-bit FMAC units per module, which amounts to 128 bits per core. This implies a maximum of 4 SP ops per core: using SSE this gives 4 FLOP/core, and using FMA it gives 8 FLOP/core. Thus, for Kaveri, the maximum throughput when using FMA instructions is

4 cores x 8 FLOP/core x 3.7 GHz = 118.4 GFLOP/s

[Image: maximum GFLOPS comparison (maximum-gflops.jpg)]


Haswell has two 256-bit units per core, and FMA support was added alongside AVX2. This implies 16 SP ops per core, and that Haswell peaks at 32 FLOP/core when using FMA with AVX2 instructions:

[Image: AVX2 slide (AVX2-640x152.png)]


Thus, for the i7-4770K CPU:

4 cores x 32 FLOP/core x 3.5 GHz = 448 GFLOP/s

You can find this 448 figure mentioned in many places, e.g.:

http://www.hardware.fr/news/13206/kaveri-radeon-7750-integree.html

For double precision, divide all those single-precision numbers by two.
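The two peak figures above follow from cores × FLOP/core × clock. A minimal sketch reproducing them, assuming every core issues FMA on every cycle at the clock used in the post (3.7 GHz for Kaveri, 3.5 GHz for the i7-4770K); `peak_gflops` is a hypothetical helper:

```python
def peak_gflops(cores: int, flop_per_core: float, ghz: float) -> float:
    """Theoretical peak GFLOP/s: every core retires flop_per_core FLOPs each cycle."""
    return cores * flop_per_core * ghz


if __name__ == "__main__":
    chips = {
        "Kaveri (4 cores, 8 SP FLOP/core, 3.7 GHz)": (4, 8, 3.7),
        "i7-4770K (4 cores, 32 SP FLOP/core, 3.5 GHz)": (4, 32, 3.5),
    }
    for name, (cores, flop, ghz) in chips.items():
        sp = peak_gflops(cores, flop, ghz)
        # Double precision: half the lanes, so half the FLOPs.
        print(f"{name}: SP {sp:.1f} GFLOP/s, DP {sp / 2:.1f} GFLOP/s")
```

These are peak numbers only; as others in the thread point out, real workloads rarely come close to them.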

Broadwell keeps the 32 FLOP/core, but Skylake will increase it to 64 FLOP/core, achieving SIMD-width parity with the Xeon Phi Knights Landing, which also does 64 FLOP/core.

The FP unit is the weakest part of AMD's architecture (as shown in my BSN article), and there is no way AMD can increase total throughput by a factor of 8x to achieve parity with Skylake, unless one believes in miracles!

Moreover, there are several AVX-512 subsets. AMD would have to implement all of them, plus the companion ISAs, to achieve parity with Skylake Xeons. This is not happening.

I expect AMD to increase the floating-point performance of the K12 core to 16 FLOP/core, with two 128-bit units. This is a 2x improvement over the Piledriver/Steamroller FPU (i.e. 2x more transistors).
 

sapperastro

Honorable
Jan 28, 2014
Hmm, if the consoles (or at least one of them) have HSA, then this is a good thing for AMD in the long run. I am still waiting for mainstream programmers and game coders to begin taking advantage of HSA, though. I wonder how long it will take...

Floating point: can anyone here outline the major uses of the floating-point unit? I was under the impression that, compared to the integer units, the floating-point unit is a little-used component in many applications.
 

szatkus

Honorable
Jul 9, 2013


It isn't a miracle. Implementing AVX-512 is quite simple. Excavator has AVX2, so they have a good start. The transistor count and area for this also don't look too bad at 14FF.

I don't see where the problem is. It isn't the first time AMD has implemented new instructions.

I would be more worried about TSX. That thing is quite tricky.
 