AMD CPU speculation... and expert conjecture

Page 683
Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No. There is code that is not parallelizable. Single thread performance will remain a fundamental metric even in the many-core epoch.

20-core Zen FX chips for the desktop smell like a Bulldozer 2.0 fiasco.
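The serial-code argument above is Amdahl's law. Below is a minimal sketch comparing two hypothetical chips: 20 "weak" cores, each at an assumed 0.6x the single-thread speed of a baseline core, against 4 "strong" cores at 1.0x. Both the 0.6x figure and the serial fractions are made up for illustration and do not describe any real Zen or Skylake part.

```python
def amdahl_speedup(serial_fraction: float, cores: int, per_core_speed: float) -> float:
    """Speedup over ONE baseline core (speed 1.0), per Amdahl's law.

    serial_fraction: share of the work that cannot be parallelized (0..1).
    cores:           number of cores the parallel share is split across.
    per_core_speed:  single-thread speed relative to the baseline core (hypothetical).
    """
    parallel_fraction = 1.0 - serial_fraction
    # Serial part runs on one core; parallel part is divided evenly across all cores.
    relative_time = (serial_fraction + parallel_fraction / cores) / per_core_speed
    return 1.0 / relative_time


# Hypothetical chips: 20 "weak" cores at 0.6x vs 4 "strong" cores at 1.0x.
for serial in (0.05, 0.20, 0.50):
    wide = amdahl_speedup(serial, cores=20, per_core_speed=0.6)
    narrow = amdahl_speedup(serial, cores=4, per_core_speed=1.0)
    print(f"serial={serial:.0%}: 20 weak cores {wide:.2f}x, 4 strong cores {narrow:.2f}x")
```

With these made-up numbers the wide chip wins clearly when only 5% of the work is serial, ties at 20%, and loses at 50%, which is why the share of serial code in real workloads decides the argument.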



No. Broadwell improves single thread performance over Haswell and Skylake improves single thread performance over Broadwell. Skylake also doubles the SIMD throughput per core.



No. What happens is that Haswell single-thread IPC is about 60--70% higher than Piledriver's. And people in the industry and in journalism know that obtaining a 10% gain on top of an already first-class design such as Haswell is a much more difficult task than obtaining 10% on top of something like Piledriver.
 

alcatrazsniper

Reputable
Dec 31, 2014
58
0
4,640
Guys, AMD can just release another processor with more cores. The new FX 8350 has about an 88 watt TDP. So that means they can just bump up their high-end processors to maybe a decacore or dodecacore (lolwut?) processor with 20 or 24 threads, you know. Then they can name it something fancy like Crane or Tractor. That'll show Intel with their little 4-core processors...
 

jdwii

Splendid
First, juan is correct: the amount of resources it would take to pull off a design competitive with Intel's is too great, given the recent layoffs and the amount of money AMD has to spend on research. I wonder if it would be too much for AMD to take on Nvidia with the Tegra 1 or Qualcomm with the Snapdragon, since ARM is arguably a simpler design and the other two have more experience working with ARM. Even if we throw in Jim Keller working at AMD, we still have the $$$ issue.

For others saying more execution units wouldn't improve performance over Piledriver, I would like them to look at Intel's design, since that seems like a flawed argument. Adding more resources is only the beginning; other advancements are needed as well, for example more IPC per core (isn't it 2 per core instead of 3 with Phenom, and doesn't Intel have 4?), an improved scheduler, and cache speeds, which severely need improvement.

If these things are not addressed, we will see AMD pushed more and more towards the SoC market only, with everything else a niche market. With their current GPU architecture (which has already been discussed over and over again) being inefficient compared to Maxwell on the same node in terms of FPS/watt, things don't look good in the dGPU market in the long term. In the short term, however, we might see AMD launch the 390X with HBM on 20nm, using the current GCN design found in the 285.

Also, many like to argue GCN is a more efficient design in terms of compute power, which would hint they are trying to grab customers in the supercomputer or workstation markets, where Nvidia easily owns far more share and simply has a stronger name. Plus, companies hate to change unless they have a HUGE reason to.

K12, Zen and future GCN products: that's AMD's future over the next 1+ years. AMD claims they won more designs in the SoC market, one of which is probably Nintendo (perhaps a new handheld or console). This could be another great win and also hints at AMD's future in the SoC market.

AMD needs to drop their CPU line and focus on APU-only products for the mainstream, or in other words SoC only. As juan pointed out, the CPU will take up very little space in future APUs (probably less than 15%), which means the future of dGPUs seems dim.

We still have issues today, however, and it's still hard for me to understand how we will handle HBM if it's stacked on top of CPU/GPU cores running at 80C. I'm sure they will figure it out. I'm even more sure AMD wants this, given their obsession with APU products. Not only that, but it changes the scope of things, since Intel is a joke in terms of iGPUs.

AMD is competing in too many markets; it's too hard for them to focus heavily on one thing, and right now just about everyone is their competition. Designs are getting more complicated to come up with, on top of it being harder to just move a design to a smaller node, which used to give them almost free performance with very little effort. This means each transistor in every architecture will have to be that much more efficient, which will require better engineers, which in turn requires more $$$.

AMD has only one advantage over their competition: they're the only ones with an x86 license and great graphics. I'm just not sure how much of an advantage that's been for them so far with Llano, Trinity/Richland and Kaveri.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780

ALUs, not IPC.



Oh no, not again...
 


Yeah, I've still got a 1055T in my workstation at work, lovely chip. Phenom II was a seriously underrated chip IMO (I guess it was tainted a bit by the problems with Phenom I).

It will be interesting to see what actually happens with Zen and K12. AMD aren't completely daft; however, they also have to work with what they have. It's *got* to have a fairly decent single-thread performance boost over the Bulldozer-derived cores, even if it's still behind Intel.

That is also another point though: right now single-thread performance is *more important* than I think it should be these days. I agree there are some things that can't be made parallel; however, I think more software can be made parallel than many realize. Graphics APIs are a good example: DX11 and OGL 4 rely on a single thread for dispatches, which means that although a certain amount of work can be made parallel, you are still limited to about 4 useful threads, and the single-thread throughput of the main dispatch thread is of critical importance (hence Intel's insurmountable lead for driving games).

In Mantle, OGL Next and DX12, this limitation is removed. Mantle might not stick around, but it will have done its job, as the next-gen APIs will negate Intel's inherent IPC advantage in gaming. Then it becomes more about the overall resources available on the chip, and a 'more cores' design from AMD actually has a chance (FX 8350 vs Core i5 / i7 in Mantle looks a whole lot nicer for AMD than the same games in DX11).

At the end of the day, I think if AMD can put out a CPU with a good boost in single-thread performance plus more threads at a sensible TDP, one that is competitive with Intel across a wider selection of workloads, then they'll be fine.
 

jdwii

Splendid


Yes, and reality won't stop even if people want it to.
 

jdwii

Splendid


It's been quite a few years since quad cores hit the market, and we are actually seeing programs use 4+ cores now, but that doesn't mean anything when Intel gives you 4 threads for $190 and AMD gives you 4 threads for $90 but offers 1/3 the performance. PC gamers alone will not keep AMD alive forever; they need to convince others to use their products, which means either extremely low prices (little profit for AMD, which means little $$$ for research) or a high-end product that sells at a low price.

Also, the 1100T was a fantastic CPU which cost $280, I think, when it came out, $40 less than an i7 920. My 1100T was bottlenecking my 770, so I had to upgrade, but I'm not sure how a 920 would have done in later games. I can say the FX-8350 wasn't as big of an upgrade as I wanted in normal applications, but it gets the job done; tbh the biggest upgrade was in gaming, something I didn't expect.
 


Well, I think we need to keep in mind that AMD's "wider than Intel" approach isn't that bad in many scenarios where you really need the performance. Gaming is a big area they've lost ground in. If you think of other software that is predominantly single-threaded, it's also the same software where CPU performance has been sufficient not to be annoying for years (e.g. office tasks). That's the reason why many people are happy with tablets, for example.

Piledriver FX does quite well in things like rendering and video encoding. The usual 'embarrassing' benchmarks for AMD have been games (compared to the 'equivalent' Intel part), single-threaded MP3 encoding (pretty sure this one *can* be multi-threaded but currently isn't), single-thread ZIP (again, more modern programmes use multiple threads) and tests specifically designed to run on 1 core only.

I mean, from a day-to-day perspective, what software do you run that is *actually constrained by one thread* to the point you'd notice it? I find that HDD performance is the bottleneck more than anything. The reason I'm still using that Phenom II at work is that nothing really taxes it that much, and I do 3D CAD work for a living (notoriously limited to single threads due to the nature of the software; a single Phenom II core at 2.6 GHz is fine in my experience, though). I'm never really waiting for the system apart from saves, which were sorted by the addition of a nice SSD.

I agree a flexible architecture needs a mix of both. I actually think one of the reasons FX was bad, though, was that the multi-threaded performance wasn't anywhere near as high as it should have been. They would have been forgiven much more if their 8-thread processor had delivered what they said it would (i.e. a dual-core module offering 80% of the performance of 2 full cores, which would make the '8' core part faster than a similarly clocked 'full' hex core). The problem was that on release the lower-clocked, high-IPC Phenom II X6 was faster than the 8150 despite the latter's large clock speed bump. The 8150 should have outrun it clock for clock for AMD's argument to hold water.
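The module argument can be put into numbers. A minimal sketch for an idealized, perfectly parallel workload, assuming throughput = core-equivalents x IPC x clock and using the chips' base clocks (FX-8150 at 3.6 GHz, Phenom II X6 1100T at 3.3 GHz, turbo ignored). The helper names are hypothetical. It computes the per-core IPC, relative to a K10 core, at which the 8150 merely ties the X6.

```python
def core_equivalents(cores: int, module_scaling: float = 1.0) -> float:
    """'Full core' equivalents. In a CMT module, two cores are assumed to deliver
    module_scaling x the throughput of two full cores (0.8 -> 1.6 cores per module)."""
    return cores * module_scaling


def mt_throughput(cores: int, module_scaling: float, ipc: float, ghz: float) -> float:
    # Idealized, perfectly parallel workload: throughput = core-equivalents x IPC x clock.
    return core_equivalents(cores, module_scaling) * ipc * ghz


# Assumed base clocks (turbo ignored): FX-8150 at 3.6 GHz, Phenom II X6 1100T at 3.3 GHz.
phenom_x6 = mt_throughput(6, 1.0, ipc=1.0, ghz=3.3)
fx_8150_at_k10_ipc = mt_throughput(8, 0.8, ipc=1.0, ghz=3.6)

# Per-core IPC (relative to K10) at which the two chips tie.
break_even_ipc = phenom_x6 / fx_8150_at_k10_ipc
print(f"FX-8150 core-equivalents: {core_equivalents(8, 0.8):.1f}")
print(f"Break-even Bulldozer/K10 IPC ratio: {break_even_ipc:.3f}")
```

Under these assumptions the 8150 is worth 6.4 full cores, so at equal clocks and equal per-core IPC it would beat a hex core, as the post says it should. With its clock advantage, it only loses to the X6 if a Bulldozer core has less than roughly 86% of a K10 core's IPC.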

At the end of the day, I still maintain the current FX 6- and 8-core parts aren't a bad buy in light of the price (you get a lot of processor for not much); however, they were very underwhelming.
 

truegenius

Distinguished
BANNED

The K10 core was good; it's the uncore part of Phenom that was responsible for its poor performance.

My 1090T @ 3.6 GHz (RAM @ 1333 CL9) shows 4% and 6% performance increases in 7-Zip and WinRAR respectively with a 3 GHz NB vs the stock NB.
And if we combine this with a RAM increase too (1333 MHz at stock NB vs 1800 MHz at 2.925 GHz NB), we get very significant 12% and 24% increases in 7-Zip and WinRAR respectively.
If I only increase the RAM (800 MHz vs 1333 MHz), I get only 8% and 14% increases in 7-Zip and WinRAR respectively.

If we treat the above increases as resulting only from the RAM increase (assuming performance scales linearly), then with the stock NB we get only 12% and 21% in 7-Zip and WinRAR respectively for a 100% increase in RAM speed (tested 800 MHz and 1333 MHz RAM at stock NB).
But if we raise the NB too, we get 35% and 69% in 7-Zip and WinRAR respectively for every 100% increase in RAM speed (tested 1333 MHz at stock NB vs 1800 MHz at 2.925 GHz NB).
(Also note that performance gains normally diminish as RAM speed increases, but here we got increasing returns when we raised RAM and NB together.)

That means if AMD had added 1866 MHz RAM support (up from 1333) and raised the NB to 2.7 to 3 GHz (from the stock 2 GHz) for the 1090T (K10), we could see 14% and 28% increases in 7-Zip and WinRAR respectively.
And this increase is achieved without increasing any clock, core count or cache.
Combine it with those increases too and you could leave even the FX-9590 miles behind in the dust.
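The per-100% figures and the 1866 MHz projection above follow from the post's own linear-scaling assumption. A minimal sketch reproducing them, assuming (as the post does) that the whole RAM + NB gain is attributed to the RAM clock step; the function names are hypothetical.

```python
def gain_per_100pct(gain_pct: float, old_mhz: float, new_mhz: float) -> float:
    """Linearly rescale a measured % gain to what a 100% RAM-clock increase would give."""
    clock_increase = (new_mhz - old_mhz) / old_mhz  # e.g. 800 -> 1333 is ~0.666
    return gain_pct / clock_increase


def projected_gain(per_100pct: float, old_mhz: float, new_mhz: float) -> float:
    """Apply a per-100% figure to a different RAM-clock step (same linear assumption)."""
    return per_100pct * (new_mhz - old_mhz) / old_mhz


# Measured gains from the post: (7-Zip %, WinRAR %)
ram_only = (8, 14)      # 800 -> 1333 MHz, stock NB
ram_and_nb = (12, 24)   # 1333 MHz stock NB -> 1800 MHz with 2.925 GHz NB

for label, gains, old, new in (("RAM only", ram_only, 800, 1333),
                               ("RAM + NB", ram_and_nb, 1333, 1800)):
    per100 = [gain_per_100pct(g, old, new) for g in gains]
    print(label, [f"{p:.1f}%" for p in per100])

# Projection for 1866 MHz support, reusing the RAM + NB scaling
per100 = [gain_per_100pct(g, 1333, 1800) for g in ram_and_nb]
print("1333 -> 1866:", [f"{projected_gain(p, 1333, 1866):.1f}%" for p in per100])
```

This gives about 12/21% per 100% for RAM only, about 34/69% with the NB raised, and about 14/27% for the 1866 MHz projection, matching the post's figures to within rounding. Real memory scaling is rarely linear, so the projection is only a rough extrapolation.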

I have done more tests, and all the results show significant gains, and these would improve gaming even more.
For example, in 3DMark Vantage CPU physics I got a 1.6% increase at 1333 MHz vs 800 MHz (stock NB), which means a 2.4% increase per 100% RAM increase, but a much better 4% increase at 1800 MHz (2.925 GHz NB) vs 1333 MHz (stock NB), which means an 11.5% increase per 100% RAM increase.

So overall we could have gotten performance increases of 5% to 30% just from faster RAM support and a faster NB, and if we combined this with other core tweaks like more and better cache and more cores, we could have gotten even more.
(Meanwhile BD/PD/SR rely on 32nm, more cores, higher core clocks, a better IMC, faster RAM support and more cache just to match K10.)

(All tests were done with all 6 cores of the 1090T @ 3.6 GHz and 4 GB of single-channel DDR3 RAM, using the internal benchmarks; the dictionary size for 7-Zip was 64 MB.)

What they could do with K10 is improve the uncore and cache, add faster RAM support, a die shrink, more cores and quad-channel support (for bragging rights), and all of this would result in a much better CPU than their modular joke, one that performs much better in both single-threaded and multi-threaded software.
Even Intel reuses its old architecture on a new die shrink (e.g. Ivy Bridge vs Sandy Bridge was more or less just a die shrink, while Haswell was an architectural improvement). So why couldn't AMD? They just threw K10 away and released a joke.
 

8350rocks

Distinguished


Mantle is now an open standard. Did you miss that?

Also, this is not an Intel propaganda thread, so to clear a few things up:

1.) AVX2 is 256-bit, not 512-bit. Also, very few programs run AVX2 code at all, and very few would even benefit from an increase in speed. So your theoretical maximum Intel performance in FLOPs will come out FAR SHORT of that projection. Any time you start talking about theoretical figures, you have to understand that even Intel cannot get there. In fact, it will likely be at minimum 5 years before they achieve half of that theoretical maximum performance, and even then only in specific scenarios where you can run AVX3 (or whatever it ends up being called) code and benefit enough to bother. To this day, the returns from anything past SSE4.2 diminish significantly. Intel loves to put up big theoretical numbers, only to have them end up not even close.

2.) As for your predictions about future AMD hardware, I think I have already shown how little you actually know by pointing out the obvious fallacies in your assessments, as well as in your comments. Go shout about marketing papers at S|A...or did they already run you off like anyone intelligent would have?
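The theoretical FLOP figures disputed in point 1 come from a simple peak formula: cores x clock x SIMD lanes x 2 (an FMA counts as a multiply plus an add) x FMA units. A minimal sketch, assuming two FMA units per core as on Haswell and a made-up quad-core at 3.5 GHz; the function name is hypothetical.

```python
def peak_gflops(cores: int, ghz: float, vector_bits: int, fma_units: int,
                element_bits: int = 32) -> float:
    """Theoretical peak GFLOPS: every FMA unit issues a full-width FMA every cycle."""
    lanes = vector_bits // element_bits           # 256-bit / 32-bit floats = 8 lanes
    flops_per_cycle = lanes * 2 * fma_units       # an FMA counts as 2 FLOPs (mul + add)
    return cores * ghz * flops_per_cycle


# Hypothetical quad-core at 3.5 GHz with two FMA units per core (Haswell-style)
avx2 = peak_gflops(cores=4, ghz=3.5, vector_bits=256, fma_units=2)
wider = peak_gflops(cores=4, ghz=3.5, vector_bits=512, fma_units=2)
print(f"256-bit AVX2: {avx2:.0f} GFLOPS single precision")
print(f"512-bit:      {wider:.0f} GFLOPS single precision")
```

Doubling the vector width doubles the paper number, but the formula assumes every FMA unit does useful full-width work every cycle, which real code rarely sustains; that gap is the point being argued.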
 

juanrga

Distinguished
BANNED


AMD's dGPU division is in bad shape (as even AMD's vice president admitted). The CPU division hasn't generated a profit in years. They can continue spending money on both at a loss, or they can focus on those specific SoC markets where they can survive because the competition is small to non-existent...

The 8-core Bulldozer CPU die size is 315mm^2. On 7nm it would occupy about 15mm^2. Those 15mm^2 are about 4% of a 7nm APU with the same die size as the PS4's. You could double the size of the iCPU and it would still occupy only 8% of the die. It is evident that dGPUs don't have a bright future...
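The 15mm^2 figure matches an ideal shrink, where area scales with the square of the feature-size ratio, from Bulldozer's 32nm to 7nm. A minimal sketch; the ~350mm^2 PS4-class APU die is an assumed round number, not a figure from the post, and real nodes never scale this ideally, so treat the result as a lower bound on area.

```python
def ideal_scaled_area(area_mm2: float, old_node_nm: float, new_node_nm: float) -> float:
    """Ideal shrink: area scales with the square of the feature-size ratio."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


bulldozer_32nm = 315.0   # 8-core Bulldozer die, mm^2 (figure from the post)
apu_die = 350.0          # ASSUMED round figure for a PS4-sized APU die, mm^2

cpu_7nm = ideal_scaled_area(bulldozer_32nm, 32, 7)
print(f"Bulldozer die at 7nm: {cpu_7nm:.1f} mm^2")
print(f"Share of APU die: {cpu_7nm / apu_die:.1%}, doubled: {2 * cpu_7nm / apu_die:.1%}")
```

This gives about 15.1mm^2, roughly 4.3% of the assumed die and 8.6% when doubled, in line with the 4% and 8% quoted.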
 

juanrga

Distinguished
BANNED


This is the same kind of flawed argument behind the Bulldozer fiasco.

A wide CPU goes against the HSA philosophy. The HSA spec itself admits that CPUs are good for serial workloads whereas GPUs are good for parallel workloads. And you can find this on a slide as well.



Bulldozer produced total throughput at the cost of serial performance and, as a consequence, it was crushed by Xeons on serial workloads and by Nvidia GPGPUs on parallel workloads. It is no coincidence that most supercomputers are Xeon+Tesla, nor that the Titan supercomputer (with Opterons) will be replaced by Power+Pascal.

A hypothetical 20-core FX CPU made of relatively 'weak' cores will suffer the same fate as Bulldozer. We will find that serial workloads run better on quad-core Skylake CPUs whereas parallel workloads run better on GPUs.
 

logainofhades

Titan
Moderator


Yet K10 is faster, clock for clock, than faildozer and probably even Piledriver. Give K10 a shrink, the improved memory controller, more cores, plus the increased frequency, and it would be a pretty decent chip. The faildozer approach was just not a good one at this point in time.
 

juanrga

Distinguished
BANNED


Uh?

This is the promise AMD made months ago

http://wccftech.com/amd-public-mantle-sdk-coming-year-nvidia-intel-free/

but as of January 2015 the SDK still has not been made public and open.

I ask again: what is the excuse?



What does this collection of nonsense and personal attacks have to do with my post?
 

szatkus

Honorable


Now try again with more realistic assumptions and you'll see a CPU 5-10% better than the FX-8150, less power efficient, and probably with lower potential over the next 3 generations (Piledriver, Steamroller, Excavator).

A major redesign of K10 (K11 or something like that) might have been able to compete with Intel, but by the time they found out that Bulldozer sucks, it was too late.
 

8350rocks

Distinguished


Actually...this is inaccurate. Maybe first gen bulldozer in specific tasks, but piledriver is faster across the board clock for clock.
 

truegenius

Distinguished
BANNED


I stopped reading after that :heink:
 

truegenius

Distinguished
BANNED


You sure about clock for clock? :D
Then we can have a match in CPU-related tasks ;) (neutralizing the advantage of new instructions):
PD vs K10 (i.e. 8350 vs 1090T), single core at the same speed, with the same NB speed and the same RAM amount, channels, speed and latency.
I'd prefer a 4 GHz CPU speed, 1600 CL8 RAM, a 3 GHz NB, single channel (I sold my Vengeance RAM, so only 1x4GB of G.Skill is left).
Though we can't equalize cache, it will still be close enough for an arch vs arch match.

We can create a thread for this match and set some rules so that no one cheats.
To make it more interesting, anyone can post their results (Athlon, Phenom, FX 81, FX 41, FX 83, FX 63, all AMD (or maybe Intel too)); this will give us an idea of the performance difference with different amounts of cache.

If you want something like this, I almost always have free time. Your call (or anyone else interested ;) let's do some experiments for AMD).
 


Which is correct. Slapping multiple serial processors together and claiming you can now do parallel tasks does not make it so. Parallel workloads are forever the domain of the GPU. Multiple cores are good for running multiple serial tasks, but not so much for running a large parallel one. Any parallel task you run on a CPU will almost always run orders of magnitude faster on a GPU (see: video encoding).
 

juanrga

Distinguished
BANNED


No.

A fourth-stepping Piledriver FX chip barely matches an ancient Phenom II

Gaming
FX-4350 @ 4.2GHz: 141
Phenom II @ 4GHz: 137

All applications
FX-4350 @ 4.2GHz: 137
Phenom II @ 4GHz: 140

http://www.tomshardware.com/reviews/piledriver-k10-cpu-overclocking,3584-18.html
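Since the two chips ran at different clocks, a crude clock-for-clock comparison divides each overall index by its clock. A minimal sketch, assuming the indexes scale linearly with core clock (which ignores NB and memory effects); the helper name is hypothetical.

```python
def per_ghz(score: float, ghz: float) -> float:
    """Benchmark index divided by clock: a crude 'clock for clock' comparison."""
    return score / ghz


# Overall indexes quoted above: (score, clock in GHz)
results = {
    "Gaming": {"FX-4350": (141, 4.2), "Phenom II": (137, 4.0)},
    "All applications": {"FX-4350": (137, 4.2), "Phenom II": (140, 4.0)},
}

for suite, chips in results.items():
    fx = per_ghz(*chips["FX-4350"])
    k10 = per_ghz(*chips["Phenom II"])
    print(f"{suite}: FX-4350 {fx:.2f}/GHz, Phenom II {k10:.2f}/GHz, "
          f"Phenom II ahead per clock by {k10 / fx - 1:.1%}")
```

Normalized this way, the Phenom II comes out about 2% ahead per clock in gaming and about 7% ahead in applications.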
 

juanrga

Distinguished
BANNED


K11 is Bulldozer family.
 