AMD CPU speculation... and expert conjecture


blackkstar

Honorable
Sep 30, 2012
468
0
10,780


Yes it has. And as ARM reaches the point where people no longer need to upgrade their tablets and phones, yet still can't break into x86 markets that depend on x86 software that isn't getting ported to ARM (see WinRT), a few people on this forum are going to squirm pretty hard and probably disappear. Although some of them have a good habit of just ignoring things they said in the past.

Nvidia's Q2 '15 results are out, and one of the largest factors they attribute their success to is gaming PCs. As consoles have become more PC-like while building a PC has become easier and cheaper than ever, HEDT is going to continue to grow. And this growth will be in areas ARM cannot reach, because it does not have the software to back it up.

No one in their right mind would switch to an ARM gaming PC and throw away all those old games and the massive catalog of x86 games. AMD's issue with not pursuing HEDT is not because of demand or the HEDT market, it's because of AMD's product portfolio. If AMD really believes that "ARM will win in the long term" across all products, it's not because they see HEDT and workstations dying off. It's because they see themselves becoming completely uncompetitive in HEDT and leaving the market to become an ARM dealer. Which means AMD giving Intel and Nvidia control of the x86 HEDT market. So, I guess you guys who love ARM to death and think it's going to win in every single market can keep celebrating the future of HEDT being Intel and Nvidia only, like you seem so fond of doing.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780
They broke something in Bulldozer...
[Image: 4_year_APU_memory.png - APU memory benchmark results over four years]
 

jdwii

Splendid
Again, we can really only wait. ARM might end up being faster, though I actually don't think so: ARM doesn't have special instruction set extensions like SSE, AVX, or AES, to name a few. I'm not a genius when it comes to this, so I might need someone like gamer or palladin9479 to help.

After some reading, it's the CISC vs. RISC battle again. x86 sounds more like both, but it's still based on CISC (complex instruction set computer).
http://stackoverflow.com/questions/13071221/is-x86-risc-or-cisc
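For what it's worth, here's a rough illustration of the CISC-vs-load/store distinction (my own toy example, not taken from the linked answer): on x86-64 a compiler can fold the load, add and store into a single memory-operand instruction, while an AArch64-style load/store ISA needs separate instructions. Exact output depends on the compiler and flags.

/* Toy example: one C statement, two very different instruction counts. */
#include <stddef.h>

void bump(int *x, size_t i, int y)
{
    x[i] += y;
    /* x86-64 (CISC-style), roughly one instruction:
     *     add dword ptr [rdi + rsi*4], edx
     * AArch64 (load/store, RISC-style), roughly three:
     *     ldr w8, [x0, x1, lsl #2]
     *     add w8, w8, w2
     *     str w8, [x0, x1, lsl #2]
     */
}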


 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


NEON, THUMB and many more extensions.
 

8350rocks

Distinguished


The big kicker is the large FMAC pipeline instructions. x86 will be superior moving forward because of the capability baked in to run large instructions that would otherwise make smaller, more streamlined instruction sets tedious. If you can run one AVX2 instruction in a few clock cycles (say 2-3) on a 4 GHz x86 CPU, but the same work takes a total of 10 cycles on ARM at 2 GHz, of course your effective performance will suffer.
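A quick back-of-the-envelope check of that comparison (using the same assumed numbers from the post, illustrative rather than measured): wall-clock time per operation is cycles divided by clock frequency.

#include <stdio.h>

int main(void)
{
    double x86_ns = 3.0 / 4.0;    /* 3 cycles at 4 GHz  = 0.75 ns */
    double arm_ns = 10.0 / 2.0;   /* 10 cycles at 2 GHz = 5.00 ns */
    printf("x86: %.2f ns, ARM: %.2f ns, ratio: %.1fx\n",
           x86_ns, arm_ns, arm_ns / x86_ns);   /* roughly 6.7x slower */
    return 0;
}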
 


The machine language being fed into the CPU is x86, and the results that come out are expected to conform to the x86 ISA. Internally, behind the instruction decoder, it's all a specialized binary language that resembles a load/store architecture (RISC). Superscalar architectures completely dissolved the performance differences between RISC and CISC; it's become a matter of which side you want your toast buttered on.

The reason architectures like MIPS/ARM are so efficient is that they lack much of the silicon that more powerful CPUs use for instruction prediction, scheduling and decoding. Look at a die shot of an Intel CPU: measure the space devoted to the integer units (which do 90% of the work), then look at the space devoted to the front-end decoders/predictors, then L2 and finally L3 cache. The integer units are dwarfed by all those other components, even though the integer units are what actually do the work. The rest of that stuff exists solely to enhance the performance of those integer units. Whenever you start cranking up the performance requirements you end up stalling the integer units, so the fight becomes less about how fast the integer units can execute and more about how fast you can get the correct data to them. That is the difference between AMD and Intel: both of their integer units do basically the same thing, but Intel is much better at getting the correct data to those integer units than AMD is.

The above "memory" benchmark isn't even benchmarking memory, but rather the cache unit. In fact, any "memory" benchmark is only evaluating the cache unit unless its block size is larger than the L3 cache on the CPU, preferably 16MB or bigger. Since most memory benchmarks use block sizes between 32KB and 1MB, and are only executing {MOV A, B}, it becomes a test of how fast the cache engine can predict the next block copy, prestage the entire memory region in cache, and buffer the writes.
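Here's a minimal sketch of that last point (my own toy benchmark, with illustrative buffer sizes rather than any specific tool's settings): the same copy loop reports very different "memory" bandwidth depending on whether the working set fits in cache.

/* Compile with e.g.: gcc -O2 membench.c -o membench */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double bench_copy(size_t bytes, int reps)
{
    char *src = malloc(bytes), *dst = malloc(bytes);
    memset(src, 1, bytes);                    /* touch pages up front */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        memcpy(dst, src, bytes);              /* the {MOV A, B} loop, in effect */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src); free(dst);
    return (double)bytes * reps / secs / 1e9; /* GB/s */
}

int main(void)
{
    /* 256 KB fits comfortably in L2/L3: this mostly measures the cache engine. */
    printf("256 KB block: %.1f GB/s\n", bench_copy(256 * 1024, 10000));
    /* 64 MB is far larger than a typical L3: this approaches real DRAM bandwidth. */
    printf("64 MB block:  %.1f GB/s\n", bench_copy(64u * 1024 * 1024, 50));
    return 0;
}

On typical hardware the small block reports several times the bandwidth of the large one, which is exactly the cache effect described above.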
 

jdwii

Splendid
Thanks for the help, guys; CISC vs. RISC makes more sense now. It's like the 1990s all over again, ha ha. Thinking about it a little more, I think ARM+GCN could be nice with GDDR5 or higher memory bandwidth for some higher-end servers running extremely parallel code: save TDP on the CPU and spend it on the GPU instead, since ARM cores are smaller (for now). Still, from what it sounds like, some things will probably always be much slower on ARM (RISC). For example, look at a benchmark using AVX instructions versus a CPU without them: it's around 10 times as slow.
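To make that concrete, here's a toy sketch of the AVX point (my own example, not from any benchmark mentioned above): one AVX instruction adds 8 floats at once, so a vectorized loop can be several times faster than the scalar fallback a CPU without AVX must use.

/* Compile with e.g.: gcc -O2 -mavx avx_sum.c -o avx_sum */
#include <immintrin.h>
#include <stdio.h>

/* Scalar path: one addition per iteration. */
static float sum_scalar(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* AVX path: 8 additions per iteration (n assumed to be a multiple of 8). */
static float sum_avx(const float *a, int n)
{
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += lanes[i];
    return s;
}

int main(void)
{
    float data[1024];
    for (int i = 0; i < 1024; i++) data[i] = 1.0f;
    printf("scalar: %.0f  avx: %.0f\n", sum_scalar(data, 1024), sum_avx(data, 1024));
    return 0;
}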
 


Whoa?! Steady on there- this is a 'speculation' thread. There are no points for being right or wrong- we're just a group of people who are interested in the development of computers. I've made a few predictions (which hopefully I've made clear are just that) that are most likely not going to be exactly how things play out. So what if I'm wrong? What's interesting is the debate.

Personally I think AMD will have some success with ARM, and I think they've got a shot at getting *some* traction in servers over the long term (a few % of the market would probably be enough to really start turning things around at AMD). I don't really see ARM encroaching on the desktop any time soon, although there does come a point where emulating older software becomes trivial due to the speed of modern machines (plenty of DOS games and early-90s console stuff is available running on ARM under Android, for example, and things like Half Life 2 now run natively on Nvidia's Shield handheld / tablets; ARM stuff isn't as far behind PCs as you'd think).
 

8350rocks

Distinguished


Those cases stem more directly from the difference in technology. What took a high-end PC to run in the mid-1990s (take Quake 3, for example) ran on a 300-600 MHz single-core x86 CPU with roughly half the instruction set extensions now included in the ISA.

Things like that are simple to emulate and will run on nearly anything. Your average ~1.5 GHz dual-core ARM chip probably has five times the compute power of that old desktop, or more. That solution is really just brute-forcing the software on vastly stronger hardware than was ever required to run it...
 


A lot has to do with the cache subsystem; Intel has a HUGE lead in that area, and as a result Intel can keep their integer units fed, increasing performance overall. Other things play in as well, but that's a HUGE one.

ARM is going to run into the same problem: as they scale up, the issue becomes less how powerful the integer units are and more how you keep them fed.
 


One main thing is AMD hasn't been trying. They went with a more-cores approach with Bulldozer, and that's where it has led them today. They have fewer execution pipes per core and less front end. Another thing is probably cache and memory performance. It all adds up. AMD was probably expecting single-core performance to become obsolete much faster than the market actually moved. They were making products that just weren't built for single-core performance; their plan was to go from 8 to 10 to 12 cores, but because of the backlash against Bulldozer and the restructuring, they canned pretty much all of that.



 

james pinnix

Reputable
Jul 25, 2014
23
0
4,510
For me, I'm glad AMD is going for a lower core count.

I mean, 8 cores at 220W.

What would 10 look like, 300W?

AMD would do better at 4 and 6, then work on 8 once they master those two.

I am really curious about the AMD Athlon X4 860K and AMD Carrizo.

As far as GPUs, the AMD 285 / Tonga really should be an R9 275X,
and the Tonga X version should stay an R9 285X if they make it a 4 GB version.

Spec-wise I don't see an R9 285 doing much better than a 280.

Titanfall required 3 GB for textures; the 2 GB limit on the GPU is going to keep it closer to a 270X and just a hair under a 280. Overclocking will give it a nice boost, but the 2 GB limit is going to hurt it overall.
 


Thanks for the input!

I'm curious about AMD's current pursuits... it seems like they know about their poor single-thread performance and the negative effect it has on their performance and, in turn, their profit. What's curious is what they are doing to combat it.

Instead of targeting the problem at its source and making powerful, compelling enthusiast parts with high single-thread performance (technology which could then be reused in low-end parts), they instead keep the same technology and target niches with APUs, low-end high-core-count CPUs, and of course ultra-low-end Kabini CPUs.

Does AMD intend to just ride these niches and continue on inferior CPU technology, or will Carrizo and Excavator bring the improvements that AMD needs to truly be competitive?
 

james pinnix

Reputable
Jul 25, 2014
23
0
4,510
That's what I want to know as well!

AMD has had very good GPUs since they bought out ATI.

I really hate that they don't have a CPU that can handle a 295X or 290X without major bottlenecks.

They really need to focus on making a CPU that can handle any GPU they make.

Not fighting with Intel; that battle is quite pointless. While they have their fingers in various other areas, if they could just take some time and make two CPUs, one for low-to-mid-range GPUs and one for mid-to-high-end GPUs, they would see a strong gain.

Intel is better for everything else, yada yada yada, and I get that. But one thing AMD has always been pretty good at is making a solid CPU for gamers who just want to enjoy their games.

For me AMD has been the working man's CPU: not super high end, but not total poo.

Just a solid, gets-it-done, no-issues CPU.

I really dislike that the APUs, which have great potential to be awesome, are getting gimped by a lack of motivation, pure and simple.

Hell, if they just made one CPU and had it run as well as a mid-range Intel, that would be the CPU in every budget build.

Now I know there's a bunch of super wizards here who can spit out specs like nobody's business about ARM and servers, but for me, personally, my PC has always been about gaming.

I really want AMD to bring that back more than anything.
 
There is just nothing AMD can do at this point for the high-end CPU market that would be good publicity-wise and profitability-wise. They could do what everyone posts on forums about, but economically it would be disastrous for them just as they're getting back to being a profitable company. They can only go where consumers are going and market themselves that way. There is no point trying to battle a company 100x their size, and they know that.
 


I think people really don't understand the sheer difference between Intel's and AMD's R&D budgets. Intel can afford to spend 5-10x more than AMD on experimentation and on perfecting their designs, which means AMD is never able to "catch up". Instead AMD must focus on added value and on delivering in areas outside of raw performance. That is why they went with the APU concept along with the custom chips / modular design: they reduce their costs as much as possible while trying to provide a "good enough" product for a lower cost than Intel. Basically they are trying to become the Walmart of commodity processors.
 

Ranth

Honorable
May 3, 2012
144
0
10,680


To be honest, then, you are using that 290X wrong... When you say major bottlenecks I assume you mean that games are unplayable, which is only the case in very CPU-intensive games, a bit like Skyrim, yet you're still above 50-60 fps depending on which site you take the numbers from, which is playable. But you also have games like BF3 where you're much more likely to hit a graphics bottleneck, meaning the CPU is less of an issue, as it should be.

And considering that games are slowly moving to better utilize increasing core counts, the FX-63xx and 83xx should have an aging advantage over the Pentium Anniversary Edition and the i3s.
 

james pinnix

Reputable
Jul 25, 2014
23
0
4,510
I know AMD will never catch up with Intel, but I do know AMD can make a solid mid-range CPU.

That's my issue with them: they are wasting money on HSA and Mantle.

Before I get chewed out, let me clarify:

1: HSA is awesome on paper. That's it. Not many companies are going to write special code for it; AMD would have to somehow make their APUs exploit HSA internally rather than relying on third parties.

2: Mantle has a good shot at being used, but that all depends on how DX12 turns out, and a Mantle failure will hurt AMD as well because they are making hardware choices around it.

3: If AMD's HSA and Mantle efforts don't happen, get phased out, or just flop, that leaves them even further behind Intel.

The point is AMD has some amazing stuff with both HSA and Mantle, but until it's being used more they need one or two things that work now while they push these new systems in the market. The A10-7850K will never be able to fully use HSA, if AMD ever gets it going, not in the way it's needed.

Much like the FX 8-core series will never be fully used before it's outdated.

It's just wasted resources that AMD doesn't have to waste.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
AMD's current situation is a consequence of (i) Intel being about 10x bigger and able to throw more money and engineers at any given problem, (ii) Intel having a process node advantage, and (iii) the former heads/engineers of AMD taking lots of silly decisions that almost ruined the company.

The current heads/engineers are trying to keep the company afloat, but it is in the red again.

The original plans for an Excavator FX CPU were canceled because the product was not competitive enough and the market for AMD is small; if the latest information is right, the desktop version of Carrizo has recently been canceled and Carrizo will launch only for mobile. Don't expect miracles: Excavator will bring another small improvement in performance. Moreover, desktop Kaveri is being extended through 2015, 2016, and (maybe) part of 2017.

About the future? Well, I had big hopes for AMD's next architectures and I shared them in this thread, but the latest info/leaks are worrying me. It seems the new microarchitecture designed by Keller is another poor man's arch. Due to lack of resources AMD is trying another shared (clustered) architecture, with weird stuff such as SMT1.5. The core will be small, a kind of Jaguar core on steroids, instead of a big (I mean really BIG) core.

As a consequence, AMD will not compete against Intel on single-thread performance. On the contrary, it will rely on moar small cores. A friend tells me that AMD is planning new FX CPUs with 8/12/16 cores. Apparently they believe that Mantle will help them with the single-thread deficit. The big problem is that HSA requires a mixture of a few powerful cores and lots of small simple cores, and AMD doesn't have serious plans for the first class of cores.

As much as this hurts, if all the info is right, AMD will be crushed from both sides. On the x86 side it will be crushed by Skylake and Cannonlake. On the ARM side it will be crushed by all the ARM companies that have designed 90W ARM SoCs with performance superior to 140W Haswell Xeons.
 
Italian site sheets and giggles has a new rumor: a 20-core x86 Opteron.
http://www.bitsandchips.it/hardware/9-hardware/4675-fino-a-20-core-x86-per-le-prossime-cpu-fx-ed-opteron-2016-17
Theoretically, they could make a 16-core right now with Jaguar/Puma: two 8-core dies on an MCM package. The site (actually wccfudgedupanalysisoftech.com's xterpe analysis) alleges no compute cores! And a Sam/GloFo/TSMC process, hinting at a bulk substrate. Heck, AMD could stick two quad-module ULV EXC dies on an MCM package and make a low-power 16-core Opteron. The rumor is alleging a monolithic (max.) 20 cores. +NaCl
 


The problem is basically related to *how long it takes* to design and then launch a processor core. It's probably five years from planning, through design, to testing, tweaking and final product launch. Once you have your design, you then implement improvements for the next few years (the stage AMD is currently at). The point is they didn't realize Bulldozer was the wrong way to go until it was too late, and now they are having to do the best they can with what they've got. I have noticed, though, that they've managed to greatly improve performance at lower wattages; the 'dozer' designs appear to fall over when going for outright performance, but based on Kaveri the improved SR core isn't actually *that bad* when running at more modest frequencies (probably a result of the process).

I think there is more hope for the next-gen core from AMD; after all, they do have Keller working on it. I know juanrga is worried, however many cores makes more sense now than it did when Bulldozer came out, and Keller is no fool. The idea with Bulldozer was to provide 80% of Intel's IPC but with better multi-core scaling, so that in theory it *should* have been faster than Intel in well-threaded loads. The problem was the design was late, the process wasn't up to it, and the end result was 90% of Intel's speed in multithreaded loads (its strength) while falling over in single thread. The other thing is Bulldozer would have looked *much much better* if it had been released about 18 months sooner (when it was supposed to launch). Then you would have had PD vs Sandy (not Ivy), SR against Ivy, and EX vs Haswell (although the latter wouldn't have worked due to the process available to them).

Will be interesting to see what happens anyway :p
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


It is about both cost and time. According to AMD, a top server x86 chip on a new microarchitecture takes them a three-to-four-year time frame and $300-400 million in development costs.

If the new arch is to be ready by 2016, subtract four and you get 2012, which is just when Keller joined AMD.



The problem with Bulldozer wasn't "too late", nor the SOI hype, nor the scarcity of multithreaded software. All of that contributed to the fiasco, of course, but the real problem with Bulldozer was that the whole design was nonsensical and internally inconsistent. The Faildozer design targeted one thing and its contrary, the microarchitecture was optimized for one thing and for its contrary, with the final result of excelling at none.

The HSA specification is pretty clever: CPU for latency and GPU for throughput. This was already in the former FSA spec (aka Fusion). But the geniuses then working at AMD considered it a good idea to replace a big core with two smaller cores to produce a CPU with more throughput; this reduced IPC, which was then compensated for with a higher frequency target; however, this increased power consumption by roughly a cubic dependence. More throughput requires more bandwidth; however, AMD provided a weak memory controller and a weak cache subsystem that couldn't feed all the cores adequately. Moreover, AMD intended to compete by selling cheaper, so big dies were not an option and AMD simplified the module front end to reduce transistor count, which introduced a throughput penalty of about 20%; at the same time, it built the 12/16-core Opterons using dual dies on an MCM package, which reduced performance and increased power consumption further compared to a monolithic 16-core die. Moreover, to save costs AMD relied on automated design tools (even for critical paths of the module), which increased transistor count by about 30%, increasing die size, power consumption, and cost by a similar amount.
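A rough sketch of where that "roughly cubic" figure comes from (my own illustrative numbers, not AMD's): dynamic power is about C * V^2 * f, and if voltage has to rise roughly in proportion to frequency, power ends up scaling with roughly the cube of frequency.

#include <stdio.h>

int main(void)
{
    double base_f = 3.0;    /* GHz, arbitrary baseline */
    double base_p = 100.0;  /* W at that baseline, arbitrary */

    for (double f = 3.0; f <= 4.51; f += 0.5) {
        double s = f / base_f;                 /* assume V rises ~linearly with f */
        double p = base_p * s * s * s;         /* P ~ C * V^2 * f  ->  ~f^3 */
        printf("%.1f GHz -> ~%.0f W (x%.2f)\n", f, p, s * s * s);
    }
    return 0;
}

So pushing the baseline 3 GHz part to 4.5 GHz costs roughly (1.5)^3, about 3.4x, the power under these assumptions, which is why frequency can't paper over an IPC deficit for free.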

Add to this the weak shared FPU within a module, because the original idea of Fusion was to use GPGPUs as an external floating point unit; but this requires a good interconnect, and AMD only provided PCIe 2 support (this lost them Cray as a customer, due to the lack of PCIe 3 support on Opterons). Moreover, AMD designed GPUs that were far from optimal for compute (VLIW is good at graphics but bad at compute); add sh*t drivers and the result is Nvidia owning about 85% of the HPC/GPGPU market. Only recently has AMD's new GCN arch started providing competition in this area.

In short, the whole Bulldozer design was in the middle of nowhere: slower at single thread than a conventional core, yet with poorer throughput than a true CMP design, ridiculous FP performance, unable to feed GPGPUs to exploit Fusion, power hungry, big, and costly... AMD had to sell them practically for free and went into the red; they lost the server and HPC markets down to a ridiculous market share of about 3%, even though typical server apps are very well threaded (Intel sells 36-thread Xeons).

With Steamroller AMD corrected the shared decoder and eliminated the 20% penalty. Excavator would have introduced a double-size FPU to correct the weak FP performance and tweaked other serious defects of the CMT architecture, but that is not happening.

I mentioned above that my latest info was that the AMD FX series would make a comeback with 8/12/16 cores in 2017, and why this smells like another fiasco, but recent rumors are even worse. Now the rumor is a 20-core FX CPU:

http://wccftech.com/amd-16nm-opteron-fx-processors-possibly-upto-20-cores-2016-2017/#ixzz3A47tgNYc
 
Add to this the weak shared FPU within a module, because the original idea of Fusion was to use GPGPUs as an external floating point unit; but this requires a good interconnect, and AMD only provided PCIe 2 support (this lost them Cray as a customer, due to the lack of PCIe 3 support on Opterons)

Which is stupid for the very simple reason that the CPU doesn't know the workload it will be seeing. Are we talking a single FP instruction, or complex matrix math? One you want the FPU to handle, the other you want to offload. Sending every FP calculation to the GPU will KILL performance simply due to the latency overhead. So you're right back to needing the software to support it, which it already does via OpenCL/CUDA and the like.
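A back-of-the-envelope sketch of that latency argument (all three cost numbers are my own assumptions, purely illustrative): offloading only wins once there is enough FP work per transfer to amortize the fixed launch/copy overhead.

#include <stdio.h>

int main(void)
{
    double offload_overhead_us = 10.0;   /* assumed fixed PCIe transfer + launch latency */
    double cpu_us_per_elem     = 0.001;  /* assumed CPU cost per element */
    double gpu_us_per_elem     = 0.0001; /* assumed GPU cost per element */

    for (long n = 1; n <= 1000000; n *= 100) {
        double cpu = n * cpu_us_per_elem;
        double gpu = offload_overhead_us + n * gpu_us_per_elem;
        printf("%7ld elems: CPU %.2f us, GPU %.2f us -> %s wins\n",
               n, cpu, gpu, cpu < gpu ? "CPU" : "GPU");
    }
    return 0;
}

With these numbers the CPU wins for single instructions and small batches, and the GPU only pulls ahead once the batch is large, which is exactly why the decision has to live in software (OpenCL/CUDA) rather than in the CPU itself.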
 