AMD CPU speculation... and expert conjecture



I like this argument. For all the things we can say about nVidia being the usual "all your money belongs to me" company, they do things for important reasons (money, mostly, haha).

Kepler and Maxwell not being "proficient" at "GPGPU" has to have a good reason behind it. For all I like how GCN turned out, it's neither Fermi nor VLIW4, so in terms of being "jack of all trades, master of none", I have to say I like nVidia's approach more: a longer life cycle for the heavy-compute uArch and a shorter time to market for "gaming" cards. The point is, GK104 and GM104 have a reason to exist just like GK110 and GM110; "mid range" versus "top range" is not the full explanation for this segmentation.

And yeah, efficiency. nVidia made a bet with themselves about taking over the low-power segment with their own designs, but that hasn't turned out well for them. So far they're losing that bet, I'd say. In any case, efficiency for nVidia was a consequence of many things, not a goal from the start. Intel is (has been?) pursuing it now (for a while?) and AMD will have to do so with K12 or they'll be erased from the map.

Cheers!
 


Except the CUDA base is already there. It's cheaper to assume NVIDIA will stay around rather than rewrite everything to support AMD. Besides, it's not like NVIDIA is going anywhere anytime soon...

And yes, these are the same people who got burned when DEC went under. My company only finished replacing its old DEC equipment a few years ago, and we STILL have a VAX/VMS server and a Windows 3.1 PC hanging around. Oh, and we have critical test equipment that runs on those old HP-85 calculators, of which only one still works, and all the backup tapes died years ago (we're using the last master tape that works).

Corporations do NOT like spending money replacing things that currently work.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680
It's not just about Nvidia going anywhere. It's also about the risk of uncompetitive pricing, a bad generation of hardware, poor technical support and so on. OpenCL opens up the opportunity to use Nvidia, AMD or even Intel.
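For what it's worth, that vendor-agnostic point is easy to see at the API level: the same OpenCL host code enumerates whichever platforms happen to be installed, whether they come from Nvidia, AMD or Intel. A minimal C sketch (assumes an OpenCL SDK is installed and you link with -lOpenCL):

```c
/* Minimal sketch of why OpenCL reduces single-supplier risk: the same host
 * code enumerates whatever platforms are installed (NVIDIA, AMD, Intel...). */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void) {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, NULL, &num_platforms);          /* ask how many platforms exist */

    cl_platform_id *platforms = malloc(num_platforms * sizeof(cl_platform_id));
    clGetPlatformIDs(num_platforms, platforms, NULL);   /* fetch them */

    for (cl_uint i = 0; i < num_platforms; i++) {
        char name[256] = {0}, vendor[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(vendor), vendor, NULL);
        printf("platform %u: %s (%s)\n", i, name, vendor);
    }

    free(platforms);
    return 0;
}
```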
 


I don't think he's talking about switching designs post-implementation, but that single-supplier risk could alter the strategic decision during the design process itself. I've run into this myself; it's why we've decided to move away from SPARC/Solaris to an x86 RHEL platform for our future product releases. We will maintain our current platform for at least another two product generation cycles as we slowly migrate all our code to be vendor-agnostic and modular. The predicted fielding date will be sometime in 2017, with a gradual transition of services and back-end components; the front-end client architecture shouldn't even notice that anything changed in the back. This process is expected to take two years, and the discussion about it started about four years ago.

People, that is how long corporations can take to make major IT changes. Nothing is ever done quickly and their strongest motivator is risk.

Gamer, did anyone in your company ever do a cost-risk analysis on that aging hardware? What would it cost the company if that hardware suddenly died? What would it cost to create a new solution from scratch in an emergency (dying hardware qualifies)? This is what we usually do, and the numbers it produces are very frightening to management types: tens of millions of USD in non-budgeted, unexpected costs tends to keep them up at night. Compared to that, spending a few million USD to modernize an aging platform, while simultaneously removing additional risk factors and possibly increasing production, seems like a bargain.
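To make that concrete, the expected-loss arithmetic behind such an analysis is simple. A small sketch, where every figure is an illustrative placeholder rather than anyone's real numbers:

```c
/* Back-of-envelope cost-risk comparison. Every number here is an
 * illustrative assumption, not real data from any company. */
#include <stdio.h>

int main(void) {
    double p_failure_per_year    = 0.15;  /* assumed chance the legacy hardware dies in a given year */
    double emergency_rebuild     = 30e6;  /* assumed cost of an unplanned from-scratch replacement (USD) */
    double planned_modernization = 3e6;   /* assumed cost of a planned migration (USD) */
    int    horizon_years         = 5;

    /* Probability of at least one failure over the horizon. */
    double p_no_failure = 1.0;
    for (int y = 0; y < horizon_years; y++) p_no_failure *= (1.0 - p_failure_per_year);
    double p_failure_in_horizon = 1.0 - p_no_failure;

    double expected_emergency_cost = p_failure_in_horizon * emergency_rebuild;

    printf("P(failure within %d years) = %.1f%%\n", horizon_years, 100.0 * p_failure_in_horizon);
    printf("Expected unbudgeted cost   = $%.1fM\n", expected_emergency_cost / 1e6);
    printf("Planned modernization      = $%.1fM\n", planned_modernization / 1e6);
    return 0;
}
```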

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I don't expect Zen beyond late 2016.
I don't expect CMT modules.
I don't expect 16nm.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
I don't think I've ever seen proof that CMT is bad. All I see is proof that Bulldozer family has problems that go beyond CMT. If someone can show me another CMT design that was completely abysmal, I'd love to see it. To me it sounds like people who irrationally hate CMT are similar to people who blamed Netburst's problems entirely on Hyperthreading.



m8, we're talking about the future and speculation here. Not what people have bought over the last 5 years or what market share over the last 5 years looks like.

As GamerK is saying, CUDA is entrenched and it takes a lot to persuade people to leave that ecosystem. Part of dropping FirePro prices like that is to give people stuck on CUDA motivation to switch.

If you have to upgrade a ton of workstations, and you can save hundreds of thousands of dollars by switching to AMD GPUs, and the performance is actually better, it might actually force people to consider changing.

But changing is a big deal; switching from CUDA to OpenCL is not a move to be taken lightly, especially if you have something that already works fine in CUDA.

I don't think the FirePro price cut has anything to do with poor performance, or with AMD being behind or struggling. It has to do with unseating CUDA's dominance of the market in preparation for HSA.

Also, even if AMD cuts FirePro prices significantly, they're still doing far better than they would selling consumer cards.

http://www.newegg.com/Product/Product.aspx?Item=N82E16814195129
Even if AMD takes $900 off of the price, they are still making a killing compared to selling consumer cards with the same GPU.
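A quick sketch of that margin argument; the prices and per-board cost below are placeholder assumptions for the sake of the arithmetic, not actual figures:

```c
/* Margin sketch: professional vs. consumer card built on the same GPU.
 * All prices and the per-board cost are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double firepro_price  = 3500.0;  /* assumed pro-card price (USD) */
    double price_cut      = 900.0;   /* the hypothetical cut discussed above */
    double consumer_price = 550.0;   /* assumed consumer card with the same GPU */
    double board_cost     = 400.0;   /* assumed build cost, same silicon either way */

    double pro_margin      = (firepro_price - price_cut) - board_cost;
    double consumer_margin = consumer_price - board_cost;

    printf("pro margin after cut: $%.0f per card\n", pro_margin);
    printf("consumer margin:      $%.0f per card\n", consumer_margin);
    printf("ratio: %.1fx\n", pro_margin / consumer_margin);
    return 0;
}
```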

Like GamerK said, it's not going to be easy to get people off of CUDA and onto OpenCL and HSA. AMD is going to have to get aggressive in this market with pricing as well as various benefits like saving money on professional software.

I'd imagine if AMD was running around charging the same prices as Nvidia Tesla and Quadro while going "hey, switch to OpenCL, it will cost you money and you'll throw out all your CUDA software, but it's totally worth it! See, it's more efficient at GPGPU, so if you run everything at 100% load for the next 5 years you can save a tenth of the cost of switching from CUDA to OpenCL! It's such a simple business decision!!!", they'd be laughed out of every company where switching to AMD was mentioned.

AMD pushing HSA is going to be risky; they're going to have to be very aggressive with pricing to get market share so people will develop HSA software. The pro market is the same: they need that market share so they can approach Adobe and say they want to make some cool AMD-specific features, and not have Adobe go "lol, you have 10% market share in the professional market, why would we do anything for you?"

I know you're quite the APU guy, but what do you think AMD's business strategy should be? Put 2m/4c APUs up against full GM110 GPUs? You realize that professional market will laugh that idea out of the room, right?

What are you expecting? AMD to release HSA and go "check this out: you lose performance on an APU in traditional GPGPU and traditional CPU workloads, but you can sort tables in OpenOffice really quickly, and if you use our JPEG decoding software you can generate thumbnails (that get cached and never generated again) twice as fast, which really helps if you have a really high-end camera with a ton of megapixels like everyone has!"?
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I don't know anyone who mixes up Bulldozer problems with CMT problems. CMT is largely unpopular in the industry because it doesn't add anything except trouble. Even the author who invented the concept admitted that AMD engineers tried CMT because they lacked any better idea.



Market share gives an idea of where each company is, which is fundamental information if you want to start speculating about where it will be tomorrow.



Nvidia's 85% market share in GPGPU is just a measure of the amount of CUDA inertia. As gamerk mentioned, even if AMD cuts prices to 1/10, it will not change anything significantly.



I must be the guy who follows the laws of physics and economics.

My point is that both Nvidia and Intel have done their homework. Nvidia will release cards with an efficient architecture, without the PCIe bottleneck, and with ARM cores inside to reduce CPU dependencies. Intel will replace Phi cards with more efficient Phi CPUs, which also eliminate the PCIe bottleneck and offloading.

AMD has not done anything and will be stuck with inferior technology. They will try to compete by releasing cheap FirePros, but this strategy will suffer the same fate as their failed server strategy. Of course HSA doesn't change anything. I already explained before that HSA reinvents the wheel, and does it square.

What should AMD do? Well, AMD is too late to the GPGPU party. There is nothing AMD can do now, after years of taking the wrong decisions. They can try and waste money they don't have, or just give up another market and focus on the few markets where they still have some chance to survive, such as semicustom.
 
Umm ... CMT isn't AMD's, it's an industry standard. Chip Multi-Threading is a way of assigning multiple thread stacks to a single processing element. Intel's Hyper-Threading is an implementation of CMT, but with only two threads. Oracle's SPARC chips use an eight-thread implementation, same with IBM's Power8 design. AMD isn't even implementing standard CMT; instead they're going with a pseudo-hardware implementation where each thread stack is tied to a specific set of hardware at a one-to-one level.

Now if we are talking about their modular design, that's a cost decision. Designing a CPU architecture is extremely expensive in man-hours. Engineers need to pore over every nanometer of the design and manually tune the traces and transistor placement for optimal performance. This is why companies try to stick with a single overarching design and just tweak / stretch it out as long as possible.

In the case of BD, AMD decided to go with an automated design process where a program attempts to automatically map out the required components for them. It's like a form of compiling, but for hardware design. The software is limited, though, and designing an entire CPU is outside its capability. Instead it created standardized components, and those components were then assembled together to form a single chip. It made standard integer units, memory management units, the 128-bit SIMD FPU, the scheduler, and so forth, each done separately. They were then assembled into a single design using the standardized interconnects built into them, and the engineers started optimizing it. These are the performance increases we saw from BD -> PD -> SR: with each version the engineers had more time / experience to optimize the length, placement and timing of the various traces and transistors. Doing it this way was much cheaper and produced a set of stencils that could be used to cheaply design a CPU / APU to fit any need without having to go back to the drawing board.

We can debate the merit of the automated modular approach all day; it was definitely cheaper than the by-hand method, but it also produces a design that is less optimized and tuned. Ultimately it produces a value-oriented product because of its cost.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No.

You are confusing AMD's CMT with Oracle's CMT and with SMT/hyperthreading.

Oracle's CMT means Chip Multithreading, and it is defined as the combination of CMP (chip multiprocessing) plus interleaved multithreading:

http://www.oracle.com/technetwork/systems/opensparc/58-system-cmt-rick-hetherington-1500287.pdf

AMD's CMT means Cluster-based Multithreading. It is AMD's implementation of Andy Glew's MCMT:

http://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29#CMT

Finally, Intel Hyper-Threading is Intel's implementation of SMT:

http://en.wikipedia.org/wiki/Hyper-threading

Nor is Intel hyperthreading limited to two threads per core, as you say: Xeon Phi's hyperthreading is four threads per core.

It is worth mentioning that Andy came up with the MCMT idea around 1996 and tried to sell it to Intel. Intel rejected the idea (with good reason). Later he tried to sell the idea to AMD, which also initially rejected it. It was about five years after Andy left AMD, when the engineers had no fresh ideas for a future architecture, that they decided to recover Andy's original design from the archive of crazy ideas, with the fabulous result that everyone knows...



Bulldozer's problems start with its being based on Cluster-based Multithreading instead of CMP (e.g. Jaguar, A57, Denver) or CMP+SMT (Intel i7, Oracle SPARC M7, IBM Power8, Broadcom Vulcan...). They continue with the crazy speed-demon approach and the bad memory and cache subsystems, and they finish with the use of automated tools, which, yes, reduced the initial design cost, but at the expense of a bigger die area, which increased fabrication costs, and higher power consumption, which cost AMD the server, supercomputer, and laptop markets.
 
Gamer, did anyone in your company ever do a cost-risk analysis on that aging hardware? What would it cost the company if that hardware suddenly died? What would it cost to create a new solution from scratch in an emergency (dying hardware qualifies)? This is what we usually do, and the numbers it produces are very frightening to management types: tens of millions of USD in non-budgeted, unexpected costs tends to keep them up at night. Compared to that, spending a few million USD to modernize an aging platform, while simultaneously removing additional risk factors and possibly increasing production, seems like a bargain.

Sure. It was decided it was cheaper to try to keep the old stuff up and running, since the last time we tried a port, it failed in rather spectacular fashion. For the specific cases:

I led the initial work converting our old HP-85 programs to HT Basic on Windows. It was a pain, since HT Basic is based off the HP-87 dialect, and there were enough differences to make life difficult. That, and the EEs who wrote the program used a gratuitous number of GOTO statements, and thought mixing STRING A, INTEGER A, REAL A, and A() throughout the code was a great idea. Oh, and no one has a clue how the code works (no documentation), so we're testing to results rather than re-validating the code.

The Win 3.1 computer likely isn't ever getting upgraded. One of a kind hardware again. The root issue on the SW side is that there are VERY strict timing deadlines that simply couldn't be met when they tried a Win98 conversion. I can maybe do something using fibers instead of threads, but ultimately, we'll likely drop the platform when the PC finally goes.
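For anyone curious what the fiber idea looks like, here is a minimal Win32 sketch (just the shape of the API, not the actual test code): fibers are cooperatively scheduled, so the program decides exactly where control switches, which is the appeal when timing deadlines are strict.

```c
/* Minimal Win32 fiber sketch: the worker only yields where we say so,
 * instead of being preempted at arbitrary points like a thread. */
#include <windows.h>
#include <stdio.h>

static LPVOID main_fiber;   /* the converted main thread */
static LPVOID work_fiber;   /* our cooperatively scheduled worker */

static VOID CALLBACK WorkProc(LPVOID param) {
    (void)param;
    for (int i = 0; i < 3; i++) {
        printf("work step %d\n", i);
        SwitchToFiber(main_fiber);   /* yield back at a point *we* choose */
    }
    SwitchToFiber(main_fiber);       /* never return from a fiber routine */
}

int main(void) {
    main_fiber = ConvertThreadToFiber(NULL);
    work_fiber = CreateFiber(0, WorkProc, NULL);

    for (int i = 0; i < 3; i++) {
        SwitchToFiber(work_fiber);   /* run one step, then control returns here */
        printf("back in main after step %d\n", i);
    }

    DeleteFiber(work_fiber);
    return 0;
}
```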

We have a lifetime support contract on the VAX/VMS system, so we're good there. Good thing, since it breaks about once a month.

But hey, we're finally getting the paperwork done to begin a conversion from XP to Windows 7. So there's a start!
 

alcatrazsniper

Reputable
Dec 31, 2014
58
0
4,640
This whole thread shows how desperate AMD fans are for new gear. 17,000 replies? It's nice to see that there will finally be some competition for the i5 line of Intel processors, though. Maybe AMD can create a 16-core processor so that it can keep up with the 4-core.

I'm joking. It's nice that AMD will be releasing steamroller.
 


Sadly your argument misses the entire point of HSA. Intel are (as usual) taking the simplest, quickest route to achieve a goal. That *doesn't* make it the technologically best (or even correct) option long term. HSA was a lot of work, I agree, and hopefully down the line we'll see the benefits.

The problem with your analogy is that Intel's approach is still to try and shoehorn in something that is suboptimal for the task. The other issue is that their approach still only accounts for using *one type* of processing element at a time. It's an inelegant approach, as a Xeon Phi card can only do parallel tasks well, just like a more traditional CPU is limited to fundamentally serial tasks. AMD's approach doesn't just allow for GCN and CPU cores either: we're talking video decoders, audio units, security processors, different CPU architectures and so on, all (eventually) sharing a common platform and memory system. It will take a while; however, being able to assign jobs on the fly to different execution units without being tied down will ultimately be the better route, just as so many other things AMD have done have later been validated by the rest of the industry.

You want examples? Native quad core (AMD got there first; unfortunately for AMD, Intel managed to get a 'double dual core' out of the door first, and due to the lack of suitable software the inherent advantage of the native quad-core die wasn't realised, allowing Intel to maintain a lead despite a fundamentally inferior design), HyperTransport (or more importantly, NOT a FSB), an integrated memory controller, three cache levels: all of these things were dismissed when AMD got them first, only to be adopted by Intel the next generation.

I also know that once AMD has done all the groundwork (and been discredited for their trouble), both Intel and nVidia will come out with a great announcement about the next *big* development, and it will look a lot like HSA.
 


I think we're all just hoping (for everyone's sake) that AMD stays in the game. Intel and Nvidia will have no qualms about hiking prices significantly if they get the opportunity. Think about this: most things increase in cost over time due to inflation, yet the price of computer hardware has been falling for years. It's now possible to get high-performance equipment for next to nothing, and that is down at least in part to the fierce competition between the main vendors. Many here are predicting that AMD is too far behind, and maybe that's true; however, *if* they go away, expect big price increases from the remaining parties. They'll no doubt want to push the price of PCs back to where it used to be, to the detriment of the consumer.
 

Embra

Distinguished
AMD switching 28nm process to GlobalFoundries in 2015
http://hexus.net/tech/news/industry/78757-amd-switching-28nm-process-globalfoundries-2015/

AMD's 2015 graphics products will implement GCN 1.2 revisions, featuring similar design tweaks to the recently launched 'Tonga' R9 285 that sports improved colour compression, geometry and tessellation performance. The main process node for these products will now be Global Foundries' 28nm SHP, rather than TSMC 28nm, but the potential for products based on smaller process nodes, such as 20nm, is yet to be ruled out.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


No.

My argument addresses the entire point of HSA and then explains why HSA is reinventing the wheel, but making it square.

For instance, with single-ISA heterogeneity you complement the latency cores with throughput cores running the same ISA. In the HSA approach you take a latency core running one native ISA and add a throughput core running another native ISA. Since the two cores cannot cooperate natively, AMD needs to introduce a common HSAIL ISA on top of each native ISA: the HSA latency core executes both its native ISA and the HSAIL ISA, and the HSA throughput core executes both its native ISA and the HSAIL ISA.

So the HSA approach deals with three or more different ISAs plus translation layers (e.g. the HSA TCUs translate the HSAIL ISA to the native ISA before execution) to achieve the same heterogeneity obtained by the simpler approach based on a single ISA.
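To make the objection concrete, here is a toy C sketch of the difference being described: in the single-ISA case a kernel is dispatched directly, while the HSAIL-style path inserts an extra finalization (translation) step per device ISA. Purely illustrative; none of these names correspond to the real HSA runtime API.

```c
/* Toy model (not real HSA code): contrast dispatch in a single-ISA
 * heterogeneous system vs. an HSAIL-style intermediate language.
 * All names here are illustrative, not part of any real runtime. */
#include <stdio.h>

typedef void (*native_kernel_t)(int *data, int n);

/* "Native" kernel: in a single-ISA design both core types run this directly. */
static void scale_kernel(int *data, int n) {
    for (int i = 0; i < n; i++) data[i] *= 2;
}

/* HSAIL-style path: the kernel ships in a portable intermediate form and a
 * per-device finalizer translates it to native code before dispatch. */
typedef enum { HSAIL_SCALE } hsail_op_t;

static native_kernel_t finalize_for_device(hsail_op_t op, const char *device_isa) {
    printf("finalizing HSAIL op %d for %s\n", (int)op, device_isa);
    switch (op) {
        case HSAIL_SCALE: return scale_kernel;  /* the extra translation step */
    }
    return NULL;
}

int main(void) {
    int data[4] = {1, 2, 3, 4};

    /* Single-ISA heterogeneity: dispatch the same binary to either core type. */
    scale_kernel(data, 4);

    /* HSA-style: one additional translation layer per device ISA. */
    native_kernel_t k = finalize_for_device(HSAIL_SCALE, "GCN");
    k(data, 4);

    printf("%d %d %d %d\n", data[0], data[1], data[2], data[3]);
    return 0;
}
```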

Nor is your claim true that single-ISA approaches to heterogeneity use only "one type" of processing element. It is called heterogeneous compute (Intel calls it neo-heterogeneity) because it works with arbitrary mixtures of different processing elements at the same time. For your information, the early research on the topic considered four different kinds of processors. Further research showed that most of the advantages (performance and efficiency) are obtained using only two kinds of cores. Precisely this result has been replicated by AMD, whose HSA spec considers only two kinds of compute elements: LCUs and TCUs.

You mention old advances made by AMD years ago. Those were genuine advances that improved on existing technology. This is not the case with HSA, which just tries to replicate what already exists, but does so in a more complex and convoluted way.

Neither Intel nor Nvidia will go the HSA route, because HSA is inferior to the technologies that both Intel and Nvidia have developed.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


I think you forget that HSA is going to lag behind CUDA and even Intel's efforts by a large margin in terms of release date. The initial release of CUDA was in June 2007; it's about seven and a half years old.

The ones who are late to the party always face an uphill battle unless the other side botches things spectacularly. CUDA basically has at least an eight-year head start on HSA. G80 GPUs supported the original CUDA. ATI HD 3000 GPUs supported OpenCL, but VLIW was not easy to use to its maximum potential, and the HD 3800 segment was fairly weak compared to Nvidia's.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


As you know, Steamroller was released a while ago. :sarcastic:

But let us look at the near future.

Skylake will be a 64 FLOP/cycle per core architecture, whereas Zen will likely be a 16 FLOP/cycle per core architecture. Add that AMD will be stuck at lower frequencies (I estimate 3.0--3.5 GHz for Zen), and the result is that AMD would need 18--21 Zen cores to match the floating-point performance of 4 Skylake cores.
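For reference, the arithmetic behind that core-count estimate, taking the per-core FLOP figures above at face value and assuming a roughly 4 GHz Skylake clock (my assumption, added only to reproduce the 18--21 range):

```c
/* Peak-FLOPs estimate behind the "18-21 Zen cores vs. 4 Skylake cores" claim.
 * The per-cycle FLOP figures are from the post above; the 4.0 GHz Skylake
 * clock is an assumed value used only to reproduce the quoted range. */
#include <stdio.h>

int main(void) {
    double skylake_flop_per_cycle = 64.0;  /* claimed figure, per core */
    double skylake_cores          = 4.0;
    double skylake_ghz            = 4.0;   /* assumption */

    double zen_flop_per_cycle     = 16.0;  /* claimed figure, per core */

    double skylake_peak = skylake_flop_per_cycle * skylake_cores * skylake_ghz;  /* GFLOP/s */

    for (double zen_ghz = 3.0; zen_ghz <= 3.5; zen_ghz += 0.5) {
        double zen_cores_needed = skylake_peak / (zen_flop_per_cycle * zen_ghz);
        printf("Zen @ %.1f GHz: %.1f cores to match %.0f GFLOP/s\n",
               zen_ghz, zen_cores_needed, skylake_peak);
    }
    return 0;
}
```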

And unsurprisingly there are rumors about AMD releasing future 20-core FX CPUs for 2016... This is weird.

Of course, AMD will be more competitive on integer performance but still behind Intel. I believe Intel will be between 50% and 100% ahead on general workloads.
 

jdwii

Splendid
^^^^
Won't be in my machine. I'm switching sometime in the next 3-6 months; whoever has the best CPU for $350 will get my money. Looking at the i7 4790K, I should expect major improvements (up to twice as fast, e.g. in emulators) in single-threaded tasks (Dolphin emulator) and in Far Cry 4, which uses one FX-8350 core at 80%. For programs like HandBrake I should see a 30-40% boost, all while using 50 watts less. Plus I want a new chipset.

After Intel's improvements to HT with Haswell, I now see HT is far more efficient: it adds virtually no die area and maximizes efficiency for the core's existing resources.

If AMD were smart, they would change over to a design with 4 ALUs, 4 AGUs and a 512-bit FP unit per core, with SMT. That's not all either, just one main reason why they are behind; they are only throwing more cores at the problem, still thinking it's going to change anything.

 
If AMD were smart, they would change over to a design with 4 ALUs, 4 AGUs and a 512-bit FP unit per core, with SMT. That's not all either, just one main reason why they are behind; they are only throwing more cores at the problem, still thinking it's going to change anything.

That wouldn't work. You can't just put 20 ALUs into a "core" and have it be more powerful. From the ISA point of view there is exactly one integer unit present; anything more than that requires magic sauce in the form of a superscalar uarch. The front-end scheduler abstracts those units in a way that allows multiple instructions to be executed ahead of time. So not only must your scheduler be capable of keeping tabs on the pre-execution, but your predictor also needs to accurately predict what needs executing before it needs executing. Not to mention your cache needs to keep all of this in L1. The Phenom II barely used its 3 ALUs; AMD doesn't have the secret sauce (yet) that enables them to use four ALUs, and not even Intel quite has that. Intel puts in 4 ALUs because HT enables them to get more use out of them.

As for the FPU, you can't just make a 4096-bit FPU and claim it's faster. The bit width of a SIMD unit depends on what instructions it needs to execute. The vast majority of SIMD instructions used right now are 128-bit; 256-bit is new and 512-bit is basically on paper only. Going wider doesn't give you more general performance; a 256-bit instruction is not automatically twice as fast / powerful as a 128-bit instruction in practice. The situations where you need such wide instructions are pretty rare; they mostly exist in the media-processing world, where you're doing extremely repetitive, static mathematical operations on large data sets (adding the value of 16 to every pixel on the screen). Making SIMD units wider makes them more complex and thus more expensive, so it's always a trade-off.
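As a concrete picture of the "add 16 to every pixel" case, and of what "wider" actually buys: the 128-bit path touches 16 pixels per instruction and the 256-bit path touches 32, but neither changes what can be expressed. A minimal sketch (needs a CPU and compiler with AVX2, e.g. gcc -mavx2):

```c
/* The "add 16 to every pixel" example, done with 128-bit SSE2 and
 * 256-bit AVX2 integer adds over a toy buffer of 8-bit pixels. */
#include <stdio.h>
#include <stdint.h>
#include <immintrin.h>

#define N 64  /* pretend this is a row of 8-bit pixels */

int main(void) {
    uint8_t pixels[N];
    for (int i = 0; i < N; i++) pixels[i] = (uint8_t)i;

    /* 128-bit path: 16 pixels per add instruction. */
    __m128i add16_128 = _mm_set1_epi8(16);
    for (int i = 0; i < N; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)&pixels[i]);
        v = _mm_add_epi8(v, add16_128);
        _mm_storeu_si128((__m128i *)&pixels[i], v);
    }

    /* 256-bit path: 32 pixels per add instruction -- half the instructions,
     * but only a win when the data is this wide and this regular. */
    __m256i add16_256 = _mm256_set1_epi8(16);
    for (int i = 0; i < N; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)&pixels[i]);
        v = _mm256_add_epi8(v, add16_256);
        _mm256_storeu_si256((__m256i *)&pixels[i], v);
    }

    printf("pixel[0] = %u\n", pixels[0]);  /* 0 + 16 + 16 = 32 */
    return 0;
}
```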

Much of the single-thread gap between the PD/SR design and Intel's SB/HW designs comes down to AMD's less advanced predictor and the less optimized automated design process, combined with operating environments that aren't really optimized (timing / scheduling) for the CPU. When software is optimized for the CPU you do see large increases (Linux, basically), but even then a manufacturer can't design a general-purpose CPU with that mentality. Much of the doom and gloom is really a case of failed expectations, which was a given seeing how high people's expectations were. Never get your hopes up about any development; there is always a catch and always conditions.
 

truegenius

Distinguished
BANNED
trash modular whatever Zen :fou:
I would still like to see a die-shrunk pure K10 CPU :hebe: (don't bring Llano into this :kaola:)
And I will say it again:
they can optimize its memory controller, optimize and add more cache, clock it up, add more cores, add new instructions, and it will still stay under a 125 W TDP.
Even Intel had trouble getting more cores and higher clocks at 45nm while keeping TDP low; that is why even quad-core i7s used to have a 130 W TDP while 6-core K10s ran under 125 W.

Does it take a fortune to improve the IMC and cache, add instructions, shrink the die, and add cores? :pfff:
They could try these things as an experiment by creating engineering samples, then giving them away for review to see consumers' reactions :fou:

Can someone here do some maths for me: how much power would a die-shrunk (28nm GF SHP) Thuban at 3.6 GHz take (it takes 1.25 V and 1.3 V for 3.6 GHz with 4 and 6 cores respectively, which is less than the 1.4 V stock), and what would the power requirements of 4-, 8- and 10-core versions be?
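Nobody can give exact figures without the actual silicon, but a back-of-envelope estimate using dynamic power scaling (P ∝ C·V²·f) goes like this. Every factor below is an assumption: the 125 W / 3.3 GHz / 1.40 V six-core baseline approximates a stock Thuban, the ~0.6 capacitance factor for 45nm to 28nm is a guess, and so is the core/uncore power split.

```c
/* Rough die-shrink power estimate for a hypothetical 28nm Thuban @ 3.6 GHz.
 * Pure back-of-envelope: P_dyn ~ C * V^2 * f, ignoring leakage changes.
 * Baseline and scaling factors are assumptions, not measured data. */
#include <stdio.h>

int main(void) {
    /* Assumed 45nm baseline: six-core Thuban, ~125 W, 3.3 GHz, 1.40 V. */
    double p_base = 125.0, f_base = 3.3, v_base = 1.40;

    /* Target: 3.6 GHz at 1.30 V (the six-core voltage quoted above; the
       quoted 1.25 V for four cores would lower that case a little),
       with an assumed ~0.6x capacitance factor for the 45nm -> 28nm shrink. */
    double f_new = 3.6, v_new = 1.30, cap_scale = 0.6;

    double p_six = p_base * cap_scale * (v_new * v_new) / (v_base * v_base) * (f_new / f_base);

    /* Assume ~2/3 of package power is the cores; the rest is uncore/IMC/L3. */
    double core_part   = (2.0 / 3.0) * p_six;   /* for 6 cores */
    double uncore_part = p_six - core_part;
    double per_core    = core_part / 6.0;

    printf("estimated 6-core @ 3.6 GHz: %.0f W\n", p_six);
    int counts[] = {4, 8, 10};
    for (int i = 0; i < 3; i++)
        printf("estimated %2d-core: %.0f W\n", counts[i], uncore_part + per_core * counts[i]);
    return 0;
}
```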
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Doubling the number of execution units (e.g. ALUs) requires the logic to grow at least quadratically, i.e. the new design would be roughly 4x more complex.

Currently Steamroller has a 256-bit FP/SIMD unit per module. A future design with 512 bits per core, as you suggest, would also increase complexity by about 4x.

You need lots of engineers and money and time to do 4x more complex designs, test them, and polish them.

Around 2007, AMD spent about 1/3 of what Intel spent on R&D. Now AMD spends about 1/10 and has laid off lots of engineers. Thus the gap with Intel will grow wider and wider.

Moreover, in the past AMD was essentially fighting only Intel. Now AMD has to fight Intel, Qualcomm, Apple, Nvidia, Broadcom, Cavium, APM... all at once.

All the docs that I have from AMD show future architectures with 256-bit FMAC units per core (i.e. 16 FLOP per cycle per core), and the leaks I receive about Zen/K12 seem to confirm that SIMD width.

As I showed above, if K12/Zen is a 16 FLOP/cycle per core design, then AMD would need 20-core FX CPUs to compete against Skylake quad cores, and rumors seem to confirm my math-fu:

http://www.bitsandchips.it/hardware/9-hardware/4675-fino-a-20-core-x86-per-le-prossime-cpu-fx-ed-opteron-2016-17

Of course a 20-core FX CPU is a ridiculous concept.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Terrible idea. K10 is a dead end in the long term. Just wait for Zen.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
A shrunk K10 would just be more of what AMD is already struggling with: really good multi-thread performance and poor single-thread. They need to get away from that. It's going to be a long time before software scales to a lot of threads, but it's going to have to happen eventually. We still see single-thread performance being an issue, yet improvements have slowed significantly from both Intel and AMD.

It costs way too much to significantly improve single-thread performance for x86 now. Even though no one wants to admit it, Intel's high end is all about more cores now too. The Haswell-E 8-core is just more cores, and it's even clocked lower than a lot of lower-core-count CPUs, so it loses in benchmarks that are single-thread dependent.

AMD isn't the only one struggling with single-threaded performance, and it's clear developers want more. But Intel gets a free pass from a lot of people because they are the leader. If you are the leader and you're not really going anywhere, is that really a good thing? I don't think so at all.
 