AMD CPU speculation... and expert conjecture


oh, you mean juan? he's parroting some s/a articles from before. and i've verified several times in this thread how he lies. his claims (which are actually blanket statements devoid of any specifics) are of no concern to me. this is why the reply was directed towards you.
as for dgfx - high end discrete gfx has always been niche. this is not something new. with the shrinking pc market, discrete parts are the things people stop buying if they think they have "good enough" performance from integrated parts. but they end up buying high performance discrete gfx when they need more than "good enough". it's not a technological concern, more of a budget concern. in case of apus, the bottlenecks hit early due to their integrated nature. right now for example, the igpu is faster than the memory it uses. so now the memory itself and its usage will have to catch up. and when the memory catches up, the igpu will evolve to make better use of the faster memory. and so on.
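to put rough numbers on that (my own illustration with typical figures for the parts involved, not something from the post):

```python
# rough bandwidth arithmetic for the "igpu is faster than its memory" point.
# dual-channel DDR3: 2 channels x 8 bytes per transfer x transfer rate (MT/s).
def dual_channel_gbps(mt_per_s):
    return 2 * 8 * mt_per_s / 1000  # GB/s

print(dual_channel_gbps(1866))  # ~29.9 GB/s (DDR3-1866, a common Kaveri pairing)
print(dual_channel_gbps(2133))  # ~34.1 GB/s (DDR3-2133)

# for comparison, a midrange discrete card of the era (R7 260X: 128-bit GDDR5
# at 6.5 Gbps effective) has roughly 128 / 8 * 6.5 = ~104 GB/s all to itself.
print(128 / 8 * 6.5)
```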
an apu (e.g. 7850k) is as efficient as any other device, for its purpose and at its price. it depends on the particular use case.
in case of replacing multi-gpu accelerators... i don't know much, but i've always thought of maintaining memory coherency across processing nodes as a major concern. with cpus you get 4-16 cores per socket, maybe 64 max per cluster with 4S. but with an apu with 4+12cu, or in the imaginary high perf case, e.g. an soc with 32-64 cores + 1024-2048CU, it may become a major problem. but i don't know much about hpc, i am speaking from a basic p.o.v. additionally, in hpc you have the chip running on load full time; put that in perspective with my previous post.
actually, in load situations, you are using all of the cores in an igpu. gating only helps at lighter loads, and only when the power management logic is smart enough to determine the scenarios. the new consoles' socs are asics, not general purpose parts, and they have the huge benefit of close-to-metal programming - massively reducing software overheads. if you read trinity, richland and kaveri reviews where the stock cooler was used, you'll notice the mention of throttling on load. current apus have built-in power virus detection so that they throttle to protect the chip from a thermal trip. so, it's happening already with current chips.

 

Fidgetmaster

Reputable
May 28, 2014
548
0
5,010
You are being a bit excessive saying Haswell mops the floor with the 920... it does, but in reality the value/performance increase isn't enough justification to move up to it for most purposes/applications in my opinion...

I think you underestimate, like most do, how well a first gen chip can still perform considering its age...

 


True, the first gen i7 was a really good chip. I must admit I still use an AMD Phenom II X6, another seriously underrated processor imo.
 
If we look back at processor design, think of the number of formerly external components now on die: the FPU (maths co-processor, for anyone as old as me), level 2 cache, level 3 cache (Super Socket 7 + K6-III), PCI controllers, USB controllers, PCIe controllers, the entire north and south bridges... Why are graphics and main memory so sacred that they can never be integrated? Eventually the number of transistors available on die reaches the point where including these things becomes inconsequential.

And why were all those components eventually moved onto the CPU? Because of die shrinks: you got both the necessary transistor budget to fit them and the inherent power savings. But as has been noted many times, we're nearing the limit of how much further you can physically shrink the CPU. No more shrinkage means no more growth in transistor budget, and no more space to put more things on the CPU die [without sacrificing yields and driving up costs, at least].

Already, cache memory is VERY space/cost inefficient compared to standalone DRAM, in that it draws more power and requires a larger transistor budget than a standalone DRAM module does. And that's just for maybe a hundred MB. You simply do not have the space to slap down GBs of the stuff.

What's going to happen, long term, is what you're seeing now: most OEM-level CPUs will have a GPU built in that is capable of playing back high-def video and running whatever the newest gaming fad of the month is [Sims, Warcraft, etc.]. But you're never going to see a built-in GPU beat a dedicated GPU, simply due to the differences in transistor/power budgets available.
 


Well, that all depends on whether we really are at the limit of die shrinks or not. We've got at least another 2 or 3 nodes yet, and people have been (and keep) predicting that no further shrinks are possible, that "it's against the laws of physics" and such, right up until the next node appears. If anything, I think major process nodes are lasting longer these days, so the pace at which this changes (for everything) is going to be slower. We may even need an entirely different substrate than silicon for real improvements (graphene has been shown to be workable for ICs, for example).

I agree there isn't much more AMD can do at 28nm; my argument is based on a lot of ifs and does assume continued scaling one way or another. I do think, though, that iGPUs are getting to the stage of being "good enough" for pretty much everything - albeit at lower resolutions and detail settings - which can only be a good thing. OEMs cutting corners on PCs for so long has been one factor that has really hurt PC gaming imo - the number of expensive machines sold to the public with zero graphics capability was absurd. Thankfully, by including a reasonable iGPU on every processor, Intel and AMD are basically putting a stop to that.
 


I find it interesting NVIDIA chips weren't compared, given their emphasis on GPGPU. Just saying, there's no real HW difference between a GPU and an APU that would greatly affect performance. At the end of the day, they're executing the same functions on the same HW.
 

jdwii

Splendid


Yeah, I sold my 1100T for $100 (got the money in 3 days, sold it on craigslist) and got an FX-8350, which to be honest is faster in general apps (Handbrake by maybe 10%) but not by a lot; it is, however, a lot faster in newer games by a decent amount, at least 20%. It's worth noting that I was able to run my 1100T at 3.9GHz on lower than stock voltage, where this darn FX-8350 can't get above 4.3GHz without excessive voltage and heat. I guess it's OK; I usually get lucky with AMD. The first CPU I bought from them was an Athlon II X4 and I got it overclocked to 3.2GHz from the standard 2.6GHz on stock voltage, and that was with the stock heatsink even. Anyways, you are right: your 920 was probably the legend of all CPUs for the last 6 years; it's still at the top of the charts for gaming, and you can't say that for any AMD CPU from back then - maybe for general apps, but not gaming. (And to think they only asked $320 for it; the 1100T was only $35 cheaper, I think.)
 

con635

Honorable
Oct 3, 2013
644
0
11,010

What did you think of the Q&A with the Adobe dev?
I'm sure I've seen other benches in this thread where an APU destroyed a 780 + CPU in a GPU-accelerated task purely because there was no transfer over PCIe. I think discrete only works if the task is big enough to warrant copying across the PCIe bus; the sandwich analogy is a good one.

 


The issue is that to utilize a dGPU properly for this type of task, you need to send large chunks of data over the bus a handful of times, rather than a little data over the bus all the time. The latency kills you if you take the second approach. Simply send the data you need processed ONCE, and the dGPU pulls ahead again. The APU wins these cases due to sub-optimal coding. Of course, that optimization would reduce APU performance due to the system RAM bottleneck. So now we have a situation where we can't optimize for both pieces of HW at the same time; one suffers, the other gains.
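As a rough illustration of why batching matters (a toy cost model of my own, with assumed latency and bandwidth figures, not measurements):

```python
# Toy cost model for the transfer strategy described above (assumed numbers).
# Each transfer over the bus pays a fixed latency; bandwidth covers the payload.
LATENCY_S = 10e-6   # assumed ~10 us of overhead per transfer
BANDWIDTH = 12e9    # assumed ~12 GB/s effective over PCIe 3.0 x16

def transfer_time(total_bytes, n_transfers):
    return n_transfers * LATENCY_S + total_bytes / BANDWIDTH

data = 256 * 1024 * 1024  # a 256 MB working set

print(transfer_time(data, 1))        # one bulk copy:    ~0.02 s
print(transfer_time(data, 100_000))  # 100k tiny copies: ~1.02 s, latency dominates
```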
 

Fidgetmaster

Reputable
May 28, 2014
548
0
5,010
Yeah, the 8350 is no slouch at all either... yes, heat is a big issue... My 920 is limited to about 3.6-3.8GHz, the highest I can do with the stock cooler; they are pretty wimpy... I was wanting to get the CM Hyper 212 EVO for it, but it's too tall/wide to fit in my mid case... kinda sucks because I like this case and want to use it haha, and the 212 probably really is one of the best deals performance-wise for that price...

Guess I might as well get a water unit, probably better anyways... one of the Corsair units looks awfully tempting... And it would be really nice getting up to 4GHz+; hell, even at 3.6-3.8GHz I'm getting into 70-75C load temps... so yeah, a better cooler would do wonders...

Unless there is some other decent air unit that will fit and be way better than the stock heatsink...
 

8350rocks

Distinguished
I am going to clear some things up...

AMD will produce another dCPU for HEDT; it is already being engineered and will be part of the next uarch. So clearly they do not think everything is going to APUs.

Second, when the article about GPU app acceleration was written, OpenCL had not seen much interest outside a few niches, except as a marketing ploy. Since then, HSA has stirred massive interest, and it can be done over the PCIe bus.

Third, APUs will always be a compromised compute solution in the consumer sector. They will either be serial compute heavy, parallel compute heavy, or a jack-of-all-trades, master-of-none solution. In some scenarios, you could build specialized APUs, a la the consoles. However, those would be niche solutions for niche markets with a specific function in mind. It honestly takes two 290X cards or one 295X2 card to run a 4K screen reliably with good fps. That is 2 pieces of silicon at ~7 bil transistors each. Now, even accounting for die shrinks and new substrates, at 40% improvement generation over generation (asking a lot, by the way), in 2 generations you still need an iGPU with 6 bil transistors. Let us make more assumptions and assume that by 2020 density doubles over Kaveri. That will be about 2 bil transistors (not likely, but hey...), and by then we will have had ~4 generations of GPUs. Which would put us needing a transistor budget of 2.2 bil transistors on an iGPU to do 4K resolution only as well as a 6 year old discrete card.
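For anyone who wants to redo that projection, the compounding arithmetic looks roughly like this (a sketch of my own; the ~14 billion transistor starting point and the 40% per-generation gain are the post's assumptions, and exactly how you read "40% improvement" shifts the endpoint a little):

```python
# Compounding projection sketch for the argument above (assumptions from the post).
# start: ~14 billion transistors (two 290X-class dies) drive 4K today.
# gain:  each GPU generation does ~40% more work per transistor.
def igpu_budget_needed(start_transistors, gain_per_gen, generations):
    """Transistor budget needed to match today's 4K setup after N generations."""
    return start_transistors / (1 + gain_per_gen) ** generations

start = 2 * 7e9
for gens in (2, 4):
    print(gens, round(igpu_budget_needed(start, 0.40, gens) / 1e9, 1), "billion")
# -> roughly 7.1 billion after 2 generations and 3.6 billion after 4,
#    the same order of magnitude as the 6 and 2.2 billion figures in the post.
```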

Get the point yet? Now ask yourself: if ~2.2 bil transistors can power 4K in 2020, what can a 6 bil transistor dGPU do with the same tech?

APUs as anything more than a mainstream solution are farther out than 2020.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


The APU only won 1 out of 5 of those benchmarks. The 8150 w/dGPU won the rest.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


I can't believe I've been agreeing with you this much. People who are making an issue out of PCIe or other bus latency for GPGPU are seriously deluded about what types of tasks you're doing on the GPU.

I don't know how many times I have to explain it. If you're doing GPGPU workloads, you're doing something that will take minutes or hours or even days and weeks. It's not something where even a second of latency is relevant.

And you are absolutely right. You can program it such that you reduce latency. Think of OpenOffice spreadsheet acceleration. If you do GPGPU operations on a bunch of cells and you have latency between the GPU, CPU and memory, you aren't going to update one cell at a time and shuttle it between the CPU, GPU and memory. You're going to ship the whole thing to the GPU, use VRAM as a cache, work on it, then update main memory as the operation completes. So while the operation is still finishing, you do the latency-intensive work of updating main memory in parallel, hiding that latency under the fact that the GPU is still computing.

I just came up with that type of solution and I'm just a lowly computer science guy with a bachelor's degree. There's no way that's the best way, but it would work.
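A minimal sketch of that overlap idea (my own toy example with hypothetical stand-in functions, not anything from OpenOffice): compute the next block while already-finished blocks are written back on another thread.

```python
# Toy latency-hiding sketch: overlap "writeback to main memory" with ongoing compute.
# gpu_compute and copy_back are hypothetical stand-ins, not a real GPGPU API.
from concurrent.futures import ThreadPoolExecutor

def gpu_compute(block):
    # stand-in for a kernel working on data already resident in VRAM
    return [x * 2 for x in block]

def copy_back(result, out, index):
    # stand-in for the slow, latency-bound update of main memory
    out[index] = result

blocks = [[i] * 1024 for i in range(8)]  # the whole dataset, shipped to the GPU once
out = [None] * len(blocks)

with ThreadPoolExecutor(max_workers=2) as pool:
    pending = []
    for i, block in enumerate(blocks):
        result = gpu_compute(block)                             # keep computing...
        pending.append(pool.submit(copy_back, result, out, i))  # ...while writeback overlaps
    for f in pending:
        f.result()  # wait for the last writebacks to land
```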

The whining about latency with dGPUs reminds me of when people were whining about dual cores: how we'll never use more than one core, how it'll be bad to sync data between two cores, etc.

The problem is that some of you look at HSA and its benefits and decide that the only way to do GPGPU in the future will be with HSA and everything sharing memory 100% of the time. Instead you will see GPUs used for traditional GPGPU, and HSA with shared memory used as needed. They are two different tools.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


You'll want to rethink the math there. GPUs have only gotten faster by throwing MORE transistors at them. Sure, they can do some tweaks and optimizations, but largely it's by adding transistors.

The problem now is the transistor cost curve is going sideways and slightly up. When GPUs went from 40nm to 28nm, transistors got cheaper and we got faster graphics cards for nearly the same price. Now they're not getting cheaper. With some clever packaging like 2.5D they can continue making larger parts, but we'll be paying more for them.



Like I've mentioned before, you can buy a 20 billion transistor part from Xilinx, but it will cost you two arms and two legs, making the GTX Titan Z look like child's play. The GPU makers can learn a lot from Xilinx.
 

anxiousinfusion

Distinguished
Jul 1, 2011
1,035
0
19,360


I think that the idea of losing one's ability to mix and match these major components separately scares most enthusiasts. They'll make excuses about the relevancy of dedicated graphics down to the very end.
 

colinp

Honorable
Jun 27, 2012
217
0
10,680


Exactly my point.

I don't believe that AMD will specifically target a niche market (HEDT) except as a by-product of one of their main markets, exactly as is the case with Intel. I also believe that devoting resources to a non-APU line would be a step back for them and a dangerous division of resources.
 


It's a matter of perceived performance and how much brute force your CPU/GPU combo can muster.

In other words, if you look at games only, you could say the i7 4770K (or even the old 2700K) is HEDT (or whatever you wanna call it), since its performance is on par with or even better than the i7 3970X, which is basically a cheaper Xeon (server class and whatnot) for the consumer market. But we all know the *real* heavy hitter will be the i7 3970X for any genuinely "heavy" CPU work. For a lot of "pro" workloads, the i7 4770K is a kid's tool.

Point is, HEDT will be defined by how much grunt work the APU/CPU in combination with the GPU (if it needs one) can muster. I am perfectly fine with an APU being "HEDT" as long as it delivers within my expectations of being a tough grunt, haha. AMD's HSA is part of that, but I haven't seen much as of late. Hope they're still hard at work :p

Cheers!
 

They used to have both off-die cache and RAM; now that is much simplified by having sufficient on-die cache and fast enough RAM. The memory hierarchy will evolve. There is a reason why RAM is dynamic and why we don't load all 50 GB of a game into RAM when you play it. Eventually your RAM will be large enough and the software smart enough that you would just stream things in from the HDD in real time on a personal computer, no matter what the software requirements are. There will be a point where you won't need more RAM whatsoever.

Now stack that much RAM next to the CPU, and what would we need RAM on the motherboard for? Mobile SoCs will be coming out in the next year or so without RAM on their PCB. Sooner or later it's going to happen to laptops, and eventually desktops. It's simply going to become less and less useful to have the extra power you get from a desktop PC.

What we have today in the desktop PC will go the way of the mainframe, making way for integrated solutions for most people in due time. Hardware is so far ahead of software at this point that there isn't even a reason for most people running i7-920s to upgrade for a long time. And when you integrate that into an SoC with a decent pool of RAM?

The moving target doesn't work when software is stagnant for the things the general public uses. Who needs a 400HP muscle car in the city when a hybrid is efficient and will get the job done? Sure, there will be people who buy the most powerful stuff, but that will become a smaller and smaller niche.
 