AMD CPU speculation... and expert conjecture


colinp

Honorable
Jun 27, 2012
217
0
10,680


True, as will you of course. The difference between us is that my genes will get passed on to future generations.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
Juan, are you going to address what I'm talking about, or are you just going to resort to personal attacks on me?

I was asking what benefits a system with many APUs has over one with many dCPUs and dGPUs that can all use HSA. You then basically went "the GPU in an APU is much more efficient than Hawaii, so all dGPUs are inefficient". And then I showed you that a GPU around the size of what you find in an APU is actually significantly more efficient than the biggest dGPU AMD makes (Hawaii).

You will not accept that a many-APU system has the same problems a dCPU-plus-dGPU system has. You point to all this work to get APUs working well precisely because putting the CPU and GPU on the same die removes the latency and inter-chip communication problem, and then you tell me that using multiple APUs will not reintroduce the same problems as multiple dGPU and dCPU configurations. You also won't accept that if/when AMD gets HSA working across multiple APUs, the same can be done with dCPUs or dGPUs.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790

I find it very interesting that Linus' claim is receiving the same misguided comments and the same fallacies that I am receiving here. His responses, and those from others, are worth quoting:

Link to thread given before.
 
You can have the CPU and GPU cooperatively manipulating the same data at once. It is one of the advantages of HSA. This is why AMD claims that an HSA APU is more than a CPU plus a GPU.

You cannot have two processing devices operating on the same piece of memory at the same time; otherwise one will overwrite the other and cause a memory access violation somewhere down the line. You need locking to stop one device while that piece of memory is in use, which obviously KILLS throughput. That's why threading generally isn't done if there's a non-trivial chance of this occurring; if threads need to process the same data at the same time, then threads probably aren't the right solution.
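To make the locking point concrete, here's a minimal sketch in plain C++ (ordinary threads standing in for the two devices; nothing HSA-specific). The result is only deterministic because the mutex serializes the two workers, and that serialization is exactly the throughput cost being described:

[code]
// Two threads stand in for two devices that want to touch the same buffer.
// The mutex makes the result correct, but it also forces the "devices" to
// take turns, which is the throughput hit described above.
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    std::vector<int> shared(1000000, 0);
    std::mutex m;

    auto worker = [&](int amount) {
        for (int pass = 0; pass < 10; ++pass) {
            std::lock_guard<std::mutex> lock(m);   // only one "device" at a time
            for (int& x : shared) x += amount;
        }
    };

    std::thread a(worker, 1), b(worker, 2);
    a.join();
    b.join();
    std::cout << shared[0] << "\n";   // always 30; remove the lock and it races
    return 0;
}
[/code]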

The only way you can get around this limitation is to use some form of Transactional Memory, which essentially checks the contents of memory before saving the result to see whether the original data has been overwritten. If it hasn't, you can save the data, since no other thread got there first; if it has, you have to dump the result, put a traditional lock in place, and do the processing a SECOND time, which obviously kills performance. That's why, you may recall, I argued that Transactional Memory was more for servers than desktops when we had this discussion last year.
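And here's a rough sketch of the "check before you commit" idea, done with a plain C++ atomic compare-and-swap rather than real transactional memory (no TSX, no hardware support). It only models the optimistic retry part described above; the lock fallback is left out to keep it short:

[code]
// Optimistic update: compute a result from a snapshot, then commit only if
// the data has not been overwritten in the meantime; if it has, throw the
// result away and do the processing a second time. Illustration only.
#include <atomic>
#include <iostream>
#include <thread>

std::atomic<long> balance{0};

void add(long delta) {
    long snapshot = balance.load();
    for (;;) {
        long result = snapshot + delta;                   // "the processing"
        // Commit succeeds only if balance still equals our snapshot.
        if (balance.compare_exchange_weak(snapshot, result))
            return;
        // Someone else got there first; snapshot now holds the new value,
        // so we redo the work on the next pass.
    }
}

int main() {
    std::thread a([] { for (int i = 0; i < 100000; ++i) add(1); });
    std::thread b([] { for (int i = 0; i < 100000; ++i) add(1); });
    a.join();
    b.join();
    std::cout << balance << "\n";   // always 200000 despite the contention
}
[/code]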

Putting the GPU on the CPU die only gets around PCIe bandwidth limitations, which aren't a performance limiter right now for *most* tasks. If you have a process that feeds data over in small chunks, then yes, PCIe does become a bottleneck, though I'd argue you can fix that by sending all the data you need across the bus once, so I'd call that inefficient coding.
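Back-of-the-envelope on the "send it all at once" point: every transfer across the bus pays a fixed setup cost on top of the per-byte cost, so many small transfers drown in overhead. The latency and bandwidth figures below are invented for illustration, not measured PCIe numbers:

[code]
// Toy transfer-cost model: fixed per-transfer setup cost plus per-byte cost.
// The numbers are made up for illustration, not measured PCIe figures.
#include <cstdio>

int main() {
    const double setup_us     = 10.0;       // hypothetical per-transfer overhead
    const double bytes_per_us = 16000.0;    // hypothetical ~16 GB/s of bandwidth
    const double total_bytes  = 64.0 * 1024 * 1024;   // 64 MiB to move overall

    auto cost_us = [&](double transfers) {
        return transfers * setup_us + total_bytes / bytes_per_us;
    };

    std::printf("1 big transfer   : %8.0f us\n", cost_us(1));
    std::printf("4096 small chunks: %8.0f us\n", cost_us(4096));
    return 0;
}
[/code]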

Putting the GPU in the same memory space as the CPU is OK; it does make programming a little simpler, but you still can't have both devices operate on the same data at the same time, so you don't gain anything performance-wise by virtue of this change.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
@de5_Roy, I gave you two links explaining HSA, what it is and how it works. GDDR5 vs. DDR3 was also covered at length in this thread over several days.

@gamerk316, putting the GPU on the CPU die to avoid PCIe bandwidth limitations is something that was achieved before HSA. A unified memory pool is already available in console APUs, but those are not fully HSA compliant. Nvidia has unified (virtual) memory now, but their approach is not even close to HSA hardware.

unified_memory.png


I think you simply don't understand what HSA is, or it is simply your traditional dislike of anything AMD-related.
 

yes, those links did provide explanations of hsa, but none for your claim, or your claim of what amd claimed.

you also failed to provide an explanation of your own post: how do

work? it's not ddr3 vs gddr5, it's the content of that post.
you've already failed to explain gddr5's advantage over ddr3 outside its application-specific usage in consoles, but that's long done.

edit: parroting marketing fluff doesn't count as an explanation.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
@de5_Roy, I explained the advantages of GDDR5 over DDR3 outside of consoles: concretely, I discussed high-end dGPUs and the canceled 3M Kaveri desktop version that used SODIMM modules. This proves once again that it is a waste of time to try to educate you on something :-D
 
AMD Clears the Air Around Project FreeSync
http://www.tomshardware.com/news/amd-project-freesync-vesa-adaptive-sync,27160.html


really pathetic attempt to start a side-argument. you explicitly failed at that one after others provided credible and descriptive explanations of how each tech works, while you posted this:
among numerous others.


i am still waiting for the explanation of how your proposed hardware conflict is actually an hsa advantage, apparently claimed by amd (as claimed by you) no less.
 
OK, problem solved. Everyone cease arguing with Juan, as this has gotten out of hand. There are several knowledgeable people here, professionals in their fields, who have pointed out the various technical hurdles and pros and cons. Something we've all learned, some the hard way, is that there is no free lunch; nothing is free. Everything is a trade-off: you're giving up something to get something else. Understanding those trade-offs is key to understanding the progress of technology.

As for the whole graphics debate: people, nothing stands still. "1080p gaming" won't be enough in five to six years; we'll have moved on. More importantly, there are several veins of technology in development with the potential to disrupt the market and radically alter expectations and standards: HMD-based VR, augmented reality, kinematic computing (using hand gestures, etc.), and even human-machine interfaces. Each of those could radically alter the way we interact with and experience computing devices, which in turn would change the performance profiles and render any attempt at future-casting an exercise in futility.

There is a physical limit to the number of transistors you can have on any particular piece of silicon. There is also a limit to the amount of thermal energy you can safely transport away from that silicon without getting into exotic solutions (for consumers). Furthermore, the odds of catching a defect rise quickly with die size (yield falls off roughly exponentially with area), which is what makes very large dies so inefficient and expensive. A 650mm^2, 300W combined chip is simply uneconomical in the home consumer market. That is the real limit to combined SoCs. You can have a 200mm^2 CPU alongside a big ~550mm^2 GPU, and it would be cheaper and easier to cool than a single 750mm^2 chip. And raw computational power is a product of the number of processing elements on the chip, which, in the case of SIMD array processors, is directly related to the size of that chip.
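To put rough numbers on the die-size argument, here's a sketch using the simple Poisson yield model Y = exp(-D*A), which is my assumption (no particular model is claimed above), with an invented defect density. The point it shows is that a defect only kills the die it lands on, so a 200mm^2 CPU plus a 550mm^2 GPU waste less wafer per good pair than one 750mm^2 chip, even though the total silicon area is the same:

[code]
// Poisson yield model Y = exp(-D * A), with a made-up defect density.
// "mm^2 per good chip" = die area divided by yield, i.e. how much wafer you
// burn on average to get one working chip of that size.
#include <cmath>
#include <cstdio>

int main() {
    const double D = 0.002;   // hypothetical defects per mm^2 (0.2 per cm^2)

    auto yield        = [&](double area) { return std::exp(-D * area); };
    auto mm2_per_good = [&](double area) { return area / yield(area); };

    std::printf("200 mm^2 CPU     : %4.1f%% yield, %5.0f mm^2 per good chip\n",
                100 * yield(200), mm2_per_good(200));
    std::printf("550 mm^2 GPU     : %4.1f%% yield, %5.0f mm^2 per good chip\n",
                100 * yield(550), mm2_per_good(550));
    std::printf("750 mm^2 combined: %4.1f%% yield, %5.0f mm^2 per good chip\n",
                100 * yield(750), mm2_per_good(750));
    std::printf("CPU + GPU pair   : %5.0f mm^2 per good pair\n",
                mm2_per_good(200) + mm2_per_good(550));
    return 0;
}
[/code]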
 
Testing AMD’s Mantle: Battlefield 4, Thief and PvZ Garden Warfare
http://www.eteknix.com/testing-amds-mantle-battlefield-4-thief-and-pvz-garden-warfare/

the thief benchmark stands out as it's the only game in the roundup with a built-in benchmark tool (easy for reviewers), and the one where the fx8350 (mantle) makes a massive gain over both the fx8350 (dx11) and the fx4100 (mantle). the fx8350 gains as much as 3x the minimum fps of the fx4100 (dx11). wow. strangely, the fx4100 still seems to bottleneck in thief even with mantle enabled.
i would have liked it if they'd included intel cpus as well, especially the pentium a.e. and d.c. i5 and i7 cpus.
edit: okay, that sounded a bit sensationalistic. :p but the fx8350 (mantle) gained nearly 2x the minimum fps of the fx8350 (dx11): 58.50 fps, up from 29.90 fps.

edit3:
pclab.pl has tested mantle and cpus (both intel and amd, stock and oc'ed, this time) on pvz: garden warfare
http://pclab.pl/art58492.html
kaveri 7850k gains more fps with mantle as long as gfx load is lightened. mantle improves avg. fps on almost all cpus. i didn't read the text on how they tested mantle performance.
 

vmN

Honorable
Oct 27, 2013
1,666
0
12,160
The ~20% bottleneck should only occur in heavier workloads where the frontend is essentially starving the backends.
Also, Windows has provided a "hotfix" for the old problem and should now treat it as a quad core.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


Actually, the problem was that Windows TREATED 1 module as 2 normal cores. It's of course still visible as 2 cores in taskmgr, but the second core won't be loaded unless it's necessary (like you said, in a heavy workload). A similar patch was created for Intel's HT at its beginning.
 


Again, the problem was always the ~20% hit from using the second core of a BD module. So either you accept that penalty, or you avoid using the second core, at the cost of not being able to use Turbo Core as much. It's a tradeoff due to the module design.
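For what it's worth, the "avoid the second core" option can also be forced by hand with thread affinity rather than waiting on the scheduler. A Windows-only sketch, which assumes the usual FX mapping where logical processors 0/1, 2/3, ... are the two cores of a module (the code doesn't verify that):

[code]
// Restrict the current thread to one core per module on a 4-module FX chip,
// i.e. logical CPUs 0, 2, 4 and 6. Assumes logical CPUs 2n and 2n+1 are the
// two cores of module n; on a different topology this mask would be wrong.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD_PTR mask = 0;
    for (int module = 0; module < 4; ++module)
        mask |= DWORD_PTR(1) << (2 * module);   // first core of each module

    if (SetThreadAffinityMask(GetCurrentThread(), mask) == 0)
        std::printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
    else
        std::printf("Thread limited to logical CPUs 0, 2, 4, 6\n");
    return 0;
}
[/code]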
 


The best part: CPU-wise, Intel comes out the big winner:

plants_1920_r9dx_cpu.png


plants_1920_r9mantle_cpu.png

The Pentium G3240 jumps ahead of an FX-6350 OC'd to 4.7 GHz. That's embarrassing.

And for comparison, DX11 with the 780 TI:

plants_1920_780ghz_cpu.png

Or about the same as the 290X with Mantle. Oddly, the lower-tier chips do better on the NVIDIA/DX11 setup, which I honestly wouldn't expect. In fact, the results are the opposite of what we saw in BF4: the better-performing CPUs saw more of a boost from Mantle than the lower-tier ones did. Might be because PvZ is more CPU bound?
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


It's FX-4100 (first Bulldozer).



Simple. Because Nvidia does the same.
 

szatkus

Honorable
Jul 9, 2013
382
0
10,780


I think that could help a lot with feeding GCN. My blind shot: +30%?

BTW, similar news was linked earlier.



Nothing surprising. I've seen some AMD fanboys (or rather ignorant fans) who believed that Mantle would give a boost only with AMD's CPUs ("designed for modules" or something like that), but in fact Mantle doesn't (and shouldn't) favor any CPU vendor.
 

shouldn't that make programming a bit harder? without hsa (the majority case for now), you have L2, stacked dram, and system memory. with hsa, you have all of the memory under the same address space. the dram cache is outside the uncore too; to me it looks like a high-latency L3 cache. (edit1: so far, the apus haven't had an L3 cache, only up to L2)

i'm gonna look into pclab's text later. their results are the opposite of amd's promo slides, where an fx8350 was shown outperforming a core i7 4770k (iirc). another thing: pvz:gw's frostbite 3 shows cpu scaling even with mantle, and the avg. fps is under the engine's 200 fps cap, so there may still be some performance left, imo. maybe the driver is inhibiting some of the performance.


they do. and those gpus are prohibitively expensive and they constantly blame tsmc. i think amd is so close to nvidia in terms of performance now, they intend to keep pressing on.
 
I think that could help a lot with feeding GCN. My blind shot: +30%?

I doubt it for two reasons. First off, traversal times. L3 is already very inefficient, in terms of space versus performance, and L4 will be even more so. Secondly, it simply isn't big enough to really benefit a GPU. It will help, don't get me wrong, but only about 5-10% or so. And if it takes a significant portion of the die, one needs to wonder if the space could be better utilized by something else (more GPU cores, more CPU cores, specialized logic processors, etc). I do expect the APU as a whole to be ~20-25% faster on average though, simply due to generational gains.
 
shouldn't that make programming a bit harder? without hsa (majority case for now), you have L2, stacked dram, system memory. with hsa, you have all of the memory under same address space. the dram cache is outside uncore too, to me it looks like a high latency L3 cache.

That's all invisible though. From my perspective, I don't care where the data goes, be it RAM, L1/L2/L3/L4, or whatever. That's managed by the CPU/OS. Where exactly the data is located at any point in time is invisible to me. [That being said, you could certainly optimize for a specific architecture by taking these caches into account, but for most programs, you simply don't code that way.]
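To make that bracketed caveat concrete: cache-aware code does exist, it's just niche. The textbook example is loop tiling (cache blocking), where you process the data in blocks sized to stay resident in some cache level; the 64-element tile below is a guess, not something tuned for any particular chip:

[code]
// Cache-blocked (tiled) matrix transpose: working in BxB tiles keeps the
// strided writes to 'out' within a small set of cache lines that get reused
// before they are evicted. B = 64 is an untuned guess.
#include <algorithm>
#include <cstddef>
#include <vector>

void transpose_tiled(const std::vector<float>& in, std::vector<float>& out,
                     std::size_t n) {
    const std::size_t B = 64;   // tile edge; pick to fit the cache you target
    for (std::size_t i0 = 0; i0 < n; i0 += B)
        for (std::size_t j0 = 0; j0 < n; j0 += B)
            for (std::size_t i = i0; i < std::min(i0 + B, n); ++i)
                for (std::size_t j = j0; j < std::min(j0 + B, n); ++j)
                    out[j * n + i] = in[i * n + j];
}

int main() {
    const std::size_t n = 1024;
    std::vector<float> a(n * n, 1.0f), b(n * n, 0.0f);
    transpose_tiled(a, b, n);
    return b[0] == 1.0f ? 0 : 1;   // trivial check; also keeps the work live
}
[/code]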

i'm gonna look into pclab's text later. their results are opposite of amd's promo slides where an fx8350 was shown outperforming core i7 4770k(iirc).

"AMD Promo Slides"

another thing, pvz:gw's frostbite 3 shows cpu scaling even with mantle and the avg. fps is under the engine's cap of 200fps, so there may still be some performance left, imo. may be the driver is inhibiting some of the performance.

Possible, except FPS tops out at about 170 or so, so I don't think the results are being suppressed by the FPS cap.
 