CPU Performance Boosted 20% When CPU, GPU Collaborate

The way this is written, it makes it sound like a CPU is a memory controller and branch prediction unit. It's way off.

Ideally, you'd have the CPU schedule instructions so they seamlessly go to the GPU pipelines, much like a simple integer instruction gets sent down the correct pipeline today. But you'd still use the CPU's computational units for the things it does better, and there are a lot of things it does better. Ever wonder why we don't see GPU designs used as CPUs running operating systems? GPUs are good at what they do, but very poor at the things they weren't built for. Reducing the CPU to a memory fetcher and branch predictor is absurd.
 
AMD claims to have plans for things like this with their APUs in a few years; I think it was 2014. They want the graphics and CPU cores totally integrated, and the CPU will not need software to tell it what work can be done faster on the GPU because it will be able to figure that out on its own.

Let's see how far this goes.
 
Lol, I read this years ago, back when AMD and NVIDIA were hyping CUDA and OpenCL.

Years later I still can't find a damn useful thing to do with these.
 


If the system will automatically determine what code runs on the CPU and what runs on the GPU, will I still need to program in OpenCL or not?
 
[citation][nom]warezme[/nom]nothing really, and don't game developers already know this and have been doing this for some time.[/citation]
[citation][nom]greghome[/nom]Is this anything significant? sounds like another Captain Obvious Statement[/citation]

Today's games are pretty much a one-way trip, CPU to GPU. What they're talking about is collaboration between the CPU and GPU, true heterogeneous computing, similar to what AMD is aiming for with HSA. The disadvantage of having the GPU on an expansion card is the physical distance from the CPU: latency is high if data has to go from CPU to GPU and back. If a discrete GPU is only doing the rendering, its one-way nature isn't much of a problem.
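To make that round trip concrete, here is a minimal host-side sketch (my own illustration, not from the article; the kernel and buffer size are made up) using CUDA. On a discrete card, each cudaMemcpy below is a trip across the PCIe bus, and those crossings are where the latency comes from:

[code]
// Hypothetical sketch of today's CPU -> GPU -> CPU round trip on a discrete card.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void step(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;   // stand-in for the real parallel work
}

int main() {
    const int n = 1 << 20;
    float *host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = float(i);

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);   // CPU -> GPU over PCIe
    step<<<(n + 255) / 256, 256>>>(dev, n);                             // GPU does the parallel part
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);   // GPU -> CPU, the "and back"

    printf("first element after round trip: %f\n", host[0]);
    cudaFree(dev);
    delete[] host;
    return 0;
}
[/code]

With a shared-memory APU of the kind HSA describes, those two copies (and the latency they carry) largely go away, which is exactly the advantage being pointed out above.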

[citation][nom]alidan[/nom]i dont like the idea of a game running soley on the gpu, look at physx on a lower end card, you have to scale it back to the point it may as well not be there to get the game running at higher framerates.[/citation]

Today's GPUs are still too slow at context switching; the overhead and resource usage are still high when doing rendering and physics on the same card. Using the CPU works, but it's still not the best fit for the highly parallel nature of physics. A dedicated GPU is better, but a GPU carries transistors that aren't needed for physics at all. That's why I think the original Ageia PPU was actually a good idea: it was a chip made solely for physics.
 
This paper is actually co-authored by Mike Mantor, a Senior Fellow at AMD who leads the Compute Domain Architecture initiatives to drive hardware and software improvements into a new class of APU processors that combine high-performance x86 cores and GPU parallel processor cores with a shared memory subsystem. He has been a leader in the development of AMD/ATI high-performance, low-power GPUs for the past 12 years, a key innovator of the AMD Radeon GPU shader core system (including enabling more efficient general-purpose compute), and heavily involved in the development of the DirectCompute and OpenCL APIs.
 
I wonder if the majority of the motherboard will eventually become obsolete. All the chips on the board are made of silicon, right? Including the northbridge/southbridge, the audio amp chips, etc.
SSDs are made of silicon too, right? Flash memory.
Couldn't we produce a single chip with the RAM, flash memory, CPU, GPU, audio, and everything else built in? We would eliminate the bottlenecks of the different interfaces, save a lot of electricity, and effectively make the computer much smaller. Could you imagine a motherboard the size of an iPod and just as thick, with a small area for the power supply to connect to, a few USB ports, and an HDMI port?
The downside is it wouldn't be upgradeable, but logically it seems like it would be the final form of the computer.
 
[citation][nom]aidynphoenix[/nom]i wonder if the majority of the motherboard will eventually become obsolete,, all the chips on the board are made of silicon right? including the north/south bridge, audio amp chips, ect.. ect.. ssd drives are made of silicon too right? flash memory. couldn't we produce a processor with the ram, flash memory, processor, gpu, audio, and everything else built in? we would eliminate the bottlenecks of the different interfaces, and save alot of e;ectricity, and effectively make the computer much smaller. could you imagine a motherboard the size of a ipod and just as thick, a small area for the powersupply to connect to, and a few usb ports, and a hdmi. downside is it wouldn't be upgradeable.. but logically it seems like it will be the final form of the computer.[/citation]

It's not that simple. The single chip you want would need to be very large to have everything on it, even with modern technology. Also consider that the RAM alone would be a huge part of it: normal machines nowadays have 4GB+ of RAM, and normal gaming machines tend to have 6 or 8GB. At the densest, each chip on a RAM module is usually 256MiB. That means (with 4GB) you'd need the equivalent of 16 RAM chips, plus a CPU, a GPU, a northbridge, a southbridge, and any other integrated hardware, all on one chip. Since the RAM would no longer be on separate modules that give the chips extra surface area, this solution could be as large as or even larger than current motherboards.
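Spelling out that chip count (same assumptions as above: 4GB of RAM built from 256MiB DRAM chips):

\[ \frac{4 \times 1024\ \text{MiB}}{256\ \text{MiB per chip}} = 16\ \text{DRAM chips}, \]

before the CPU, GPU, and chipset dies are even counted.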

It would undoubtedly be more energy efficient, but having all of these heat-producing components so close together would still concentrate a lot of heat in a small area. You would also end up with some memory chips pretty far from the processors to fit it all together, and that's not a good thing for performance. For this to work we would need very expensive 512MiB or 1GiB RAM chips instead of the standard 256MiB chips. That would shrink the surface area needed and reduce the maximum distance between the memory chips/dies and the processor dies to an acceptable level.

The heat problem wouldn't be too bad, no worse than what we have with current high-end video cards, but it's still considerable. All in all, this idea is theoretically possible, but it could be more trouble than it's worth. It can be done, and is done, in low-end systems that don't need much hardware or very powerful hardware; we have things like the Raspberry Pi and its competitor (whose name escapes me), though even those aren't single-chip systems. It's definitely possible, but a complete SoC (system on a chip) of that scope wouldn't be easy to build and still perform well for general use.

This will probably be done some time in the future. Computers seem to get smaller and smaller as time goes by.
 
I think AMD has this in mind and is working towards binding them even further. If they can tie them together and use the GPU for the floating-point ops it is great at, leaving just basic integer work on the CPU side, the speed of their current line of CPUs would go up quite a lot. Unlike Intel, AMD has a strong team in ATI to help build a first-rate CPU/GPU hybrid chip. They would still need a separate GPU card to tie in faster graphics memory, or motherboards would have to be upgraded to accept it because current on-board memory is too slow, but that is not a roadblock. In fact, if they adapted faster GDDR memory for both CPU and GPU in their current or future builds, they would have a very fast product once they tied the two together, and running Windows 8 they would definitely be back in the top-end performance segment. I have a feeling they are really working towards this end. One thing AMD is not afraid to do is try new ideas. We owe them a lot for where we are now, because they innovate and try new ideas. Left to Intel we would still be using 32-bit chips; Intel bashed AMD's 64-bit move as unnecessary and stayed on the huge-pipeline MHz race until AMD built a better design. It's only because Intel had more resources and money that they were able to basically take AMD's lead and perfect it.
Kind of like what Japan does with cars and just about everything else: no ground-breaking new ideas, just taking current ones and making them better. It's one reason we still need AMD even more than we need Intel.
We need innovators to keep pushing new ideas, because that is the only way we progress to better ways of doing things in the long run.
One thing Intel isn't is a long-view company. Taking the long view is hard and risky, as we see with the first Bulldozer build from AMD. It's a radical design that will need more work, but it is perfect for a CPU/GPU merge.
 
I'm not 100% sure what they're saying they did here. Are they suggesting that the CPU acted as a memory manager for the GPU? I'd be surprised if performance improved through that; if it did, it's a sign of where GPU engineers need to focus development, because a 20% boost on the same SPs and cache just from fixing memory-access issues is a huge thing.

What sounds more likely is that they're speaking of integrating the GPU into the CPU's pipeline, at least virtually. That makes sense; it's something I was originally hoping we'd see with Llano. Anything that is an "embarrassingly parallel" problem would be best offloaded to the GPU. While the SSE/AVX units on current CPUs may be fine for some math, performance would indeed be better if the CPU could simply hand the work off to another unit with vastly more throughput.
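As a rough, hand-rolled illustration of that hand-off (my own sketch, not anything from the paper): the host-side loop below does a multiply-add four floats at a time with SSE, while the CUDA kernel does the same operation with one thread per element and thousands of threads in flight at once.

[code]
// Sketch: the same multiply-add done 4-wide on the CPU vs. one thread per element on the GPU.
#include <cuda_runtime.h>
#include <xmmintrin.h>   // SSE intrinsics, host side

// GPU version: each thread handles one element; the hardware runs thousands at a time.
__global__ void fma_gpu(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] * b[i] + c[i];
}

// CPU version: one SSE multiply and one SSE add cover four floats per iteration.
void fma_cpu_sse(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(c + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
}
[/code]

The kernel would be launched as something like fma_gpu<<<(n + 255) / 256, 256>>>(da, db, dc, n) after copying the arrays to the device, as in the earlier sketch; the point is simply that the GPU can keep an enormous number of these multiply-adds in flight while the SSE loop retires four per instruction.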

To put it into perspective: a single modern x86 CPU core (including any of Zambezi's EIGHT cores) using SSE can execute a 4-wide 32-bit floating-point multiply and a 4-wide add each cycle; the most common pattern is multiply-add, so that counts as a grand total of 8 FP operations per core, per cycle (4 multiplies plus 4 adds). That makes a 4-core Sandy Bridge top out at 32 FLOPs per cycle, or a theoretical maximum of 108.8 gigaFLOPS for a 3.4 GHz Core i7 2600K. That is VERY small once you put it side by side with a GPU, where each SP on an NVIDIA GPU, or each cluster of 4 SPs on an AMD GPU, can deliver comparable math throughput per cycle to a whole x86 core.
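Writing out that arithmetic under the same assumptions (one 4-wide multiply plus one 4-wide add per core per cycle):

\[ 4\ \text{cores} \times 8\ \frac{\text{FLOP}}{\text{core} \cdot \text{cycle}} \times 3.4\ \text{GHz} = 108.8\ \text{GFLOPS}. \]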

Now, in all honesty, I actually DON'T believe that having the GPU on a discrete expansion card prevents this from being done; it merely introduces a lot of latency. While that makes some uses less ideal, it's still quite possible to offload CPU tasks to the GPU if latency isn't a critical requirement. It's quite possible that future architectures will give us a vastly lower-latency, more direct interface between the CPU and the GPU. After all, current integration has already put the main memory controller on the CPU die and all but eliminated the northbridge chipset.

The more telling thing here, though, is that the original x87 line, before the 80486, was actually implemented as a separate chip on the motherboard with its own socket, and it managed to work fine there. Granted, it DID simply sit on the same FSB as the CPU, but the physical distance proved not to be an issue. (Similarly, cache used to live in separate chips on the motherboard, which also worked, albeit with higher latency.)

[citation][nom]loomis86[/nom]You are completely missing it. This research proves separate GPUs are STUPID. RAM and BIOS will be integrated on a single die someday also.[/citation]
That would be even stupider. Need to replace the BIOS? Good luck, there goes the CPU as well! There's a reason the BIOS has been separate since the dawn of the CPU. (The Intel 4001 ROM served this purpose for Intel's 4004.)

Ditto for RAM; the amount needs to be quite variable. On top of that, the amount of silicon needed for a proper supply is huge by now: implementing all the components you mention on a single die would require a massive piece of silicon that would be MORE expensive than the current arrangement. Cost climbs steeply as die area goes up: not only do you get fewer chips per wafer (due to the higher area per chip), but the failure rate ALSO goes up. The number of defects per wafer tends to be roughly constant, so 8 defects on a 100-chip wafer is a mere 8% failure rate, while the same 8 defects on a 25-chip wafer is a whopping 32%. (This is a lesson NVIDIA has learned the hard way again and again.)
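As a back-of-the-envelope version of that yield argument (my numbers, same assumptions as above): if the defect count per wafer is roughly constant and each defect kills one chip, the failure rate is roughly the defect count divided by the number of chips per wafer:

\[ \frac{8\ \text{defects}}{100\ \text{chips}} = 8\% \qquad \text{vs.} \qquad \frac{8\ \text{defects}}{25\ \text{chips}} = 32\%, \]

so quadrupling the die area roughly quadruples the fraction of the wafer you throw away, on top of yielding a quarter as many candidate chips per wafer.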

The same thing applies to discrete GPUs: putting the GPU on the same die as the CPU is not something you can apply to every application. For a tablet or a phone it may make perfect sense, but if you need high performance you simply can't fit enough transistors on a single die. And no, you can't just wait for the next die shrink, because that shrink also gives your competitor more room to build a more complex and powerful GPU.

[citation][nom]alyoshka[/nom]Then they made the Sandy Bridge & The Llano which calculated the graphics with the help of a secondary Chip on board.[/citation]
Actually, the GPU portions of Sandy Bridge and Llano ARE on the same die. They are not integrated onto the motherboard, or even onto a separate die in the CPU's package.
 
[citation][nom]billcat[/nom]I think AMD has this in mind and is working towards binding them even further. If they can tie them together and use the GPU for FPU ops which it is great at and just leaves basic int. the speeds of their current path of cpu's would speed up quite a lot. Unlike Intel AMD has a better team in ATI to help bind a first rate cpu/gpu hybrid chip. They would still need a seperate gpu card to tie in faster graphic's memory or have to upgrade motherboards to accept it because current on board memory is too slow but that is not a roadblock. In fact if they adapted the faster GDDR memory for both CPU/GPU and with their current or future current build they would have a very fast product once they tied them together and running with Windows 8 they would def. be back in the top end performance segment. I have a feeling they are really working towards this end. One thing AMD is not afraid to do is try new ideas. We owe them a lot for where we are at now because they innovate and try new idea's. Left to Intel we would still be using 32bit chips. They were really bashing AMD about the 64bit when they did it as not needed and were on the huge pipeline mhz race until AMD built a better design. It's only because Intel had more rescorces and money that they were able to basicly take AMD's lead and perfect it. Kind of like what Japan does with cars and just about everything else. No ground breaking new ideas, just takes current ones and makes them better. It's one reason we still need AMD even more than we need Intel. We need inovators to continue to push new ideas because this is the only way we progress to better ways of doing things in the long run. One thing Intel isn't is a long view company. It's hard and risky to do as we see with the first BD build from AMD. It's a radical design that will need more work but is perfect for a CPU/GPU merge.[/citation]

If we didn't have AMD, why would we still have 32-bit chips? Intel had 64-bit chips before AMD did: Itanium. Itanium is a poor performer, but it exists and is an older 64-bit CPU than AMD's first 64-bit consumer processors, the Athlon 64/FX families. AMD was the first to ship 64-bit extensions of the x86 architecture, not the first to have 64-bit chips intended for mass adoption. And for a CPU/GPU hybrid to be first-rate, does it need a first-rate CPU and a first-rate GPU, or is it okay to have first-rate GPU functionality and a third-rate CPU and still call it a first-rate CPU/GPU hybrid?

AMD does not have first-rate CPUs anymore, so it's hard to say who could do this better. AMD makes good graphics, but their CPUs leave much to be desired right now. There's just no way to call AMD's best consumer CPUs, the Phenom IIs, first-rate ever since Sandy Bridge came out. With Ivy Bridge around the corner and AMD still lacking a decent successor to the Phenom II (sorry, but Bulldozer is slower than Phenom II in most consumer applications), I'm not sure AMD can pull out a win any time soon, if ever.

AMD even came out and admitted that they simply can't compete with Intel on performance anymore. Even if software could use 8 cores effectively, AMD would still lose to the i7s by a pretty wide margin, especially the Ivy Bridge i7s. They lose by a huge margin to the six-core SB-E too, but that's not a fair comparison. As a gamer, there is no denying that as of right now the SB i5s are the best option, and the IB i5s will be even better.

AMD's graphics are still going strong, but Nvidia will likely take back the lead with Kepler. Last generation, the GTX 480 ended up about as fast as the previous generation's dual-GPU card, the GTX 295, so it stands to reason that Nvidia will do something similar, if not go well past the GTX 590, with their next single-GPU card, presumably the GTX 680. Once that happens, AMD will need to drop prices and compete more on value than on raw performance. Even so, I think AMD is in a much better position in the graphics market than in the CPU market: here AMD doesn't need to compete on raw performance, they just need to compete on performance per watt and per unit of currency (US dollars for me).

If I had to buy a graphics card right now, I think I'd go for a 6870. Nvidia has shown that they either refuse to, or are unable to, compete with AMD outside of the high-end market. The slowest current-generation Nvidia card I consider worth buying is the 560 Ti, because the plain 560 uses about the same amount of power for considerably less performance, which makes the Radeon 6870 a much more attractive option, and I don't think I need to explain why not to buy a GTX 550/550 Ti. AMD does well in graphics; I'll call their graphics first-rate. However, AMD's CPUs are not first-rate anymore, and no amount of sugar-coating can change that. Perhaps AMD will fix the problems with Bulldozer; maybe they will abandon it like Intel abandoned NetBurst.
 
With Llano now capable of gaming BF3 on ultra, how long before HSA lets ULV parts do the same,
especially when 21% is what current designs can milk from HSA?
 
[citation][nom]triny[/nom]With Lano now capable of gaming BF3 in ultra ,how long before HSA allows ulv to do the sameespecially when 21% is what current designs can milk from HSA.[/citation]

Good luck gaming with a sub-par CPU whose graphics are roughly equal to a Radeon 5550, somewhat slower than even the 6570. Another 20% or so wouldn't even catch it up to the 6670, let alone any mid-range graphics card. A Sandy Bridge Celeron or Pentium paired with a Radeon 6570/6670/6750 will offer far better performance for the same money (or even less) than any Llano processor.

Face it, AMD failed almost universally on the CPU side, unless you count mobile CPUs and places where highly multi-threaded work is done without the need for something like an i7 or Xeon. For desktops, AMD only wins in low-end and mid-range highly threaded work. For laptops, AMD only wins in low-end and mid-range systems. For netbooks, AMD pretty much wins all around, but netbooks aren't a great market anymore and are losing interest.
 
I forgot to mention this in my earlier post, but the graphics in the top A8 Llano APUs are equal to a Radeon 5550 (it is more or less a modified Radeon 5550 anyway), which is not enough for gaming on ultra in anything even remotely graphics-heavy. It is significantly slower than even the 6570, and that is not a card good enough for common gaming.

The A8s have the best IGP, but it is not enough for serious gaming and can struggle even at minimum settings and resolutions in some modern games.
 
I forgot this again, but there is no way an A8 can do ultra in BF3, even at minimum resolutions, with playable frame rates. Not gonna happen. Not even remotely playable frame rates. Remember, it is a Radeon 5550, not a 5750 or anything like that, and not even close to a 6670. The 5550/6550D is probably around half of a 6670, and that is the entry-level graphics card.
 
[citation][nom]tvtbtdra[/nom]they will be fast enough to do damn newer everything for everyone, and the only people who want more will need a specialty item (probably wont be overly price inflated, due to size of the chips at the time)[/citation]

An A8 can't even go beyond 1024x768 in Metro 2033 and BF3, and not much higher in other modern games except StarCraft II, which is a very light game compared to the others. That is definitely not enough for most gamers, especially most gamers on this website. Most of us use something faster than the Radeon 5550, which is about identical to the A8's 6550D.

If a Radeon 6670 or better is a "specialty item" to you, then you are, at best, a casual gamer, and the opinion expressed in your comment carries no weight for the vast majority of us. I am a casual gamer at best myself, but I am very knowledgeable about serious gaming, and I can tell you that most video cards are not specialty items.

These A8s are fast enough for most non-gaming workloads, or the light gaming workloads most people run, but so are the HD 2000 and HD 3000 IGPs in Intel's CPUs. Even the integrated graphics on AMD's AM2+/AM3/AM3+ motherboards are good enough for regular work, and those are even weaker than Intel's HD 2000.

Even the absolute garbage GMA 950 in my old Intel Pentium Dual-Core (a cut-down first-generation Core 2 Duo) is good enough for regular work and watching movies, although it might have problems with 1080p and probably can't do 3D 1080p. The other graphics I listed most certainly can do 1080p, and Intel's HD IGPs can do 3D 1080p.

Don't think for a second that any of this stuff can do 1080p in gaming, because there isn't a chance of it, and 3D 1080p is twice as intensive as regular 1080p, so not even an A8 can manage it in StarCraft II at even 15 FPS, let alone a playable frame rate. I've heard that an A8 can get about 28 FPS at 1080p with lowered settings in StarCraft II, but don't expect it to do that at decent settings, or even that well in any other recent game.
 