AMD CPU speculation... and expert conjecture

Page 249
Status
Not open for further replies.




+1 I am helping someone move from an old Gateway P55 motherboard and an 860 (everything else is new: GTX 670, TX650, HAF X case, etc.) to either a Maximus P67 or Sabertooth Z77 and a 2600K, instead of telling him to go 4770K + Z87 and burn the extra cash that he is using to get an NH-D14. You need to paint.net yourself a Phenom III X5 :3
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Oxymoron! Superpi is a 14 year old outdated x87 benchmark. It has ZERO relevance for a modern operating system or program.

"SuperPI mainly uses legacy x87 instructions which have been almost completely superceded. On top of that it has no real world use or purpose as there are newer programs which can calculate PI almost 100 times faster. "
 
Curiously, I find it funny that Tom's moderation staff has allowed him to acquire so many posts while exhibiting this sort of behavior. I wouldn't be surprised if Tom's saw some sort of benefit from allowing people like hajifur to run around and slide forums into disarray.

We've been watching. Hajifur hasn't actually broken rules, yet. He (assuming male) is entitled to express his opinion, provided insults and flames don't get slung and it's at least somewhat on topic. My suggestion is people stop replying / engaging if you don't wish to discuss something with him.
 

8350rocks

Distinguished
http://www.advancedsubstratenews.com/2013/07/globalfoundries-on-cost-vs-performance-for-fd-soi-bulk-and-finfet/

Interesting read on GF's projections for FD-SOI and bulk FinFET.

It appears, based on their internal testing, that 20nm FD-SOI will be cheaper than 14nm bulk FinFET and perform equally.

If these extrapolations from their data are accurate, that means Intel will lose process advantage entirely in 2015 when 20nm FD-SOI goes live as they launch their 14nm bulk FinFET.

I think I feel a change in the wind coming...

EDIT: IBM's projections for SSOI near the bottom of the page are quite interesting as well... SSOI is a greater process advantage over FD-SOI than FD-SOI is over bulk...
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Using known data disclosed by AMD, I obtain that a 4-core Steamroller performs at the level of a Sandy Bridge i5 CPU. In some tests AMD will be slower and in others faster.

With HSA software, Kaveri A10 will be much faster than an i7-4770k. Developers are claiming a 500% boost in performance.

The TDPs of the Kaveri family are similar to those of the Richland family. E.g. the top APU has 100W, but note that this includes the powerful new iGPU ~ Radeon 7750.

I am preparing an article about that and will include graphics with estimated benchmarks.
 
Could HSA be the reason AMD may not make an 8-core AM3+ Steamroller? If it's as good as it could be, you may not want to waste space on CPU cores when GPU cores will make more difference. Just a thought.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


That is my point. Why would AMD waste die space, TDP, and money with extra SR cores for parallel workloads, when GCN CUs are much much faster at parallel workloads?

[strike]8 SR cores[/strike] --> 4 SR cores + 8 GCN-CU cores

APUs maintain the same "moar cores" motto, except that cores of one kind are being substituted by cores of a more efficient kind.

This is High Performance Computing design coming to the desk. Curious readers can take a look at modern supercomputers such as Titan and see that the supercomputer uses GPUs (not CPUs) for the heaviest parallel tasks.

http://www.olcf.ornl.gov/titan/

 


That's not how that works, not remotely. Stop using "cores" as a generic reference to a CPU. Modern CPUs are composed of several processors that each do a separate function; their aggregate capabilities are then lumped together in what we call a "core". A perfect example is x87, SSE and other SIMD units. Those are not part of x86: they're referenced and addressed separately, have separate non-x86 registers and in the case of SIMD are actual special purpose RISC processors. "HSA" is just a term AMD (and others) is using to reference putting multiple processors of different uArchs on the same system. By exposing the ports / IRQs (yes we're getting to this level) / addresses of those different CPUs to the OS we can then run code as required. Code compiled for one uArch will not run on another, so there is no magical statement like what you referenced above.

GPUs are just special purpose CPUs; they have registers and run instructions like anything else. Their uArch and ISA are designed around doing the kinds of processing that is involved with rasterization, geometry and lighting effects, yet they're still processors. You can get a GPU to add 2 and 2 together and produce 4; it will be slow and wasteful compared to using an integer processor but it will work. There are currently a ton of inefficiencies when working with these special purpose coprocessors: moving the data in and out of main memory, crunching it and then deciding what to do with the result.

HSA is about reducing / removing those inefficiencies, and streamlining the process of running code on an integer processor (what a CPU is) while also running code on a vector processor (what a GPU is). Basically making GPGPU / OpenCL type languages much faster and having fewer layers of abstraction / inefficiencies.
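
To put rough numbers on the copy overhead, here is a minimal back-of-the-envelope sketch in Python. It is only a toy model, not anything from AMD: the function names are made up for illustration, the timings and bus speed are hypothetical, and it assumes the copy and the compute do not overlap.

```python
def offload_seconds(bytes_copied, bus_gb_per_s, gpu_seconds, overhead_s=0.0):
    """Wall time for one offload: copy data across the bus, launch, compute."""
    copy_seconds = bytes_copied / (bus_gb_per_s * 1e9)
    return copy_seconds + overhead_s + gpu_seconds


def worth_offloading(cpu_seconds, bytes_copied, bus_gb_per_s, gpu_seconds,
                     overhead_s=0.0):
    """True when the offload, copies included, beats staying on the CPU."""
    total = offload_seconds(bytes_copied, bus_gb_per_s, gpu_seconds, overhead_s)
    return total < cpu_seconds


# Hypothetical job: 1.0 s on the CPU, 0.5 s on the GPU,
# 8 GB copied over an 8 GB/s bus (all numbers invented for illustration).
print(worth_offloading(1.0, 8e9, 8.0, 0.5))  # discrete card, copies included
print(worth_offloading(1.0, 0, 8.0, 0.5))    # shared memory, nothing copied
```

With these invented numbers the copy alone costs a full second, so the offload loses even though the GPU itself is twice as fast. Set the copied bytes to zero, which is roughly what a shared memory space is after, and the same job becomes worth offloading.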
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780


Thanks for the link, this is the kind of news i am interested in.
 

cowboy44mag

Guest
Jan 24, 2013
315
0
10,810


I'm watching the HSA development very closely myself. I go back and forth every day: should I upgrade to an FX-8350 or see what Kaveri APUs have to offer first? If Kaveri APUs look good, should I jump on that or wait a little longer to see if Steamroller will get an AM3+ CPU?

As much as I like AMD, I'm not buying into the HSA 500% improvement until actual real benchmarks come out. I'm sure it will perform better, but 500% better just seems too good to be true. However, IF it is true then it is going to be AMD's silver lining.

Honestly right now I'm sticking with my Phenom II 965 BE. I love the processor and it has been a beast for over four years:D

Personally I think AMD would do well to scrap the FX name which people still associate too closely with BD and release 4, 6, and 8 core Steamroller AM3+ processors and call the line Phenom III:D A lot of AMD faithful would buy into the name alone as it brings back fond memories, and if it performs well it would be a total home-run for AMD. I personally think they would sell more with Phenom III name than FX name.
 
"HSA" won't change anything unless a program is designed to use it. I can't compile 7zip for SPARCv9 and expect it to run on a Haswell/Piledriver CPU. A program compiled for x86 without special "HSA" related code will not utilize any additional processor units associated with that code. I'm talking from a bare metal point of view here, operand A running on unit B. Typically in the cases of OpenCL/GPGPU what happens is a form of pseudo code is sent to the driver layer that does a form of dynamic recompilation for the specific hardware installed (the binary language a 7970 speaks won't be the same as a 680 nor even something like a 6770). Because there is an additional layer of abstraction it makes communication between the main processor and the vector coprocessor much harder. If we can get both of those processors talking in the same unified memory space along with some standardized language, then we can discard the need for abstraction and start to code for bare metal (well, compile for bare metal).
 

etayorius

Honorable
Jan 17, 2013
331
1
10,780


Totally agree.
 

mlscrow

Distinguished
Oct 15, 2010
71
0
18,640
AMD is competitive with Intel now given price points and they will be even more competitive with Intel in the future. Intel doesn't have a 2.1x lead in IPC over AMD, so for someone who doesn't like misinformation, it's pretty embarrassing for Hajifur (I don't care how his stupid name is spelled) to throw misinformation out there. Hajidur, every single person in this thread knows that you're a complete troll and that the only thing you bring to the table to support yourself is the fact that Intel has better performance per watt, which is a completely worthless win in the desktop enthusiast market. So please, for the sake of everyone on these forums, stop with your annoying and repetitive posts. Intel is better in performance per watt, but guess what, this thread isn't about Intel or performance per watt, it's about Steamroller so unless you have something to contribute to Steamroller speculation without making comparisons to Intel, without mentioning Intel, without bringing up performance per watt, then please just stop posting here and bringing the subject off topic and creating this debate in the wrong thread. If you want to discuss how much you love Intel, create a new thread, but hijacking current threads meant for other discussions is getting old really fast. Plus, you've already completed making your impression, which is that of a moron with a hard-on for Intel.

Now, back to Steamroller. Some people have stated that the 3-module design that was supposedly axed was more than a mere rumor. What makes you think so? Is there any evidence at all (outside of aged roadmaps that predate the 3-module cancellation statements) that suggests there will indeed be a 3-module Kaveri? I was holding out for news of an 8-core Steamroller, but if Kaveri comes out in a 3M/6T version, then I'm probably just going to pick one of those up, though I have my doubts that it will ever come to fruition. If anyone can speak to this, please do so.
 
Haswell isn't even remotely doing 210% instructions per clock. You don't even know the real meaning of that term (it's not what is used on review sites) much less how to apply it.

Instructions per clock is actually meaningless on an x86 uArch as different instructions have different processing times / transaction latencies. Doing 10,000 ADDs will produce a radically different time than doing 10,000 DIVs, and that's without getting into addressing and memory manipulation.

Here is a hint: practically no instruction in x86 takes 1 cycle to complete and thus no system can get more than 1 instruction per clock cycle. A 3GHz CPU has literally 3,000,000,000 cycles per second and no unit can process anywhere near that many instructions in a single second. We get around this limitation by running multiple processing units in parallel, both internally and externally. Inside a single SB "core" you have three separate integer units; a single PD "core" has two integer units. Things aren't much different in SIMD work, and the ancient 1980s 8087 (that's SuperPi BTW) is absolutely horrid. Anyone using 8087 metrics for purchasing decisions either needs to seriously invest money into modernizing their line of business software or is trolling other people for emotional candy.
 

cowboy44mag

Guest
Jan 24, 2013
315
0
10,810
No one who knows anything about modern computers and software uses Superpi as a benchmark for anything. Correction: no one except people who don't know any better, people who haven't kept up with the technology since the late 1980s, and Intel fanboys use Superpi. It is a totally worthless benchmark, but one that Intel fanboys love to use because Intel processors score high with it. In actual real raw performance we have seen MODERN benchmarks that put the FX-8350 above i5 and in some cases above i7 4770K. In those modern benchmarks the only processors that beat the FX-8350 are 6 core Intel that cost five times as much.

That is what Intel and their fanboys don't want anyone to know. They don't want people to know that in multi-threaded modern benchmarks Intel's "huge" lead doesn't exist. They don't want people to know that a $200 processor can best the i7 4770K in those benchmarks. If Intel were truly "superior" and were truly worth their bloated price then there wouldn't be any benchmark going where the FX-8350 (AMD's last generation flagship) could beat the i7 4770K (Intel's current generation flagship).
That is why AAA gaming studios are backing FX-8350 for future games and why Intel is thinking about releasing an 8 core Haswell. If you want to build a system for playing current (for about the next 6 months) and last generation games by all means Intel is the way to go. For double the investment you will have a computer that gets ~ 10 FPS better than AMD. If you want to build a system for future games that will be made for PS4 and Xbone then FX-8350 is the best rated option right now unless you have so much money you have no idea what to do with it next and opt for insanely priced Intel 6 core. Hopefully Steamroller AM3+ will be announced soon and that would be an even better option. In the most important metric there is PRICE/PERFORMANCE AMD has been and is the clear choice.
 

8350rocks

Distinguished


To further elaborate here... IPC is an engineered limit on the CPU. The theoretical maximums are never actually reached; only in a perfect world, where everything was coded with perfect efficiency, might it be possible. However, because there are a HUGE number of variables in the time it takes to process instructions, "IPC" as review sites use it will vary wildly across any given number of programs.

The reality of the situation is, Instructions Per Clock is a completely different animal than 90% of the people who read review sites think it is. I see that term misused more than nearly anything else across the internet.

What happened is, somewhere, some reviewer, who thought he had half an idea about engineering, used a technical reference term on a review. Other sites thought it sounded intelligent, and also began misusing the same term consistently. Because of this gaffe, the term IPC is now misused in reference to anything that improves performance of a CPU.

Reality is, you don't necessarily get IPC improvements from one generation to the next. Internal pipeline tweaks and other things may make the CPU faster, but the actual IPC that the CPU is capable of running at once does not change.

For example: The 3770k and 4770k have the exact same theoretical maximum IPC engineered into them. People think the 4770k has IPC improvements in the hardware because review sites tell them it does. Reality is that Intel tweaked the process on a very small scale generation over generation to get a very slight performance increase. IPC for a quad core Intel CPU, of any kind, has not changed since Nehalem, however, performance increases by process tweaks have actually improved the speed at which the instructions are processed, though the theoretical maximums are never achieved at all.

Now, hafidurp, stop misusing the term IPC.
 


True. But you can compare two arches, correct for clock speed and cores [assuming 100% full load; math gets hard if this doesn't hold], and, for a given benchmark, figure out the relative IPC difference between them. Given enough benchmarks, you can "guesstimate" the relative IPC differences of the architectures.
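
For what it's worth, the "correct for clock speed and cores" math can be sketched in a few lines of Python. This is only a rough sketch under the same 100% load assumption: the function names and benchmark names are made up, the scores are placeholders rather than real results, and using a geometric mean to combine the benchmarks is my own choice, not something anyone in the thread specified.

```python
from math import prod


def per_clock_per_core(score, ghz, cores):
    """Benchmark score normalised by clock speed and core count."""
    return score / (ghz * cores)


def relative_per_clock(scores_a, scores_b, chip_a, chip_b):
    """Geometric mean of A/B per-clock ratios over the shared benchmarks.

    chip_a and chip_b are (ghz, cores) tuples; assumes every core is
    fully loaded in every benchmark.
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    if not shared:
        raise ValueError("no benchmarks in common")
    ratios = [
        per_clock_per_core(scores_a[name], *chip_a)
        / per_clock_per_core(scores_b[name], *chip_b)
        for name in shared
    ]
    return prod(ratios) ** (1 / len(ratios))


# Placeholder scores for two hypothetical chips (not real measurements).
chip_a, chip_b = (4.0, 4), (2.0, 4)
scores_a = {"bench1": 200, "bench2": 400}
scores_b = {"bench1": 100, "bench2": 100}
print(relative_per_clock(scores_a, scores_b, chip_a, chip_b))
```

A result above 1 would mean chip A gets more done per clock per core across that set of benchmarks. Any benchmark that doesn't scale across all cores breaks the normalisation, which is exactly where the math gets hard.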

EDIT:

FYI, again, I hate these new forums. I quoted someone, left the forum, came back and quoted someone else, and the first quote was still added to my message. Seriously?
 

8350rocks

Distinguished
"Guesstimating" is about as close as anyone can ever get, even then, you have statistical outliers that skew results because one architecture has it as a strength and another does not.

For example, if you compare games like Crysis 3 and Battlefield 3 (multiplayer especially)...you find that AMD and Intel are very close in performance. In other games, they are virtually the same. Then you have poorly optimized games like Planetside 2 or Skyrim that completely skew the results because something in the way they were coded is very well received by one architecture, but not the other even though AMD supports more instruction sets than Intel on their current generation CPUs.
 

8350rocks

Distinguished


The only deluded person in this thread is the person posting nonsense about Intel...you're right, you shouldn't have even bothered, we'd all be happier.
 

8350rocks

Distinguished


Yes, the HD 9970 is expected to be bundled with BF4 when it releases as part of the Never Settle bundle.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


In the first place, I have absolutely no idea where you got the idea that I am using "cores" as a generic reference to a CPU, because I am not.

In the second place, I would like to know where you got the idea behind your "Code compiled for one uArch will not run on another, so there is no magical statement like what you referenced above", because nowhere did I say that code compiled for one uarch will work on another. On the contrary, I have been arguing just the opposite in this thread, e.g. when discussing "bdver2" compiler flags for Piledriver.

You consider a GPU a special kind of CPU, but others don't. It is customary to differentiate between CPU and GPU. It is not strange, therefore, that the people who designed the Titan supercomputer make this distinction as well. See the above link:

The combination of CPUs and GPUs will allow Titan and future systems to overcome power and space limitations inherent in previous generations of high-performance computers.

Because they handle hundreds of calculations simultaneously, GPUs can go through many more than CPUs in a given time. Yet they draw only modestly more electricity. By relying on its 299,008 CPU cores to guide simulations and allowing its Tesla K20 GPUs, which are based on NVIDIA's next-generation Kepler architecture to do the heavy lifting, Titan will be approximately ten times more powerful than its predecessor, Jaguar, while occupying the same space and drawing essentially the same level of power.

My main point is that AMD is translating this hybrid architecture used in top supercomputers to the server/desktop/mobile space with HSA APUs

[Image: amd_kaveri_apu_huma-580x302.jpg]


Note that AMD also differentiates between CPU and GPU.

HSA is more than "basically making GPGPU / OpenCL type languages much faster". HSA blurs the distinction between different kinds of cores by providing a unified approach at both the hardware and software level. In principle, developers will select the optimal cores for each kind of workload: serial --> CPU cores; parallel --> GPU cores. But there is an extra possibility opened by HSA: using the GPU to help the CPU in serial tasks (e.g., when the CPU is bottlenecked) and vice versa.
 

cowboy44mag

Guest
Jan 24, 2013
315
0
10,810


Finally you got something in all your BS ramblings right. AMD will be releasing Hawaii, and guess what, nVidia has nothing to answer it. Wasn't it some ignorant Intel troll who several pages back said "AMD can't compete against Intel for processors and AMD can't compete against nVidia for GPUs"? Now who was it that said that?

Seems like AMD can compete against and crush nVidia; Hawaii proves that. Now it's time for Intel to get their wake up call..... Steamroller may not crush Intel, but it's going to come a lot closer than any Intel fanboy or Intel themselves is expecting, and Intel is in "lazy" mode right now.
 

8350rocks

Distinguished


The HD 9970 is projected to be about 90-95% of the performance of the GTX Titan with approximately 80% of the comparable power draw...

Nvidia was better at what again...?
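
Taking those projected figures at face value (they are projections, not measurements), the implied performance per watt works out with simple division. A quick sketch in Python, with the GTX Titan as the 1.0 baseline and a function name that is just illustrative:

```python
def relative_perf_per_watt(relative_performance, relative_power):
    """Performance per watt relative to a baseline card at 1.0 / 1.0."""
    return relative_performance / relative_power


# Projected figures from the post: 90-95% of the performance at ~80% of the power.
low = relative_perf_per_watt(0.90, 0.80)
high = relative_perf_per_watt(0.95, 0.80)
print(f"{low:.4f} to {high:.4f} times the baseline performance per watt")
```

That is roughly 12 to 19 percent better performance per watt than the baseline, if the projections hold.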
 

jdwii

Splendid




One thing I like about the GPU market is that it's extremely competitive, and this forces AMD and Nvidia to come out with great products, though usually at different times. Although I prefer AMD video cards, I admit both companies are competitive.
 