AMD CPU speculation... and expert conjecture


GOM3RPLY3R

Honorable
Mar 16, 2013


Also note that core count and clock speed for a given processor are missing from the statement you have here. Take the FX-9590, for example: you could say it has roughly the power of an i7-3770K/4770K, somewhere in that range. Except the Vcore on the 9590 is much higher. Even though yours can handle a higher Vcore (which I agree is good), take into consideration how much work output you get for it, and how big the clock increase is over the original clock.

Just to run this by you: someone said that at 4.5 GHz on a 3930K you need no less than 1.4 Vcore, probably around 1.45. If you actually set it that high, your processor would be in the graveyard before you could use it. You wouldn't need 1.4 V on that processor until you hit around 6.0 GHz, and at that point you're on liquid nitrogen or something.
 

GOM3RPLY3R

Honorable
Mar 16, 2013


I corrected myself, now shut the hell up. Stop trolling.

If you're so sure about it, then give me test results. Considering stock performance and adaptability, a fair "consumer plug and play" test would probably be a Hyper 212 Evo with everything run plug and play, no adjustments. Then you'll be surprised: at idle the Intel may run hotter, but just watch the sparks fly under load.
 

mayankleoboy1

Distinguished
Aug 11, 2010


When did this happen? Has AMD developed something new that automagically converts single-threaded code to multithreaded? Maybe they could make it public and open-source it, unlike the greedy Intel bastards.

BTW, you haven't done much real-world coding, have you?
 


I have done coding lately. What about you, mayan?

Intel is trying hard to give devs tools to thread their code easily and, guess what... AMD has been doing it as well. Even more, they back a foundation to help in that regard: HSA.

Also, please stop posting offensively. Instead of debating like monkeys, try to bring the conversation to human terms.

How well a solution can be threaded depends BIG TIME on the frameworks and/or language you're using. I specialize in DBs and web servers/back ends for a big variety of applications. You can see threading being done in a LOT of stuff, but people often choose single-threaded (linear) solutions because they're easier to code and maintain. Time-critical solutions MUST be threaded in my book (if they can be), so it comes down to what the person behind the keyboard can do within the time constraints of the solution being asked for.

If you don't have time to code, then forget about threading with current tools. Even current frameworks and environments have lackluster tooling for it. You really need time to re-think the problem and apply a clean solution that speeds things up when the hardware allows.
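
To make that concrete, here's a minimal Java sketch of the "fan out independent work" threading I mean; the table names and sleep latency are made-up stand-ins for real back-end calls:

```java
import java.util.List;
import java.util.concurrent.*;

public class ParallelFetch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Hypothetical "queries": stand-ins for real DB/back-end calls.
        List<Callable<String>> tasks = List.of(
                () -> slowQuery("users"),
                () -> slowQuery("orders"),
                () -> slowQuery("inventory"));
        // invokeAll runs the independent queries concurrently and
        // blocks until every one of them has finished.
        for (Future<String> f : pool.invokeAll(tasks)) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }

    static String slowQuery(String table) throws InterruptedException {
        Thread.sleep(200); // simulate I/O latency
        return "results from " + table;
    }
}
```

Three sequential 200 ms calls take ~600 ms; threaded, the wall time collapses to ~200 ms. That's the whole sell, and it still takes deliberate restructuring to get there.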

So even if Intel or AMD had all those magical tools ready, at the end of the day they must face the fact that developers/programmers have to get familiar with threading, and most monkey coders don't really know how to approach a problem from that perspective effectively. So for "mass programming" they'd better have rainbow-spitting tools, because most companies won't want to shell out the greens for adapting existing code. New projects are the lucky exception, and even then only with the developer's own effort to learn the new tool (if one exists).

/rant

Cheers!
 

mayankleoboy1

Distinguished
Aug 11, 2010

You forgot one important thing: threading depends on the nature of the problem itself.
I have done some DB code and lots of Java code, and we do extensive threading. But that threading is all happening on the same core. It's like each thread fetches a piece of data, so if one thread is blocked, another thread can work. But I can't make two threads work on different cores at the same time. So it's not SMT, just threading.
I think when any of us talks about 'threading' here, we mean SMT.

One method used to parallelize selected workloads is to make one thread the 'master' and the other threads the 'slaves'. The master thread sends chunks of data to the other threads. (Note that this only works when the calculations can be split into smaller parts without mangling the logic.) Even here, the code is bottlenecked by the speed of that single master thread, which means the power of a single core is of the utmost importance. I am amazed by people who consider single-core speed irrelevant.
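
A minimal Java sketch of that master/slave pattern (the data and chunk sizing are invented for illustration). Note how the combine step at the end runs on the master alone, which is exactly the single-core bottleneck being described:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class MasterWorker {
    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000_000];
        java.util.Arrays.fill(data, 1);

        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> parts = new ArrayList<>();

        int chunk = data.length / workers;
        for (int w = 0; w < workers; w++) {
            final int lo = w * chunk;
            final int hi = (w == workers - 1) ? data.length : lo + chunk;
            // Master hands each slave an independent chunk of the data.
            parts.add(pool.submit(() -> {
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }));
        }

        // The master alone combines the partial results: the serial
        // portion that stays bottlenecked on single-core speed.
        long total = 0;
        for (Future<Long> part : parts) total += part.get();

        System.out.println(total); // 1000000
        pool.shutdown();
    }
}
```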
 

Cazalan

Distinguished
Sep 4, 2011


I agree there really isn't a way to completely escape single-core performance. What x86 should take from the ARM world is the big.LITTLE idea: have a couple of uber cores (higher IPC, more transistors) with much higher TDP limits (Turbo), placed on opposite sides of the die for heat distribution, then add several more worker cores (lower IPC, fewer transistors) with lower TDP limits.

Of course, if AMD or Intel did that, it would take years for Microsoft or the open-source OSs to properly take advantage of it. That's a dilemma ARM is facing with its big.LITTLE cores as well. Clever hardware requires even cleverer software.

 
How well a solution can be threaded depends BIG TIME on the frameworks and/or language you're using. I specialize in DBs and web servers/back ends for a big variety of applications. You can see threading being done in a LOT of stuff, but people often choose single-threaded (linear) solutions because they're easier to code and maintain. Time-critical solutions MUST be threaded in my book (if they can be), so it comes down to what the person behind the keyboard can do within the time constraints of the solution being asked for.

If you don't have time to code, then forget about threading with current tools. Even current frameworks and environments have lackluster tooling for it. You really need time to re-think the problem and apply a clean solution that speeds things up when the hardware allows.

To truly make something parallel, people first need to redefine the problem into something that can be done in parallel. That means recreating the logic and methods behind the algorithms being used. I see this all the time: coders create an algorithm or method that is intrinsically serial in nature, then complain that it's not easy to parallelize. Well, if it was originally designed to be serial, it'll have natural bottlenecks built in. Often, for a method or algorithm to be made parallel-friendly, it has to sacrifice some efficiency in serial work. Meaning: if method A requires 10 steps to complete but only one instance can run at once, and method B requires 15 steps yet can be run many times in tandem, the second method is preferred. You see that thinking A LOT in HPC workloads: deliberately choosing approaches that may be less efficient in single-threaded scenarios yet scale easily upwards.
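
A concrete instance of that method-A-versus-method-B trade-off is a prefix sum: the obvious loop is one long dependency chain, while the chunked version does roughly twice the adds yet runs its passes in tandem. A sketch in Java (the chunk sizing and pool setup are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class Scan {
    // Method A: n-1 adds, but one long dependency chain -> serial.
    static void serialScan(long[] a) {
        for (int i = 1; i < a.length; i++) a[i] += a[i - 1];
    }

    // Method B: ~2n adds total, but each pass runs its chunks in tandem.
    static void chunkedScan(long[] a, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        int len = (a.length + chunks - 1) / chunks;

        // Pass 1: every chunk computes its local prefix sums at once.
        List<Callable<Void>> pass1 = new ArrayList<>();
        for (int c = 0; c < chunks; c++) {
            int lo = c * len, hi = Math.min(a.length, lo + len);
            pass1.add(() -> {
                for (int i = lo + 1; i < hi; i++) a[i] += a[i - 1];
                return null;
            });
        }
        pool.invokeAll(pass1);

        // Short serial stitch: accumulate each chunk's ending total.
        long[] offset = new long[chunks];
        for (int c = 1; c < chunks; c++)
            offset[c] = offset[c - 1] + a[Math.min(a.length, c * len) - 1];

        // Pass 2: every chunk adds its offset, again in tandem.
        List<Callable<Void>> pass2 = new ArrayList<>();
        for (int c = 1; c < chunks; c++) {
            int lo = c * len, hi = Math.min(a.length, lo + len);
            long off = offset[c];
            pass2.add(() -> {
                for (int i = lo; i < hi; i++) a[i] += off;
                return null;
            });
        }
        pool.invokeAll(pass2);
        pool.shutdown();
    }
}
```

Method B is strictly more work, yet it's the one that scales with cores; that's the sacrifice being described.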

You can see the opposite in games that have one single large thread doing damn near everything, with occasional worker threads to speed a few things up. There is no logical reason a single thread should be running the entire damn game. There are so many different things going on simultaneously that they should be working independently of each other. It will consume more memory and be less efficient overall, yet it will scale up to use more hardware resources.
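
As a rough sketch of the alternative (the subsystem names are hypothetical stubs, not any real engine's API):

```java
import java.util.concurrent.*;

public class GameLoop {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        while (running()) {
            // Independent subsystems tick in parallel instead of one
            // giant thread doing damn near everything.
            Future<?> ai      = pool.submit(() -> tickAI());
            Future<?> physics = pool.submit(() -> tickPhysics());
            Future<?> audio   = pool.submit(() -> tickAudio());
            ai.get(); physics.get(); audio.get(); // sync point per frame
            render(); // render consumes the updated state
        }
        pool.shutdown();
    }

    // Stubs so the sketch compiles; running() returns false so it exits.
    static boolean running() { return false; }
    static void tickAI() {}
    static void tickPhysics() {}
    static void tickAudio() {}
    static void render() {}
}
```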
 

mayankleoboy1

Distinguished
Aug 11, 2010


The basic assumption behind this is that all workloads can be parallelized. This is not true.

Meaning: if method A requires 10 steps to complete but only one instance can run at once, and method B requires 15 steps yet can be run many times in tandem, the second method is preferred.

The assumption here is that parallelization means infinite (or at least wide) parallelization. In practice, the parallelization may not scale beyond 2-3 threads. Again, this depends on the dataset and the algorithm.
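
Amdahl's law puts a number on that flattening. A small worked example, assuming (purely for illustration) that 60% of the work parallelizes:

```java
public class Amdahl {
    // Amdahl's law: speedup = 1 / ((1 - p) + p / n)
    static double speedup(double parallelFraction, int threads) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
    }

    public static void main(String[] args) {
        // With only 60% of the work parallel, scaling flattens fast:
        for (int n : new int[] {1, 2, 3, 4, 8, 16}) {
            System.out.printf("%2d threads -> %.2fx%n", n, speedup(0.6, n));
        }
        // 1 -> 1.00x, 2 -> 1.43x, 4 -> 1.82x, 16 -> 2.29x... capped at 2.5x
    }
}
```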

You see that thinking A LOT in HPC workloads.

Right. Only in HPC, A/V, rendering, some compression algorithms.


There are so many different things going on simultaneously that they should be working independently of each other.

All these simultaneous tasks still need to stay in sync, which requires either a main thread or message passing, and message passing is slow.
The most parallelizable workload is pixel calculation and rendering, which is already parallelized.
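
Even the simplest hand-off between two threads forces a synchronization point per message; a minimal Java sketch of that coordination cost:

```java
import java.util.concurrent.*;

public class MessagePassing {
    public static void main(String[] args) throws Exception {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                try { queue.put(i); } catch (InterruptedException e) { return; }
            }
        });
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                try { System.out.println("got " + queue.take()); }
                catch (InterruptedException e) { return; }
            }
        });

        producer.start(); consumer.start();
        producer.join(); consumer.join();
        // Every put/take pair is a synchronization point; that
        // coordination overhead is what's being described above.
    }
}
```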

-------------------------------------------
I am not saying we can't parallelize. We can, but only for suitable tasks. And even then, single-core performance is very important.
 

juanrga

Distinguished
BANNED
Mar 19, 2013


Initially I wrote "Increasing any of those increases performance." But this is not true for single-threaded software, so I added the "in general" before posting. It is still bad writing that I submitted too fast. I should have written something like "if the software is threaded enough", but that didn't seem necessary after spending days explaining why Intel i5s are better than AMD FX-8s in poorly threaded games and how AMD eight-cores will do better with next-gen games thanks to the consoles.

Moreover, there is an important detail that some people forget: multi-core chips can increase the performance of single-threaded software in one concrete situation: multitasking. Today's working paradigm is no longer the old model of working with one application at a time: open it, work, close it, open another...

Today even newbies have 2-4 apps open at the same time. Even if each one of them is single-threaded, you can have several threads running at once. Or do you really believe that every owner of a dual or quad core only uses 25-50% of their chip daily?

However, experts such as yourself will still find deficiencies in my post. For instance, sometimes increasing IPC or frequency does not increase performance: we previously discussed a GPU-bottlenecked game, but you can also find memory bottlenecks, hard disk bottlenecks, and even network bottlenecks... it depends on the concrete task and the hardware/software used, as you know.

Finally, notice that the "greedy Intel bastards" are about to release their first 8-core chip. Surely they are convinced that the new chip will provide performance advantages over their current six-cores.
 
The hardware performance depends on IPC, clock speed, and number of cores. Increasing any of those increases performance in general. Evidently, if you overclock an i3 enough, it will outperform a quad.

More cores only helps performance if two conditions are met:

1: They are utilized
2: The CPU was bottlenecked before those extra cores were added

In games, if a 2-core CPU can process all threads and finish before the GPU finishes rendering the current frame, and it has a higher IPC + clock than a rival processor, the 2-core CPU will outperform the rival, regardless of how many cores the rival has and regardless of how many threads the game is using.

In short: if a 2-core CPU is not bottlenecked and has higher IPC than a 192-core CPU, the two-core chip will be faster. This is not affected in any way by the number of threads the application in question is using [as again: not CPU bottlenecked]. That's the point being ignored entirely by the peanut gallery.
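
A toy frame-time model makes the arithmetic concrete (every millisecond figure below is invented for illustration, not measured):

```java
public class FrameBudget {
    public static void main(String[] args) {
        // Illustrative numbers only: per-frame CPU work a game generates,
        // split (ideally) across however many cores are present.
        double cpuWorkMs = 10.0;  // total CPU-side work per frame
        double gpuFrameMs = 16.7; // GPU render time per frame (~60 FPS)

        for (int cores : new int[] {2, 4, 8}) {
            double cpuMs = cpuWorkMs / cores; // assume perfect scaling
            double frame = Math.max(cpuMs, gpuFrameMs);
            System.out.printf("%d cores -> %.1f FPS%n", cores, 1000.0 / frame);
        }
        // 2 cores finish in 5 ms, well under the 16.7 ms the GPU needs,
        // so 4 or 8 cores deliver the same ~60 FPS: not CPU bottlenecked.
    }
}
```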

Hardware is reaching physical limits with current technology; we will not see a single core clocked at 30 GHz tomorrow, but rather more and more multicores. Intel is going to release its first eight-core chip because there is no way they could fabricate a quad- or dual-core chip with the same performance.

And I'm pointing out that software has reached its limits, at least with how PCs are currently architected. Individual programs do not scale; developers have been trying to find ways to make them since the '70s, with very little success. The few things that do scale tend to be massively parallel, hence GPUs.

Hence why the GPU is so important: it is the one part of the system designed to handle massive amounts of parallel data, and hence why so much processing is being offloaded from the CPU to the GPU, even as the CPU gains more cores. Even if you had a 16-core CPU, the data that can be made parallel would still execute several times faster on a GPU.

Going forward, you are going to see more processing offloaded to the GPU. Hence why both Intel and AMD are right to focus so much attention on getting an improved GPU onto the CPU die, and less on pure CPU performance.

[Also FYI, there's some REALLY interesting stuff going on at some universities right now, some of which could greatly increase processor throughput.]

I don't know what you mean by "blame consoles for holding PCs back", because it is evident that the new consoles will increase PC gaming quality a lot; even Nvidia is already saying that.

You're repeating the same exact arguments made in 2005-2006 before the 360/PS3 launch. Don't buy the hype, please; at least the PS3/360 were state of the art when they released. The PS4/XB1 at launch will be about as powerful as mid-range gaming PCs. How do you think they are going to look in 2-3 years, built around what will then be a "lowly" AMD 7770 [XB1] and AMD 7890 [PS4]?

GCC 4.7 --> 4.8.1

FX-8350: 23.34 --> 19.27.
i7-3770k: 33.05 --> 28.21.

Performance gain for the FX is 17.4%; for the i7 it is 14.6%. That is 19% more for the FX.
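
The arithmetic, spelled out (just reproducing the figures quoted above):

```java
public class CompileGains {
    public static void main(String[] args) {
        // The benchmark figures quoted above, GCC 4.7 vs 4.8.1
        // (lower is better, so the drop is counted as the gain).
        double fxOld = 23.34, fxNew = 19.27;
        double i7Old = 33.05, i7New = 28.21;

        double fxGain = (fxOld - fxNew) / fxOld * 100; // 17.4%
        double i7Gain = (i7Old - i7New) / i7Old * 100; // 14.6%

        System.out.printf("FX gain %.1f%%, i7 gain %.1f%%, ratio %.2f%n",
                fxGain, i7Gain, fxGain / i7Gain); // ratio ~1.19 -> "19% more"
    }
}
```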

Of course, the relevant improvements in performance for AMD chips will come from better support for the bdver2 flag. I don't know how much more improvement to expect, but 30-50% does not seem exaggerated. Better support for bdver2 will not improve the performance of Intel chips.

For a single benchmark. Others will lean Intel. Others will show almost zero effect on performance. It all evens out over the long haul.
 

8350rocks

Distinguished


Games like Crysis 3 and MechWarrior Online (yes, an MMO), plus the soon-to-be-released BF4 and GTA5, are all bottlenecked on a dual-core CPU.

I am not ignoring your point at all. I pointed out several pages back that the argument stopped holding water six months ago. From this point forward, there is no case where a dual core will outperform a quad or higher core count CPU on heavily SMT-coded games.

Look at the CPU usage figures from Crysis 3 several pages back, where I debunked this argument. Even the FX-4300 and 2600K were bordering on a bottleneck. Where is your "greater IPC is enough" argument when there are i7s nearly bottlenecking even with HTT in use? The first chip that wasn't close to a bottleneck by any real margin was the six-core FX-6300, and anything better than that was just farther from bottlenecking, to the point that the 3960X was still at 50% resource consumption under load in Crysis 3.

Yet according to your argument, more cores will not outperform 2 cores in gaming. (I note your caveat about bottlenecks, but my point is that that day is already here, and it will only become more widespread.)

EDIT: Before you start talking about bass-ackward coding... Crysis 3 requires a high-end GPU to run as well, so it's not like they're just loading the CPU. You can't run Crysis 3 on high settings on anything short of something like a 7870 XT or a 670/760 (not talking about Ultra settings either, just High).
 

8350rocks

Distinguished


Is that why an i3-3220 had 99% CPU utilization on one core and 100% on the other with HTT maxed out, while the FX-6300 had about 84% usage across all 6 cores and the 2600K had 89% usage across all 4 cores with HTT maxed on 2 of them?

Hmm... sounds like you're a troll who doesn't know what he's talking about. My post was directed at someone who was discussing technical knowledge, and since you are clearly not versed in anything technical relating to PCs, please refrain from answering posts with unfounded, nonsensical speculation pulled from the sky. This thread has many contributors with a vast array of technical knowledge; you are not one of them.

Next time I want you to answer my post, I will begin it with "Hafijur, do you think..."

That is all.
 

8350rocks

Distinguished


I posted Crysis 3 CPU utilization statistics about 5 pages ago and broke them down.

EDIT: Crysis 3 needs CORES; not "better cores", MORE cores. The 3960X sits at 50% CPU utilization and the 8350 at 75%; essentially, it runs best on 6-8 core CPUs. HTT is not a strong enough SMT mechanism to satisfy its needs.
 

8350rocks

Distinguished


The day I am trolling anything is the day I am sitting in the back of a 70' Viking Convertible sport fishing boat, strapped into a chair, going after marlin.
 


He has a point. As for you, you are ruining the forums by posting B$. Oh, and speaking of "CPU bottlenecks", I love how you claim a Bloomfield will bottleneck a GTX 670 in one of your posts. Did you forget about a magical thing called overclocking?
 

hcl123

Honorable
Mar 18, 2013



Late compared to whom, to what?

Every chip is late, because the next big thing obsoletes it in less than a couple of years.

In Cray's case, if they had made the supercomputer deals 6 months later, nothing would have been late. It's a matter of perspective and it's relative; like a hamster on a wheel, everything repeats itself. If you're always waiting for the next big thing, you'll never own a computer, because the next big thing renews itself every year.

This is what makes these discussions very, very stupid: compare what with what, and at what workloads? Computers nowadays last much longer than the "propaganda" drivel and the needs or perceptions it creates (supercomputers most certainly do).

Most certainly a platform's upgradeability is way more important than the tiny, momentary superiority of any fricking irrelevant CPU, and that goes for AMD and Intel alike. (edt)


Have you REALLY bought anything AMD? (edt)

If not... how are you going to assure decent prices from your preferred brand? AMD right now is heading for the history bin more than anything else, crushed by the weight of negative propaganda, and this while they have superior products in many respects...

..."drivel" pushes people to be like hamsters on a wheel: pervasive propaganda, as if you've missed something or are late and obsolete, when everything is late and obsolete from a broad perspective. It's little different from politics, and look at the shape the world is in now!

And I'm talking about AMD, but I hope the ARM armada enters the fray... as users we need more AMDs: more choice, not less.
 


Did I not mention a magical thing called overclocking? You'd make them buy a way more expensive setup without even mentioning how easy it is to overclock. Performance per watt is irrelevant when the total cost of running the AMD setup will only reach the base price of the "competing" 4770K by the time both are ancient relics. http://www.guru3d.com/articles_pages/geforce_gtx_titan_review,12.html

2x faster. Oh, and remember that the 920 can easily reach 2600K speeds... right.
[benchmark images: Application_03.png, pcmark02.jpg, blender.png]
 




*sigh*

[image: CPU gaming benchmark chart]


Intel quads, at a lower clock speed, outperforming AMD's 6/8-core chips. If clocked the same [removing clock speed as a factor], Intel's lead would grow.

Case in point, some new(er) Crysis 3 CPU charts:

http://pclab.pl/zdjecia/artykuly/chaostheory/2013/02/crysis3/crysis3_cpu_jungle_1024.png

Comparing the i3-3220 (3.3 GHz) to the FX-4300 (3.8GHz):

The i3 wins. So the chip with 50% more cores and 15% more clock speed loses head to head.

OC the FX-4300 to 4.7 GHz, though (50% more cores and a ~42% faster clock), and it manages to pull an impressive 5 FPS more than the i3.

Kinda makes you wonder what an i3 @ 4GHz would do, doesn't it?

Let's look at a different map, shall we?

http://pclab.pl/zdjecia/artykuly/chaostheory/2013/02/crysis3/crysis3_cpu_human_1024.png

Whoops: the i3 is within 5 FPS of the FX-8350... an FX-8350 clocked at 4.7 GHz, that is. And again, note the trend of faster processors outperforming the ones with more cores.

Next map!:

http://pclab.pl/zdjecia/artykuly/chaostheory/2013/02/crysis3/crysis3_cpu_evil_1024.png


Noticing a trend? The i3-3220 is within 4 FPS of the 8350 at stock.

And in case anyone doubts the benchmarks in question:

[image: corroborating Crysis 3 CPU benchmark chart]


Matches the first within 1 FPS.

Key point being: Intel needs to release a 4GHz clocked i3, yesterday.

(Source: http://pclab.pl/art52489-9.html)

(Note: I used 1024x768 due to a clear GPU bottleneck at 1920x1080. Those images are right on the site if you want to look at the 1080p benches instead. Same results, slightly WORSE for AMD due to the aforementioned bottleneck.)
 

hcl123

Honorable
Mar 18, 2013


You mean

http://accc.riken.jp/secure/4721/07hayashi-amd.pdf
[Appendix A slide 25... at the very end of presentation ]

that is misleading, because it refers to APU v1, i.e. Llano... yet it's the same for BD/PD (v2 and v3).

Llano had 1 "Fmisc" pipe and 1 "Mul" and 1 "Add" FP pipes... so each pipe was not able of 2 operations (MAD), and what counts is only the "Mul" and "Add" pipes per core, so 4 cores is already 8 FLOP operations, be it Llano or BD/PD( only this last ones have FMAC, more efficient, 2 modules, 4 FMAC pipes, 8 Flop ops in total).

I'm certain it's a mistake or "typo" of some kind: 8 FLOPs x 3.5 GHz is 28 GFLOPS... and no Llano core I know of, or BD/PD core for that matter, can do 8 FLOP operations per cycle per core in any way (in truth, I would like an explanation of how, please) (edt). 8 FLOPs is the number of FP ops per cycle for the entire APU with 4 cores.

OTOH, the FLOP number for the GPU is correct: GPU cores x freq x 2 ops (MAD) -> 320 x 920 MHz (turbo) x 2 = 588.8 GFLOPS for Llano... 384 x 875 MHz x 2 = 672 GFLOPS for Trinity (Richland is a little more).
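
The same peak-FLOPS arithmetic, spelled out (units x clock x ops per cycle, using the figures above):

```java
public class PeakFlops {
    static double gflops(int units, double ghz, int opsPerCycle) {
        return units * ghz * opsPerCycle;
    }

    public static void main(String[] args) {
        // CPU side: 1 mul pipe + 1 add pipe per core = 2 FP ops/cycle/core,
        // so a 4-core chip contributes 8 FP ops per cycle in total.
        System.out.println(gflops(4 * 2, 3.5, 1)); // 28.0 GFLOPS at 3.5 GHz

        // GPU side: shader count x clock x 2 (a multiply-add counts as 2 ops).
        System.out.println(gflops(320, 0.920, 2)); // 588.8 GFLOPS (Llano)
        System.out.println(gflops(384, 0.875, 2)); // 672.0 GFLOPS (Trinity)
    }
}
```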

 


1024x768.............................................................

Updated the post; 1080p looks even worse due to the GPU bottleneck, hence why I used 768. You can view the 1080p pics directly on the site if you want.

The point I'm making is this: if Crysis 3 really does favor more cores, then why is the i3-3220 consistently matching the FX-4300 despite the FX's higher clock speed? And why are the results sorted in order of per-core performance (speed/IPC)?

So please, stop it. We can't OC the i3-3220, but we can underclock the FX-8350. I challenge anyone to clock the FX-8350 @ 3.3 GHz, benchmark it, then benchmark the i3-3220, and tell me which one wins head to head. Throw in the FX-6300 and FX-4300 too for good measure. That's really the only way this debate is going to be resolved.
 


We know the 3220 is a decent competitor to the 4300; it has the same number of threads and weaker physical cores. That is the issue Steamroller is supposed to overcome. Not like anyone buys the 4300...
 