AMD FX-8350 Review: Does Piledriver Fix Bulldozer's Flaws?



The clock frequency difference is small: the Pentium G2120 runs at 3.1GHz and the i3-3225 at 3.3GHz, not even a 10% increase. I just checked to make sure, and I was wrong about the cache; they have the same cache. That leaves Hyper-Threading as the only major difference for gaming performance.

http://ark.intel.com/compare/65692,65527

Also, remember that my review measured frame latency whereas oxford's measured FPS. In FPS, Hyper-Threading didn't make nearly as much of a difference as it did in frame latency, as shown where my review compares the two measurements.
 

army_ant7

Distinguished
May 31, 2009
I'm not sure if you just meant that you thought wrong, but I didn't see any mention from you of a cache difference. :) I noticed they were the same when I made my previous comment, to rule that out as a performance differentiator. BTW, do all same-generation Intel CPUs have the same clock rates for their caches, IMCs, and other stuff? Sorry if I sound newbish with that question. Hehe...

It's interesting how they measured frame latency. When benchmarking software takes minimum FPS into account, is it only precise to the second (i.e. count all the frames in the span of 1 second and see if it's the lowest-ranking one), or does it measure frame latency and convert it to FPS (i.e. measure how long it took to get a new frame and see if it was the longest period; e.g. if it took 50ms between 2 frames, it would report 20 as the minimum FPS, if you get what I'm saying)? I wonder... I am aware that things can happen within a second that would be undesirable/noticeable to the gamer, just like micro-stuttering (thanks for explaining that to me a while back, blaz). :)
 


Oops, I thought I had mentioned the cache earlier; maybe I did in an edit and forgot to hit enter or something like that.

AFAIK, all LGA 1155 CPUs have a full-speed L3 cache, meaning that it runs at the CPU frequency. IDK about the IMC; I don't think its frequency is stated anywhere and I'm very sure that it can't be changed. Since we don't need to change anything to run incredibly fast RAM like we do with AMD systems, I'd think that it is either adjusted automatically or it's high enough that we don't need to adjust it. IDK if it's also constant across Intel's CPUs. It's possible that it is, but it's also possible that it's not, because Intel differentiates some of the lowest-end models by the maximum supported memory frequency.

They measured how long each individual frame took to be fully rendered and then worked from those frame times (including an effective average FPS derived from them). If there is bad stutter, it shows up as unusually long frame times in their data instead of being masked by a per-second FPS average.
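To make the distinction concrete, here's a rough Python sketch with made-up frame times (just an illustration of the idea, not their actual tooling):
[code]
# Made-up numbers: ~1 second of gameplay with two 100 ms hitches in it.
frame_times_ms = [16.7] * 48 + [100.0, 100.0]

second_length_s = sum(frame_times_ms) / 1000.0
fps_for_that_second = len(frame_times_ms) / second_length_s

sorted_times = sorted(frame_times_ms)
p99_frame_ms = sorted_times[int(0.99 * len(sorted_times)) - 1]   # rough 99th-percentile frame time
time_beyond_50ms = sum(t - 50.0 for t in frame_times_ms if t > 50.0)

print(f"FPS counter for that second: {fps_for_that_second:.0f} FPS (looks fine)")
print(f"99th-percentile frame time:  {p99_frame_ms:.0f} ms (a clear hitch)")
print(f"time spent beyond 50 ms:     {time_beyond_50ms:.0f} ms")
[/code]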
 

army_ant7

Distinguished
May 31, 2009
I guess that's nice to know, that Intel processors are configured nicely, though I have heard before how they use RAM more efficiently than AMD when run at the same speed. Does L3 cache have to be full-speed to be optimal? I'm guessing it's a situation-based thing, but does it have to "keep up" with the other parts of the CPU (processor)? I've heard mostly from you how Bulldozer (and I guess Piledriver as well) doesn't have it running at full speed. I'm wondering if making it go faster than the core clock rate (let's say on BD and PD) would have any advantages, or if even having it substantially lower would be enough for it to no longer bottleneck the CPU's potential performance... :)

That method of measuring frame latency sounds nice, but I was wondering how regular benchmarking programs (maybe like Fraps) do it, though I probably shouldn't be asking here since maybe only the actual programmers know, unless they've told anyone else. Hehe...
 


How much of an impact the L3 cache has on performance is definitely situation-dependent. For example, the FX-4300 and the A10-5800K have nearly identical performance in many workloads, but in some workloads the FX-4300's L3 cache (slow as it may be) can give it a substantial advantage, around 30%. The latency, bandwidth, and capacity of the cache are all factors that different programs can favor, and to make it even more complicated, how much of an impact the cache has also depends on the same factors of the relevant L1 and L2 caches and the memory. For example, the L3 cache's capacity can be more important for capacity-bound programs when there isn't much L2 cache, but if the L2 cache is sufficient for capacity-bound applications, the L3's capacity might not matter much.

The cache's frequency is just one factor in its bandwidth and latency. Increasing the frequency increases bandwidth while decreasing latency, kind of like how going from DDR3-1600 9-9-9-24 to DDR3-2133 9-9-9-24 increases bandwidth while decreasing latency, because the timings are measured in clock cycles, not in time.
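As a quick sanity check of that DDR3 comparison in Python (assuming the usual convention that CAS latency is counted in cycles of the I/O clock, which is half the MT/s rating):
[code]
# Same CAS cycle count at a higher clock means lower latency in nanoseconds
# and more bandwidth at the same time.
def ddr3_stats(transfer_rate_mts, cas_cycles):
    io_clock_mhz = transfer_rate_mts / 2             # DDR: two transfers per I/O clock
    cas_ns = cas_cycles * 1000.0 / io_clock_mhz      # cycles * cycle time
    bandwidth_gbs = transfer_rate_mts * 8 / 1000.0   # 64-bit channel = 8 bytes per transfer
    return cas_ns, bandwidth_gbs

for rate in (1600, 2133):
    cas_ns, bw = ddr3_stats(rate, 9)
    print(f"DDR3-{rate} CL9: CAS ~{cas_ns:.2f} ns, ~{bw:.1f} GB/s per channel")
[/code]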

Having the L3 cache run at the CPU frequency should help quite a bit compared to running it far lower than the CPU frequency. More bandwidth and lower latency never hurts AFAIK and in many things, such as gaming, it really helps. Running the L3 cache above the frequency of the CPU cores would probably still help performance, but I think that running the cache at a frequency that is equal to or a multiple of the CPU frequency is probably ideal. Going over the CPU frequency by say double would probably help performance because it'd still be increasing bandwidth while decreasing latency, although a 6+GHz cache would probably require some high voltage and if so, it'd probably consume a lot of power while generating a lot of heat.

About having to keep up with the rest of the CPU: yes, the cache hierarchy should. If a CPU's caches fail to keep up well enough, they can hurt performance, and as we can see by raising the L3 cache frequency of AMD CPUs, performance most certainly is hurt by AMD's incredibly slow caches.

The original Phenoms were the first CPUs from AMD that had L3 cache, IIRC. AMD has not improved L3 cache frequency since then, and it seems like their L3 cache latency has not improved in real-time terms either, at least not since Phenom II, hence my distaste for AMD's current L3 cache situation. Did they really think that keeping the cache at roughly the performance it had five years ago, albeit with higher capacity now, was a good idea? We can show that it hurts performance significantly, so I'd say it's not, and like I've said before, IDK why AMD keeps using such slow cache. What some might find funny is that if Bulldozer had the performance of Intel's L3 cache, it might have far higher performance, both multi-threaded and per core.

I think that regular benchmarking programs just count the FPS in each second, or at a still fairly large sub-second interval, but I haven't really looked into the specifics. I know that they don't measure individual frame latencies, and that's why you can't see stutter by looking at FPS numbers. If you want to read more about it, you could go back to the article that I linked (I'll link the page talking about their measurement method here):
http://techreport.com/review/23662/amd-a10-5800k-and-a8-5600k-trinity-apus-reviewed/4
Also, here's a link in that page that goes further in-depth about their methodology:
http://techreport.com/review/21516/inside-the-second-a-new-look-at-game-benchmarking
 


It's not the CPUs that have it; it's the motherboard :p
 

oxford373

Distinguished
Jun 22, 2009
@blazorthon, if you still have doubts about whether Metro 2033 is a highly multi-threaded game, here is more proof that it uses all available cores: http://www.tomshardware.com/reviews/nvidia-physx-hack-amd-radeon,2764-5.html
I think Metro 2033 looks like a GPU-limited game (which it isn't) because it's highly multi-threaded, and if future games use all available cores like this one does, we might see games that let the 8-core FX outperform the i5, since it turns out that most games don't benefit from HTT (in some games a Core i5 is faster than an i7), and it's even recommended to disable HTT on Core i7s to get slightly higher FPS in some games (we have seen this issue in many games) like Battlefield 3: http://www.tomshardware.com/reviews/battlefield-3-graphics-performance,3063-13.html
Maybe the i7-3960X was the slowest one in that benchmark because of the HTT issue; it happened before with the Core i7-2600K in Metro 2033. Just see the minimum FPS for the i7-2600K in this benchmark: http://www.tomshardware.com/reviews/sandy-bridge-core-i7-2600k-core-i5-2500k,2833-18.html
Here is another benchmark of this game where the i7-3960X was slower than other Intel CPUs: http://www.anandtech.com/show/5626/ivy-bridge-preview-core-i7-3770k/9
IMO, CPU bottleneck issues usually happen with lightly threaded games like Skyrim and StarCraft II, not with highly multi-threaded games like Metro 2033. (I think Crysis 2, http://www.tomshardware.com/reviews/core-i7-3930k-3820-test-benchmark,3090-10.html , DiRT 3, Just Cause 2, and Battlefield 3 use 4 cores, so in those games we didn't see CPU bottleneck issues.) If future games use all available CPU cores, the graphics card should be the only bottleneck (not the CPU), because the performance bottleneck tends to shift to the graphics card when the CPU becomes fast enough, and in highly multi-threaded games any 4+ core CPU or fast quad-core should be fast enough.
 

army_ant7

Distinguished
May 31, 2009
@blazorthon
As always, thanks! Especially so for that thorough explanation. :) I'd really like to see what BD and PD could do compared to Intel's current and future line-up with the proper adjustments... I actually hope to get a system with either of those CPUs (preferably the latter) and communicate with you for experimentation, but don't count on it since I'm not working right now... Hehehe... :p

@oxford373
Interesting stuff, eh... The programmers of Metro 2033 either ingeniously found a way to multi-thread it that much, if that's the case, or they may also have made it use the CPU less efficiently (due to multi-threading overhead and thread locking) compared to using fewer threads. The latter seems unlikely, because they probably wouldn't have released the game that way (multi-threaded) if so, though we don't know what goes on in their studios. I'm looking forward to AMD being able to catch up with more nicely threaded applications (games in particular). :D
 
[citation][nom]oxford373[/nom] (quoted from the post above) [/citation]

Maybe that's it then.

Also, BF3 can use at least six threads, maybe eight or more nowadays. This is easily shown because in multiplayer with many players, we see significant scaling from the lower core count CPUs to the higher core count CPUs of the same family. That it works this way on AMD's Phenom II and FX-x1xx/x2xx CPUs also shows that the scaling isn't caused, or at least not nearly entirely caused, by the cache differences among Intel's CPUs.
 

oxford373

Distinguished
Jun 22, 2009
[citation][nom]army_ant7[/nom] (quoted from the post above) [/citation]
I just want you to read the right benchmark: http://www.tomshardware.com/reviews/nvidia-physx-hack-amd-radeon,2764-5.html
Yes, this game is highly multi-threaded and it can fully utilize all CPU cores when CPU PhysX is enabled, but we can't find any CPU benchmarks for this game with CPU PhysX enabled (except this one with the X6 1090T). We need Tom's Hardware to help us with CPU benchmarks for this game with CPU PhysX enabled, because it's the only way to fully utilize all CPU cores in a game.
 

army_ant7

Distinguished
May 31, 2009
@oxford373
Interesting... The game also seemed to tax all 6 cores evenly when PhysX was run on the GPU and when it wasn't run at all, though at a higher (CPU) utilization. This is relevant to AMD GPU users.

Correct me if I'm wrong, but not all games/applications allow CPU-based PhysX effects, right? I remember that some games only allow certain additional PhysX effects with Nvidia cards. This actually got me thinking that PhysX (at least nowadays) is just there to allow some additional (hardware-accelerated) effects in games, and not really to make a game's default physics calculations run on the GPU. I'm not sure, though, whether those games both add (hardware-accelerated) effects and also move some of the physics calculations that run by default onto the GPU. Is this feeling of mine true or false for current (or most, or at least some) games? Feel free to elaborate, anyone. :)

(And thanks for sharing that article. I don't recall finding this article in the past. :))
 

oxford373

Distinguished
Jun 22, 2009
@army_ant7
First, PhysX is just an option in games; it doesn't matter what your graphics card is (Intel, AMD, Nvidia), you will have a PhysX option to choose (off, medium, high). For example, even if you have a slowish HD 6450 you can still max out settings in games, but the game won't be playable, and it's the same thing in games that use PhysX: you can set PhysX to high even if you don't have an Nvidia graphics card, but the game won't be playable in most of them.
All PhysX games use the CPU for PhysX when you don't have an Nvidia graphics card, but as that article says, most PhysX games don't use more than 2 cores for PhysX, which is why Nvidia GPUs are much faster when PhysX is set to high.
In some games that use simple PhysX effects (Mirror's Edge, NFS Shift) you can enable PhysX regardless of the graphics card and you won't notice any performance difference between Nvidia and AMD graphics cards: http://www.youtube.com/watch?v=Cv4XMDLLhIo
Other games use advanced PhysX effects (Mafia II, Alice: Madness Returns, Batman: Arkham City, Batman: Arkham Asylum, Borderlands 2, Metro 2033); in these games Nvidia graphics cards are much faster with PhysX enabled, but even in these games I saw playable frame rates with PhysX set to medium, and sometimes set to high, with an overclocked Sandy Bridge Core i5:
http://www.youtube.com/watch?v=uf2s4TdnH9w
http://www.youtube.com/watch?v=6Z_MphFL3BM&NR=1&feature=endscreen
http://www.youtube.com/watch?v=kKvtEeih2-g&feature=endscreen&NR=1
http://www.youtube.com/watch?v=Kz1f8OXvTdg
There are also some tweaks to make CPU PhysX faster even if you don't have a fast CPU, by disabling some PhysX effects: http://physxinfo.com/news/3628/mafia-ii-demo-tweaking-physx-performance/
If you want to know more about CPU PhysX, here are some articles about that:
http://www.bit-tech.net/hardware/graphics/2010/09/03/mafia-2-physx-performance/2
http://physxinfo.com/news/6922/batman-arkham-city-physx-benchmarks-roundup/
Phenom II vs. i7-2600K CPU PhysX: http://www.gamestar.de/spiele/batman-arkham-city/artikel/batman_arkham_city_im_benchmark_test,45777,2562565.html
http://physxinfo.com/news/9425/borderlands-2-is-cpu-capable-of-handling-the-physx-effects/
http://www.pcper.com/reviews/Graphics-Cards/Borderlands-2-PhysX-Performance-and-PhysX-Comparison-GTX-680-and-HD-7970/GPU-
Finally, Metro 2033 is one of the rare PhysX games that use all CPU cores for PhysX.
BTW, there aren't too many games that use PhysX; here is a list of PhysX games: http://en.wikipedia.org/wiki/List_of_games_with_hardware-accelerated_PhysX_support



 

army_ant7

Distinguished
May 31, 2009
@oxford373
Wow... I wasn't aware... Thanks for all the info! :D
I guess it must've been a few things, along with Nvidia's advertising, that gave me the misconception that PhysX effects could only be run with Nvidia cards. Things like MSI's Kombustor (which is supposedly based on FurMark) having an option to enable and disable PhysX, though come to think of it, that might just be an option to run PhysX on your CPU or your (Nvidia) GPU. Another thing that made me think these options were exclusive was how some people comment that if you want these effects, you should get GeForce cards. I guess, if they aren't mistaken like me, they mean so that the games would run well with these effects. Thanks again! :)
 

mohit9206

Distinguished
blazorthon says that Hyper-Threading in the Intel Core i3 isn't beneficial for increasing frame rates, but that it does affect frame latency a lot in games.
I want to know, in short, what frame latency is and whether it's more important than average frame rates.
Also, for someone looking to build a cheap Intel gaming system, does it make sense to go for a Core i3 or a Pentium purely for gaming, ignoring Quick Sync and the other features the i3 boasts, considering Pentiums cost much less than Core i3s?
 

razor512

Distinguished
Jun 16, 2007
Wanted to add to why I feel that the AMD FX CPUs are simply an improved implementation of Hyper-Threading. A CPU with truly discrete cores can have one thread do something that is CPU-intensive but not system-memory-intensive, then start an entirely new thread doing something different on another core, and the first thread will not lose any of its performance.

With Hyper-Threading and the FX cores, if you start a process on one core and then start a completely different thread doing something else, then as you cycle the core affinity you will see that there is one other core (as seen by Windows Task Manager) that negatively impacts the performance of the first thread, even though it is a different core as far as Task Manager is concerned. That's because you are using a "core" that shares most of its resources with another core.

How can this not be Hyper-Threading when these FX chips have the same core-related performance issues?

The 8 "core" FX chips can get pretty much 100% scaling for up to 4 threads if you take a 4-thread process and manage the CPU core affinity so that each thread lands on its own core module, because at that point it is functioning the same as a traditional CPU.

I call what AMD has done an improved version of Hyper-Threading because it behaves the same way as Hyper-Threading: have one core module (2 cores with shared resources, according to AMD) do 2 different things and the performance of both threads drops significantly compared to a Phenom II doing the same task.
 

army_ant7

Distinguished
May 31, 2009
This is what I've understood about it (and I hope I'm not mistaken). Frame latency is how long one frame takes to render. So when talking about worst-case frame latency (the counterpart of minimum FPS), it considers the single frame that took the longest to render.

FPS (frames per second, i.e. one common measure of frame rate) takes into account how many frames were rendered within the span of 1 second. So minimum FPS considers the second with the fewest frames rendered in it.

Now, even though 1 second is really brief, we (or some people more than others) can still discern inconsistencies in frame rates within 1 second (e.g. micro-stuttering with some multi-GPU setups). For example, say I get a minimum FPS of 30. That doesn't necessarily mean that a frame was rendered every 33.33 milliseconds (30 frames/sec = 1 frame per 1/30 sec = 1 frame per 33.33ms). It means that a total of 30 frames were rendered within that second, whether one frame stayed on screen for 75ms, another for 10ms, and yet another for 100ms, etc. (take note of the inconsistent latencies, which mean a lack of smoothness that some people may notice).
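To put rough numbers on that (made-up frame times, just to illustrate):
[code]
# 30 frames adding up to ~1 second, but with very uneven gaps between them.
frame_times_ms = [100.0, 75.0] + [29.46] * 28
assert abs(sum(frame_times_ms) - 1000.0) < 1.0

fps_for_that_second = len(frame_times_ms) / (sum(frame_times_ms) / 1000.0)  # reported as "30 FPS"
instantaneous_fps = [1000.0 / t for t in frame_times_ms]                    # per-frame rate

print(f"frames counted in that second: {fps_for_that_second:.0f} FPS")
print(f"slowest frame: {max(frame_times_ms):.0f} ms, i.e. {min(instantaneous_fps):.0f} FPS at that instant")
[/code]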

I hope I explained that in an understandable way, though please do say if not or if I could elaborate more on something. Also anyone feel free to correct me if I've misrepresented any info at all. :)

I can't tell you much about a Pentium vs. a Core i3 (Sandy or Ivy Bridge). I have heard that Pentiums are pretty good for budget gaming, but I guess they don't suffice with certain games or settings, because more powerful CPUs are still recommended. Whatever extra performance the i3 offers may also not be noticeable if the GPU is the bottleneck (i.e. a lower-end, not-so-powerful GPU). The chart blaz shared above also does seem to indicate the worth of a Core i3 over a Pentium. It should perform better in non-gaming tasks as well, I think, probably ones that use more threads. I have heard (possibly from blaz as well) that i3s pair pretty well with fairly high-end graphics without bottlenecking the system too much, but this probably varies with the game.

I personally use a Core i3-2120 and it seems to have been running beautifully with two HD 7850s in CrossFire, though I'm not sure if the games I've seen played on this system would run noticeably better with a Core i5. I'm thinking probably, but I feel that even the highest-end Pentium would give the same performance I'm experiencing now. (Though I haven't had a Pentium in this system, so I wouldn't know.)

Again, I hope this info was relevant to your situation. :)

I don't think HTT and AMD's module-based implementation should be compared in that way, since they're very different.

About how Task Manager "sees" 2 cores per Intel Hyper-threaded core and AMD module, I have heard that Windows recognizes the difference between a physical and logical core with a Hyper-threaded Intel core, and thus assign workloads appropriately based on this. I think that leaves room for Windows to be more optimized for AMD's modules--recognizing that the two integer cores in one module share specific resources and that the kinds of workloads assigned to those two cores should be done in an optimal/appropriate manner. I'm not sure if Windows does already or if anything could really be done about it, but yeah just sharing...

With HTT, you have 2 threads running on the same core. I'm not totally sure how it works, but I think it just tries to make fine-grained use of the whole core and all its resources. I'm not sure if that means that while integer instructions are being processed, some floating-point instructions could be squeezed in alongside them, or what. With AMD's module implementation, you have two separate integer cores which can process integer instructions simultaneously. I have heard that the problem with AMD's implementation (at least with Bulldozer) is that it doesn't feed data to these cores well enough (resource hogging/constraints).

I think how AMD's module implementation scales with threads depends on the workload (if it's more integer-based or floating-point-based). I'm not sure how Intel HTT cores behave with different kinds of workloads.

I feel very unsure and probably know very little about low-level CPU functionality, and would like it if anyone could add or correct me if I've explained something erroneously. :)
 

razor512

Distinguished
Jun 16, 2007
Intel's implementation of Hyper-Threading duplicates some core components, e.g. the registers and other supporting structures, but the execution core, the L1 and L2 caches, fetch, decode, and the various schedulers are shared. This allows the execution core to be used more fully: the CPU uses an internal scheduler that rapidly shares the processing capability of the execution core between the two threads, and since that scheduling happens much faster than the OS handles threads, the CPU is able to process data from another thread while the OS's scheduler prepares more for the first thread to do.

AMD simply took that model further and duplicated the L1 cache and added an extra execution core.

This allows 2 threads to truly run at the same time, but within each core module (each module has 2 cores) most of the die space is shared, and if 2 threads on the same module need access to the same shared resources, you get a major performance drop.



AMD core module diagrams: http://i.imgur.com/ANZUG.jpg

Many of the components are still shared; they just chose to duplicate a few additional areas and then call it an extra core. (They define the core by the execution unit and not by all of the other components that make up a traditional core. It is like looking at a photo of a single train and calling it 2 trains because there is an engine car at the front and the back.)

The area where AMD core modules really struggle is multitasking: 2 different processes, each with a full set of needs. On a traditional core that would not be a problem, but on the crap core module, what one application does can significantly impact the performance of another application. I purchased a Bulldozer CPU a while back but returned it to Newegg when that chip at 4.5GHz was running slower than the Phenom II X4 960 at 3.9GHz. I later took a big risk and purchased a Phenom II X6 1075T from eBay and pushed it to 4GHz.

Tom's Hardware did their testing using multi-threaded applications, which a Bulldozer module is better able to handle since not every thread needs a full set of core resources to function properly, but if 2 separate applications try to use the 2 cores on the same core module, things really go downhill. (You can test this by running 2 different CPU benchmarks at the same time, or by running benchmarks inside 2 virtual machines and repeating the test, each time pushing the VM to a different thread on the host OS.)
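For anyone who wants to try that at home, here's a rough Python sketch using psutil to pin two CPU-hungry worker processes either to the two cores of one FX module or to cores in two different modules. The core numbering (0-1 = module 0, 2-3 = module 1, and so on) is an assumption about how the OS enumerates them, so check your own system first:
[code]
import time
import multiprocessing as mp
import psutil  # third-party: pip install psutil

ITERATIONS = 20_000_000

def burn(q):
    # Integer-heavy busy loop standing in for "a CPU benchmark"; reports its runtime.
    x, start = 1, time.perf_counter()
    for _ in range(ITERATIONS):
        x = (x * 1103515245 + 12345) % 2147483647
    q.put(time.perf_counter() - start)

def run_pair(cpu_a, cpu_b):
    q = mp.Queue()
    procs = [mp.Process(target=burn, args=(q,)) for _ in range(2)]
    for p, cpu in zip(procs, (cpu_a, cpu_b)):
        p.start()
        psutil.Process(p.pid).cpu_affinity([cpu])  # pin the worker to one logical CPU
    for p in procs:
        p.join()
    return max(q.get(), q.get())                   # time of the slower worker

if __name__ == "__main__":
    print(f"same module (cores 0,1):       {run_pair(0, 1):.2f} s")
    print(f"different modules (cores 0,2): {run_pair(0, 2):.2f} s")
[/code]
If the shared front end really is the problem, the same-module pair should come out measurably slower than the cross-module pair.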

 

army_ant7

Distinguished
May 31, 2009
@Razor512
Oh... Your description of HTT does sound more like it. I remember something like that. So it basically keeps the execution core busier by feeding it work faster, eh... Thanks for correcting me. (Tell me straight up, was I wrong in what I knew about HTT? I mean, can the CPU use its FPU and integer unit (I'm not sure if that's the ALU, or if the ALU handles all math-related data, or what) at the same time?) :) Hehe... I'm such a newb! :lol:

In AMD's module approach, did they duplicate the same resources that Intel did with HTT? I'm wondering if they did quite the opposite and duplicated the (integer) execution cores instead.

It seems that AMD needs to do something like HTT then, or at least the part where they have a better way of feeding instructions and data to the execution cores. I guess that's what they're up to right now: making their implementation more efficient in this regard.

I'd say that it's more accurate to call them "modules" like we do. I wasn't fond of how they advertised it as an octa-core. "Quad-module" sounds cool and hip IMO. :lol:
 

razor512

Distinguished
Jun 16, 2007
Your statement is accurate; the whole goal behind Hyper-Threading is to get the OS to see 1 core as 2. The parts they double are only the components that transport and store architectural state and info about the thread; the parts that do the actual processing of the data and the parts that feed the core data are shared. I could not find much detailed low-level info on the core module or Intel's Hyper-Threading (just simplified diagrams), but both seem similar in how resources are shared, with the main difference being AMD duplicating the execution core.

(chart image: sLvev.png, the module/core scaling scores discussed below)

http://www.extremetech.com/computing/138394-amds-fx-8350-analyzed-does-piledriver-deliver-where-bulldozer-fell-short/1

The main test they failed to do was a multitasking test where 2 different single-threaded applications attempt to use the same core module and then 2 different modules. But even with the same application using multiple threads, going from 4 threads on 4 modules to 4 threads on 2 modules, you lose almost a full CPU core's worth of performance.
 

psiboy

Distinguished
Jun 8, 2007
I would love to see the 1100T in these benchmarks OC'd to around 3.7 to 4GHz, just to see how well it holds up clock for clock against the FX series...
 
[citation][nom]Razor512[/nom] (quoted from the post above) [/citation]

That problem is simply a front-end bottleneck that is to be rectified in the next AMD micro-architecture, Steamroller. The problem isn't the modular concept, it's the execution and implementation of it in Bulldozer/Piledriver (which is identical between both micro-architectures). Steamroller is supposed to fix it by giving each module the front end of two cores instead of the front end of one core, and it is supposed to launch in mid to late 2013 or early 2014.
 

army_ant7

Distinguished
May 31, 2009
629
0
18,980
That kind of data would be nice to have. (You were referring to Tom's Hardware as the one that didn't do this test, right?)

If I understood the chart you shared correctly, Piledriver got 64.2% scaling (using derived data) from using both cores of 2 modules (4 cores) compared to using only one core in each of 2 modules.
I got this number by halving the 4M/4C score (4.3 / 2 = 2.15) to derive a theoretical 2M/2C result, though I have a feeling that in reality it should score higher than 2.15, because running more threads in a program usually doesn't scale performance perfectly. Then I compared the derived 2M/2C figure (2.15) to the 2M/4C score (3.53) to get a 64.2% increase.

What's peculiar is how the score of the 4M/8C test is lower than the 4M/4C test. This got me thinking that I may have it wrong about how this test gives scores. I'm now thinking that the 8-threaded scores (of the 4M/8C test) aren't directly comparable to the 4-threaded scores (of the 4M/4C test). If that's the case, then my second paragraph wouldn't be sound. Hehe... Please clarify this for me. Thanks. :)
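Here's the arithmetic I used, in case I slipped somewhere (the scores are just read off the chart, so treat them as approximate):
[code]
score_4m4c = 4.30                 # 4 modules, one core each (read off the chart)
score_2m4c = 3.53                 # 2 modules, both cores each
derived_2m2c = score_4m4c / 2     # assumes 4M/4C is roughly twice a 2M/2C run
gain = score_2m4c / derived_2m2c - 1

print(f"derived 2M/2C score: {derived_2m2c:.2f}")
print(f"gain from enabling the second core per module: {gain:.1%}")   # ~64.2%
[/code]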
 

razor512

Distinguished
Jun 16, 2007
I think for the 8-core test they left Cinebench set to 4 threads but either enabled all 8 "cores" in the BIOS or returned the CPU affinity to the default of having access to all cores, relying on the scheduler to manage the 4 threads. (When an application uses fewer threads than the number of cores, you will often run into issues like a single-threaded app using 50 percent of 2 cores instead of 100% of one core.)
 

army_ant7

Distinguished
May 31, 2009
I see... That's the kind of scheduler optimization they have to implement properly, then, if it isn't already: threads should not saturate both cores of a module but be assigned to one core per module, and only when more than 4 threads are running concurrently should they tax the remaining cores of the modules.

Come to think of it, they should also maybe find a way to... well, it's easier for me to describe with an example. When, say, you have an application that uses 6 threads concurrently and a single-threaded application running at the same time, they should give that single-threaded application one module to itself and run the 6-threaded application on the other 3 modules (3 x 2 cores = 6 cores), so that the 6-threaded application doesn't "bother" the single-threaded application by hogging resources on the module it's running on.
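Something like this, done from user space with psutil instead of inside the Windows scheduler (the module-to-logical-CPU mapping is an assumption, and the PIDs below are made up):
[code]
import psutil  # third-party: pip install psutil

def module_cores(module_index):
    # Assumed mapping: module n exposes logical CPUs 2n and 2n+1.
    return [2 * module_index, 2 * module_index + 1]

def partition(six_thread_pid, single_thread_pid):
    # Give the single-threaded app module 0 to itself...
    psutil.Process(single_thread_pid).cpu_affinity(module_cores(0))
    # ...and spread the 6-threaded app across modules 1-3 (6 cores total).
    six_cores = [c for m in (1, 2, 3) for c in module_cores(m)]
    psutil.Process(six_thread_pid).cpu_affinity(six_cores)

# Hypothetical usage: partition(six_thread_pid=4242, single_thread_pid=1337)
[/code]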

Sound like a good idea? I think someone suggested something like this being done with PSCheck. (Just in case: you can use PSCheck to change the individual clock rates of the cores. One useful application I've heard of from blaz, which he in turn heard from palladin(number something), is to sort of manipulate the Windows thread scheduler in a desirable way: lowering the clocks of certain cores (like one core in each module) so that they don't hog resources from the other cores (which could be overclocked in turn), also forcing the more taxing threads to run on the unimpaired (or overclocked) cores.)

About what you said about a single-threaded app running at 50% on 2 cores... Did you intentionally mean that? (I don't mean to offend.) :)
It's just that I thought having 1 thread meant that it can only run on 1 core (or hardware thread) at a time. If what you said really happens, then I'm guessing that Task Manager checks core usage at certain intervals, in such a way that a single-threaded application could run on one core for a while and then on another core between these intervals, and thus Task Manager would read out both cores as being utilized. Yes? :)
 

Guest

The problem is that the most overlooked item of cost saving is electricity use.
Sure, it's $100 cheaper than the 3770K, but it's a 125W TDP vs. a 77W TDP; in the long run you will save with Intel, and have better performance.
 