AMD Trinity On The Desktop: A10, A8, And A6 Get Benchmarked!

army_ant7: I thought I'd read 32% in the article; now I read 32.8%. Maybe I missed the .8.

Back to topic:
Computing performance is calculations per unit of time. The time elapsed for a given number of calculations is therefore an inverse function of performance.

I don't know the number of resulting calculations (e.g. bytes compressed the correct way) in the WinZip test, but let's assume it's 1,000,000 calculations, just as an example. Let's break it down to calculations per second.


Classic code:
1,000,000 calcs / 131 seconds = 7,634 calcs / second

OpenCL code:
1,000,000 calcs / 88 seconds = 11,364 calcs / second

Now, the percentage is

100 * 11,364 / 7,634 ≈ 149%
Thus, an improvement of 49%.
11,364 c/s is 49% higher than 7,634 c/s.
The other way around:
7,634 c/s is 33% lower than 11,364 c/s.

Yet, less is not an improvement. Improvement is the percentage above 100%.

That's the problem when the result (seconds) sits below the fraction line: it's an inverse function. I hope the English terms are correct.
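
Here's the same arithmetic as a small Python sketch (the 131 s and 88 s are the WinZip times discussed above; the 1,000,000-calculation workload is just my placeholder, and it cancels out anyway):

```python
# Throughput is work divided by time, so runtime relates to performance inversely.
# The workload size cancels out, so any placeholder value gives the same percentages.

def improvement_from_times(old_seconds, new_seconds, work=1_000_000):
    """Percent improvement in throughput when runtime drops from old_seconds to new_seconds."""
    old_rate = work / old_seconds   # e.g. 1,000,000 / 131 ≈ 7,634 calcs/s
    new_rate = work / new_seconds   # e.g. 1,000,000 / 88  ≈ 11,364 calcs/s
    return (new_rate / old_rate - 1) * 100

print(improvement_from_times(131, 88))   # ≈ 48.9% faster (classic -> OpenCL)
print((1 - 88 / 131) * 100)              # ≈ 32.8% less time -- a different number
```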


No offense.
 


Windows is not aware of how BD modules should be scheduled. So, we have Windows pretty much throwing threads all over the place. By disabling one core per module, you're basically forcing Windows to schedule threads the proper way for the modular architecture.

module 0: core 0, core 1
module 1: core 2, core 3
module 2: core 4, core 5
module 3: core 6, core 7
Each module has only so much hardware to share. For example, each module has four x86 decoders and 2MB of L2 cache. Letting a single core have control over all of a module's resources gives it higher performance because it will have fewer cache misses (from the greater cache capacity) and other advantages it would otherwise have had to share with a second core. If you disable an entire module, then you don't get this advantage, because even though you disabled cores, you didn't disable the correct cores. Disabling modules effectively makes it an FX of a lower tier.

http://hardforum.com/showpost.php?p=1037482638&postcount=88
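
If you want to experiment with this on your own system without a BIOS switch, here's a minimal sketch (my own illustration, not from the post linked above) that pins a process to one core per module using the psutil library, assuming Windows exposes the FX-8150's logical CPUs as 0-7 with each consecutive pair sharing a module:

```python
# Minimal sketch (assumption: logical CPUs 0-7 map to modules as pairs
# (0,1), (2,3), (4,5), (6,7) -- verify your own topology before relying on it).
import psutil

def pin_to_one_core_per_module(pid, cores_per_module=2):
    """Restrict a process to the first core of each module-sized group of CPUs."""
    proc = psutil.Process(pid)
    all_cpus = list(range(psutil.cpu_count(logical=True)))
    one_per_module = all_cpus[::cores_per_module]   # e.g. [0, 2, 4, 6]
    proc.cpu_affinity(one_per_module)

# Example: restrict the current process; pass a game's PID instead if you like.
pin_to_one_core_per_module(psutil.Process().pid)
```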

Then, there's the cache problem and more that also hold AMD back. AMD truly does make good architectures... However, they are constantly shooting themselves in the foot in other ways.

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/2

http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested/6

 

No, I do not work for AMD or Intel or GeForce or whatever. I don't have any qualifications, lol.
Every company bins their CPUs, GPUs, and other products and sells them; it's a common practice.

AMD and other corporations give everything names. The FX CPU's codename is Zambezi, the CPU die is called Orochi, and the platform (FX CPU + motherboard, and maybe graphics card and AMD-brand RAM too) is called the Scorpius platform.
The recent APU lineup is called Trinity, the CPU cores in the APU are based on Piledriver cores, the IGP is called Devastator, et cetera, et cetera.

Disabling modules and cores should open up enough thermal headroom for Turbo to hit higher clocks. The disabled bits do not absorb heat. They can't put 'whatever's active' on a smaller die because it's impossible to do so.
 


Yes, AMD uses lasers now to stop people from unlocking cores. Dies tend to have names. For example, the Athlon II X2 has Regor, the Phenom II X6 has Thuban, Phenom II X4s with two unlockable cores have Zosma, and there are many more. It's like how Nvidia has names such as GK104, GK106, and GK107, and AMD has Tahiti, Pitcairn, and Cape Verde, among many, many more.

What you said at the end of your post is also partially correct. The problem is that most of the CPUs with locked cores have inferior binning compared to the higher core count models, so although they can have more space on-die to dissipate heat through, their inferior binning means that they generate more heat. An example of how it can help would be comparing a Phenom II 960T (Zosma) to a different Phenom II x4 that does not have a T in its name, such as the Phenom II x4 955-980 CPUs.
 


Disabling one core from each module actually inhibits Turbo Core frequencies because they are based on a per-module measurement, not a per-core measurement like other CPUs that have Turbo. The disabled parts do absorb some of the heat. You can't have a generator of heat (the active parts) connected to a disabled part and have the disabled part not absorb some of that heat for better dissipation. However, AMD's cut-down models tend to have inferior binning, so they can still overclock worse. Disabling the cores of a higher core count CPU yourself, instead of using a cut-down CPU with inferior binning, demonstrates this rather nicely.
 
Why aren't they comparing these APUs to Intel's IVB and their own cheap CPU + cheap GPU combos?

More like a press release than a real review. This site has gone so far downhill.
 


1. Not enough time.
2. These are literally just previews, and we don't know how close they are to how things will work after the commercial launch.
3. How has the site gone downhill? This was never supposed to be a real review of how things will perform and act, just a show of how far along we are right now. The full review will come when it can.
 
Still, there SHOULD be at least one Intel CPU thrown in there; it's pure logic. AMD's chips CANNOT just compete with each other. Is it that hard to put at least the 'weakest' i3 or i5 in the mix?
 
If disabling a core in each module plus fixing all the memory controller and L3 flaws could make AMD competitive again, why bother releasing a broken architecture to the consumer market? This is on top of their immature 32nm process. This is basically asking for trouble: an immature 32nm process plus a new, broken architecture.

1. AMD could have just released Bulldozer as their Opteron line and used it as a stepping stone in the server market, since multi-threaded workloads are common there and benefit immediately from the Bulldozer architecture. Then they could start pushing software developers to do module-based scheduling. Once module scheduling matured, release the consumer version of the CPU.

2. OR... release the fully fledged Bulldozer for Opteron only. Disable one core in each module on the consumer CPU and disable some L3, since it has fewer cores now. This would allow AMD to salvage parts with a defective core per module or defective L3 and sell them as consumer CPUs. Disabling one core per module plus some L3 also frees up thermal headroom, allowing higher clock rates within a 125-140W TDP.

3. Follow Intel's tick-tock strategy: release a 32nm 8-core Phenom II, or a higher-clocked Phenom II X6/X4, to buy more time for the engineers to fix the Bulldozer flaws; wait for 32nm to mature; then release Bulldozer on a new socket specialized for it, rather than restricting themselves to designing the new CPU around the AM3 socket specification.

Basically, I see nothing good from AMD; they have no idea how to execute their product releases and designs. The Trinity IGP Radeon is now VLIW4, yet instead of making the low-end HD 7000 parts similar to the Trinity IGP to ease classic CrossFire, the entire low-end HD 7000 lineup is based on VLIW5. CrossFire between different GPU architectures is much less mature. Another blow from AMD here.
 
Regarding Fritz Chess Benchmark: the assumption that floating-point has anything to do with chess is just wrong. "Llano assumes a lead in Fritz, pointing to Piledriver’s shared floating-point resources as a weakness in this particular test."

Chess does not use floating point at all. Chess involves a huge number of simple comparisons (is this value larger than that one? and so on). It also relies on large transposition tables. In all likelihood, the reason Fritz is not doing so great on this chip is the amount of L3 cache. In chess, you never know which value is going to be valuable, so vast amounts of evaluations must be stored and compared repeatedly... and that needs memory. The faster it is, the better.

AI things like IBM's Watson also need lots of fast memory. It is likely most AI related things in the future would benefit from similar changes. Things like autonomous cars, more advanced voice recognition, real-time translation, object recognition, and such would likely benefit from lots of very low latency memory.

Chess has never been a good candidate for CUDA, for instance, because each core can only access a small amount of memory, yet it needs what all the other cores have found so it can learn what it can disregard and what it has to look at more closely. In video and many other workloads it is clear what a process will need, so it can manage with its own little piece of reality... chess is unpredictable... a kind of data mining where it generates its own data to mine.
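
To illustrate why chess engines lean on memory rather than floating point, here's a toy transposition-table sketch; it's not how Fritz is actually written, but it shows the kind of integer hashing and comparisons that dominate this workload:

```python
# Toy transposition table: integer keys, integer scores, no floating point anywhere.
# Real engines (Fritz presumably included) use Zobrist hashing and far larger tables.

class TranspositionTable:
    def __init__(self, size=1 << 20):
        self.size = size
        self.table = [None] * size            # each slot holds (key, depth, score)

    def store(self, key, depth, score):
        self.table[key % self.size] = (key, depth, score)

    def probe(self, key, depth):
        entry = self.table[key % self.size]
        # Only reuse a score for the exact same position, searched at least as deep.
        if entry is not None and entry[0] == key and entry[1] >= depth:
            return entry[2]
        return None

tt = TranspositionTable()
tt.store(key=0x9F3A1C77, depth=8, score=35)   # score in centipawns -- an integer
print(tt.probe(key=0x9F3A1C77, depth=6))      # hit: 35
print(tt.probe(key=0x12345678, depth=6))      # miss: None
```

The hot loop is pure memory traffic: hash, index, compare. That's why cache size and memory latency matter far more here than FPU throughput.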
 
@Edgar_Wibeau: Oh yeah. It does look like you're right about this. I should've put it in an algebraic formula just to be sure, since math can get confusing sometimes. Thanks! I did feel somewhat unsure, and I did acknowledge that I could be wrong. Sorry for using up your time like this. Hehehe...

@Blazorthon and De5_roy: Thanks for the info guys! The name "Orochi" does sound a little more familiar now and I don't just mean from Japanese mythology. Hehe... Like I've encountered it somewhere. Anyway...

[citation][nom]blazorthon[/nom]Disabling one core from each module actually inhibits Turbo Core frequencies because they are based on a per-module measurement, not a per-core measurement like other CPUs that have Turbo.[/citation]
How does this affect Turbo Boost negatively though since all modules are active? Oh yeah! Well, I'll just explain it further in case anyone reading might wonder. Also, correct me if I'm wrong.
Aside from thermals, Turbo Boost is also limited by the number of active cores (or, put another way, the number of sleeping cores). I was thinking, why not just base it on thermals? But then the thought of overclocking with insufficient voltage hit me.
 
[citation][nom]Tomfreak[/nom]If disabling a core in a module + all the memory controller L3 flaws could make AMD competitive again. Why bother releasing a broken architecture to the consumer market? This is ontop of their immature 32nm process. This is basically asking for trouble, immature 32nm process + new broken architecture. 1. AMD could have just release this bulldozer as their Opteron line + make it as a stepping stone in the server market since multi-threaded are common here and are instantly benefited from bulldozer architecture. then they can start pushing the software developer to do module base scheduling. Once the module scheduling mature = release the consumer version CPU.2. OR... release the fully fledged bulldozer for Opteron only. Disable 1 core on each module on the consumer CPU, disable some L3 since it has less core now. This allow AMD to salvage defective 1 core module + defective L3 and sell it as consumer CPU. Disabling 1 core on each module + some L3 also release some thermal headroom, allowing higher clock rates within 125-140wTDP. 3. follow Intel's tick-tock strategy, release 32nm 8 core Phenom II, or a higher clocked Phenom II X6/X4. buy more time for the engineers to fix the bulldozer flaws, wait the 32mn mature, then release it on a new socket specialize for bulldozer rather than restricting themselves designing the new CPU around the AM3 socket specification basically I see no good thing from AMD, they have no idea how to execute their product release and design. Even the Trinity IGP Radeon now is VLIW4, instead of making the low end HD7000 similar to Trinity IGP to easy classic crossfire, the entire low end line up of HD7000 are base on VLIW5. Crossfire diff GPU are much less mature. Another blow from AMD here.[/citation]

1. Correct.

2. Correct.

3. Correct.

I have to wonder if AMD is doing this intentionally. Maybe someone is being paid to not have a proper competitor for Intel? Who knows. If AMD truly wanted to be a proper competitor for both Intel and Nvidia, then they would have done what you said in 3, release a die shrink of Phenom II while they did the right job on Bulldozer, fixing every problem that I mentioned and more. After that, they could have also done what we've suggested, disable one core per module on some consumer SKUs, and then reap the benefits of probably more than doubling Bulldozer's performance while cutting power usage even further down.

Further along, yes, AMD should have at least made the low end Radeon 7000 cards use VLIW4 if they didn't want to use GCN. It would have been both a power efficiency boost over VLIW5 cards and better CF compatibility, among other advantages. Heck, they could have used the same GPU as Trinity's Devastator GPU. However, AMD seems intent on not being a very good competitor.
 
[citation][nom]army_ant7[/nom]@Edgar_Wibeau: Oh yeah. It does look like you're right about this. I should've put it in an algebraic formula just to be sure since Math can get confusing sometimes. Thanks! I did feel somewhat unsure andI did acknowledge that I could be wrong. Sorry for using up your time like this. Hehehe...@Blazorthon and De5_roy: Thanks for the info guys! The name "Orochi" does sound a little more familiar now and I don't just mean from Japanese mythology. Hehe... Like I've encountered it somewhere. Anyway...How does this affect Turbo Boost negatively though since all modules are active? Oh yeah! Well, I'll just explain it further in case anyone reading might wonder. Also, correct me if I'm wrong.Aside from thermals, Turbo Boost is also limited by the number of active cores/the number of sleeping cores (another way to put it). I was thinking that why not just base it on thermals, but the thought of overclocking while having insufficient voltage hit me.[/citation]

Think of it like this. When a module hits high utilization in the FX-8150, then it can Turbo from 3.6GHz all the way up to 4.2GHz or thereabouts. However, with a core from each module disabled, you now have each module, at best, running at about 60% utilization instead of topping out at 100%. So, Turbo can't go as high because each module is only being pushed somewhat beyond half-way. It maxes out at about 3.9GHz when one core per module is disabled.
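
Just to put those numbers in one place, here's a toy model of the behavior described above; the utilization thresholds and exact frequency steps are made-up illustrations, not AMD's real Turbo Core algorithm, which is driven by power and thermal telemetry:

```python
# Toy model only -- real Turbo Core decisions come from power/thermal telemetry,
# not a simple utilization threshold like this.

BASE_GHZ = 3.6       # FX-8150 base clock
MID_TURBO_GHZ = 3.9  # roughly what the one-core-per-module config reportedly reaches
MAX_TURBO_GHZ = 4.2  # FX-8150's advertised maximum Turbo Core clock

def toy_turbo(module_utilization):
    """Pick a clock for a module from its utilization (0.0 to 1.0)."""
    if module_utilization > 0.9:    # both cores of the module loaded
        return MAX_TURBO_GHZ
    if module_utilization > 0.5:    # one active core; ceiling of roughly 60%
        return MID_TURBO_GHZ
    return BASE_GHZ

print(toy_turbo(1.0))   # 4.2 -- stock FX-8150 with a saturated module
print(toy_turbo(0.6))   # 3.9 -- one core per module disabled
```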
 
[citation][nom]mindbreaker[/nom]Regarding Fritz Chess Benchmark: the assumption that floating-point has anything to do with chess is just wrong. "Llano assumes a lead in Fritz, pointing to Piledriver’s shared floating-point resources as a weakness in this particular test."Chess does not use floating point at all.[/citation]
I'm not saying that anything you've said is wrong, but I'm not sure if you're talking about Fritz in particular or chess software in general. If it isn't the former, then you do have to remember that programs can be coded in diverse ways. The makers of Fritz might've intentionally used floating-point math for the sake of benchmarking floating-point performance. Anyway, just my thought on that.
[citation][nom]blazorthon[/nom]Think of it like this. When a module hits high utilization in the FX-8150, then it can Turbo from 3.6GHz all the way up to 4.2GHz or thereabouts. However, with a core from each module disabled, you now have each module, at best, running at about 60% utilization instead of topping out at 100%. So, Turbo can't go as high because each module is only being pushed somewhat beyond half-way. It maxes out at about 3.9GHz when one core per module is disabled.[/citation]
Oh, so it's not about putting more modules to sleep?
I'm just checking if this is what you're saying. It's about whether or not the modules actually need the Turbo boost, or rather, that whatever mechanism reads a module's utilization can only see "at best... about 60% utilization" because the disabled core carries a potential 40% of the whole module's utilization. Is that it? (I'm not sure if this info was in the article you gave me before. Sorry, I haven't quite read it through.) Sounds like a patch to Turbo Boost could fix this then, but alas, this mod is not officially supported. :-(

EDIT: I've finished the article you gave me. http://techreport.com/articles.x/21865/1 Just for the sake of being more sure, where have you seen how Turbo Boost works? 🙂 Also, an interesting idea is how you could force an application to use certain threads, as done in the article. Hm... Do you think this could serve as a workaround for Bulldozer owners who don't have motherboards that can turn off one core per module? I'm thinking of making .bat files for their games. :-D
[citation][nom]blazorthon[/nom]I have to wonder if AMD is doing this intentionally. Maybe someone is being paid to not have a proper competitor for Intel? Who knows. If AMD truly wanted to be a proper competitor for both Intel and Nvidia, then they would have done what you said in 3, release a die shrink of Phenom II while they did the right job on Bulldozer, fixing every problem that I mentioned and more. After that, they could have also done what we've suggested, disable one core per module on some consumer SKUs, and then reap the benefits of probably more than doubling Bulldozer's performance while cutting power usage even further down.Further along, yes, AMD should have at least made the low end Radeon 7000 cards use VLIW4 if they didn't want to use GCN. It would have been both a power efficiency boost over VLIW5 cards and better CF compatibility, among other advantages. Heck, they could have used the same GPU as Trinity's Devastator GPU. However, AMD seems intent on not being a very good competitor.[/citation]
Maybe AMD is just having trouble settling in with the new management/CEO. I mean, with all those job cuts, something might've been shaken up in there. Also, sometimes things can get out of order going from division to division... person to person. It could've been an unfortunate chain of events, or they might just not have thought of the ideas you and others have in time to implement them. It could be a (bad) business decision. It could've also been what you've said about being bribed to be that way. 🙂 I have read a comment somewhere before about how Intel would have government (monopoly) issues if AMD's CPU division ever died. But we shouldn't jump to conclusions.
As for them imitating the tick-tock strategy, maybe they don't want to appear like copy cats?
I just had an idea right now. Maybe they can do a tick-tock strategy with CPUs and APUs: release a CPU, then apply a die shrink and add in graphics and release it as an APU, then a CPU again. Haha! It sounds funny and, by the looks of it, unlikely, since they released Trinity first with Piledriver and haven't applied a die shrink since Llano and Bulldozer...
Hm... That's a thought. Is there a possibility that we'd be surprised by a release of Vishera with a die shrink? They had some practice with Trinity on 32nm.
 

Thanks for that link BTW. It opened up my mind a bit about what went on in AMD, if all that were true. It makes me want to work for AMD in the future...and rise up in the ranks...and make huge changes inside. Haha! I sound like a little kid. The business people, unfortunately, seem to have more say, though the engineers could influence them I bet. Another thing is, even if I'm lucky enough to be assigned to the engineering team of whatever CPU they may develop in the future, if they still do, I'll only be one of so many (as said in that forum post).
It's really sad what's happening with AMD... :-(
 
[citation][nom]army_ant7[/nom]@Edgar_Wibeau: Oh yeah. It does look like you're right about this. I should've put it in an algebraic formula just to be sure since Math can get confusing sometimes. Thanks! I did feel somewhat unsure andI did acknowledge that I could be wrong. Sorry for using up your time like this. Hehehe...[/citation] No problem!

This would make a 19% improvement in iTunes and 17% for 3ds Max on page 2, BTW ;-)

Author: I hadn't read the whole article when I made my first post; I was mostly interested in the OpenCL part and how it compares to Llano. Summing it up now: really a decent first look at desktop Trinity!
 
[citation][nom]Razor512[/nom]please add other CPU benchmarks so that we may have a frame of reference.while it is nice to see this new CPU, we ultimately want to get the best CPU for the money[/citation]

This isn't a full review, just a preview. The hardware here isn't ready, so we're just getting a look at how well it does right now. There will undoubtedly be a full review that includes other platforms. Hopefully a lot of platforms are in it; IMO it would be worth the wait to see a very thorough review and comparison against the other platforms.
 
[citation][nom]blazorthon[/nom]Further along, yes, AMD should have at least made the low end Radeon 7000 cards use VLIW4 if they didn't want to use GCN. It would have been both a power efficiency boost over VLIW5 cards and better CF compatibility, among other advantages. Heck, they could have used the same GPU as Trinity's Devastator GPU. However, AMD seems intent on not being a very good competitor.[/citation]I seriously think they should revamp the entire low-end HD 7000 lineup to VLIW4 to mirror what's inside the Trinity IGP. Maybe a discrete HD 7000 that is twice the size of the Devastator IGP, with the driver profile mapped/fooled into treating it as a dual GPU, then CrossFired with the original Devastator as triple CrossFire. Most games these days have no trouble benefiting from triple CrossFire. In the mobile market, the GPU often bottlenecks the CPU in every way. If AMD can CrossFire their IGP + Radeon, it will sell very well.

 
[citation][nom]tourist[/nom]by the way chris liano supports 1866 per channel not 1600 as stated in the article[/citation]

I'm not sure about this, but I do remember reading about an issue where they couldn't really get it to run memory at 1866MHz. Maybe it was/is similar to the issue with Trinity now.

It's been a while already. Not that I'm rushing you, Chris, but will we still be seeing that video you said you were making with the most powerful Llano and the Core i3s? 🙂
 
I work at Office Depot, and soon we will start selling some new HP models that have the new APUs. I know they are the P7 series (which features stuff like Beats Audio).

One with an A10-5700, 10 GB RAM, 2 TB HDD - $679.99
Another, I think it's an A10-53xx (the next step down for retail), 8 GB RAM, 1 TB HDD - ~$540

Pretty damn good prices if you ask me
 
[citation][nom]tourist[/nom]I think he had mobile liano on the brain because it only supports 1600 , however i have yet to confirm 2 mem sticks @ 1866 per channel. I know liano can run 2133 in single channel, but it is not officially supported . Maybe it was just a misstype[/citation]

It could be.

[citation][nom]tacobravo[/nom]I work at office depot and soon we will start selling some new HP models that have the new APUs. I know they are the P7 series (which feature stuff like Beats Audio) One with A10-5700, 10 GB RAM, 2 TB HD -$679.99Another I think its A10-53xx (the next step down for retail) 8 GB RAM, 1 TB HD ~$540 Pretty damn good prices if you ask me[/citation]

Not to lecture you or anything, though we could all appreciate the info just like any other leak, isn't it against your contract or something to reveal info like that? I know they won't be able to track you anyway, but still.
 