Phenom vs. Athlon Core Scaling Compared



But if that happens... it will make many people in this very forum cry and complain.

Oh wait... they do that now and they'll continue to do that. Nevermind. Carry on.
 
Caveira209

Nice find, that's exactly what we have been talking about. The biggest improvements to Phenom would come if the IMC clock were increased and the L3 cache removed and replaced with a larger L2.
 
I'm convinced now that Phenom's performance hindrances are actually due to memory controller speed.

From what I can tell after installing the Phenom, the HTT speed is tied to the IMC's speed. For retail Phenoms it's tied at 2x the memory controller clock: a 1.8GHz memory controller gives a 3.6GHz HyperTransport speed. This is according to the BIOS on my K9A2 Platinum, and it also displays this way in the AMD OverDrive utility.
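The relationship described above can be sketched quickly; the fixed 2x ratio is just what the BIOS on this board reports, not an official spec:

```python
# HTT speed implied by the IMC clock, assuming the fixed 2x ratio
# reported by the K9A2 Platinum BIOS (an observation, not a spec).
def htt_speed_mhz(imc_clock_mhz, ratio=2):
    return imc_clock_mhz * ratio

# Retail Phenom: 1800 MHz IMC -> 3600 MHz (3.6 GHz) HyperTransport.
print(htt_speed_mhz(1800))  # 3600
```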

Since installing the 9600 BE, new options have appeared in the Cell menu in the BIOS: about four for the CPU core, and four or five for the northbridge/IMC. Once I have the time I'll post what they are, because I'm going to need help interpreting them, so I'll likely not try to OC until I get those figured out. I tried to mess around with AOD, but it didn't work out properly; it has some kind of issue setting the voltages, though that may be because I originally installed with the X2 4200+. I also can't for the life of me find the option that allows you to disable the TLB patch, though I'm not sure whether that comes enabled by default on the 1.1 BIOS on my board.

I'm also coming to believe that the TLB erratum is probably what is causing them to limit the IMC/NB speed to 1.8GHz instead of full core speed. I believe that if the IMC were running at core speed, the Phenom might be able to give a Core 2 Quad a run for its money.
 
I read the article and actually used the Windows calculator to work out the theoretical results between the 2.2 and 2.8GHz cores, which should be a ratio of about 1:0.785.
Looking at the figures then makes me wonder whether it isn't just the maturity of the Athlon core, and of its chipsets, BIOSes, drivers and compilers, that causes the steadier scaling. Every small loss compared to theoretical scaling is evidence of some horsepower not being applied to the application, which could be badly tuned drivers, BIOS or chipset.

That said, I found that many tests also score better than the theoretical scaling, which makes me wonder whether the extra horsepower shows Windows is an inefficient OS for such tight tests, since the extra services and daemons running in the background take a considerable amount of time during their routines.

The memory controller would only bump up these figures at higher clock speeds for the Phenom because it has higher bandwidth, and 1:1 performance scaling holds as long as the bottleneck is pure processing power. In most of these synthetic tests the Phenom reaches a tad more or less than 1:0.785, which I consider to be as expected. The small difference between the Athlon and Phenom cores I put down to maturity; of course the Phenom will produce better results once it is clocked higher and brought into circumstances where, say, HT2 vs. HT3 becomes relevant, and of course in multi-core environments the Phenom's better architecture will wipe the Athlon's butt. But since this article is a single-core adventure, my guess is that the small performance deficiencies indicate maturity.
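The theoretical ratio mentioned above is easy to verify; this sketch just assumes performance scales 1:1 with clock speed:

```python
# Theoretical performance ratio of a 2.2 GHz core relative to a
# 2.8 GHz core, assuming perfect 1:1 scaling with clock speed.
low_clock_ghz, high_clock_ghz = 2.2, 2.8
ratio = low_clock_ghz / high_clock_ghz
print(round(ratio, 4))  # 0.7857, i.e. the ~1:0.785 figure above
```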

Luckily I have all the time in the world to see Phenom peak, and sadly AMD does not. So let's hope B3 at 3+GHz and better support from motherboard manufacturers will bump AMD up next to Intel, so we get equal "price" scaling across the board. I hate Intel for selling their Q6600 for $200 and their Extreme for $2000 when it's only 500MHz faster; I get more MHz in a couple of minutes of clocking and tweaking.
 
I read some of the discussion here about Intel or AMD fanboys; I use both and am indifferent to platform. They all perform too marginally for my needs. I don't have batch tasks to give to my workstations all day; I like my foreground application(s) to run smoothly and my background tasks to finish at the end of the day without hindering my foreground apps. Both Intel and AMD deliver processors which can fulfill some of my demands, and there is no bias in this.

If it comes down to the sheer pleasure of a game, then the CPU is less important and the video card is everything. That's my penny for a thought. I have never seen a quad-core beat a highly clocked dual-core in any game when good video cards are used. So if this article is about processors, then it should be about people who are going to use the processor power, and I agree with caveira that Tom's has to be impartial. If you compress DivX all day long you'll notice that Intel is faster, and if you do number crunching you'll get more out of AMD without a doubt. If people at Tom's think all you buy a processor for is to play Crysis, then I admit: get a Core 2 and push it to 3.4-3.7GHz. If you want to be able to compress video while downloading and decompressing very large files, all while playing a game and not noticing any stuttering or performance loss, I'd go for AMD, because it has huge memory throughput, where Intel won't let you hammer its FSB like this without stuttering. But this article is not about just one app; it is about a platform.
 


You're right, the CPU forum is a bit of a playground most of the time.
 
Good article.

I notice the Windows Performance Index is used.
Is the Windows Performance Index a REAL benchmark,
not just a lookup of model numbers?
So will a high Graphics Index always mean a high 3DMark06 score or game FPS (independent of CPU & RAM)?

If the WPI is a real benchmark, then I can take it more seriously.
Even so, it should be updated so that a 5.9 means good performance in Crysis, for example. Will there be a 6 and 7 index soon, so current high-end games can better display what's needed for minimum, recommended, and maximum requirements?
DivX and 3D Studio could use this too.
 


Actually, there is no need to remove the L3 cache, nor to make L2 larger. The need is to reduce L3 latency (which is exactly the temporary "correction" given by the TLB updates of B2 stepping).

enewmen, if you liked it, OK, it is your opinion. But I truly missed the point that could make me conclude the same as you...
 



Interesting points. By the way, how many people have these Phenom clusters in their homes?
 



I think the author did a great job!! He set out to compare PHENOM to ATHLON and did just that. He tested to see what a single core of each could do, and how well it scales from single to dual and then to quad. Job well done.

What does Opteron (Barcelona) or Xeon have to do with this article? NOTHING!
 



An Intel Q6600 will murder the Phenom 9000 chips in 90-95% of desktop applications. AND it's 65nm; imagine what the 45nm parts will do to Phenom!
 


If AMD removes the L3 they will need to share the L2 between cores.

Whether or not the size needs to be increased would depend on how well it runs.

After a cache reaches a certain size, if you still see speed improvements by going even larger... then you need to redesign your cache code, because something is not working properly. I would much prefer a small cache with efficient code to a really big cache that is not as efficient.
 
Not "upper limit of efficient cache". Larger caches are good, but useless with low memory bandwidth, which can become the main bottleneck.

As an example, the Cell BE (IMO the best processor around; a "lobotomized" version is used in the PS3) uses only 256KB of (some sort of) cache-like memory for each of its cores and is something like 5.5 or 6 times faster than a 45nm Xeon 54xx with 12MB of cache.

Again, it is all a matter of chip design: Intel's current x86 implementation has very poor memory bandwidth compared to its core processing units.

Just to add: Intel uses 6MB of cache per pair of cores (it is wrong to count them together), and it is a "brute force" solution, in the sense that a larger program can be entirely held in fast-access memory. If this is a serious performance issue, one could re-design the chip (what AMD did when it introduced Barcelona) instead of adding more cache (what Intel is doing with Core 2).
 


I'd say that 512KB-1MB per core should be approaching the limit of an efficient cache. If it isn't, then there is something wrong with how the cache was coded and it needs to be fixed. Otherwise they are using the cache for something other than the purposes of a cache.

A bad example, but something to consider: why do many performance hard disks have 16MB of cache and not 512MB or 1GB? Perhaps because the added memory didn't speed them up: they have efficient code and don't need more memory. (I'm sure they tried more memory and ran into diminishing returns.)




Gee... do you mean that when Intel changes to a monolithic core with an onboard memory controller, adding more cache won't really help them as much as it seems to now?
 


Not that bad an example: caches are used to hide latencies and to keep instructions flowing without stalls, and hard disks now carry cache for the same reason.



I have already given an answer to this:



Intel with an on-chip IMC is a chip re-design strategy, not simply taking a known implementation and adding more memory.
 
My opinion is that the L3 idea was bad. You can see from memory performance tests that the Phenom is definitely slower. This is because an additional layer of cache requires more clock cycles to pull data in from main memory, and even from the lower L3 cache. This is one of the many reasons why the Phenom's performance is lower than expected. If AMD had gone with an L1/L2-only cache structure, with the L2 fully shared, we'd have seen better performance overall.
 


1:1 scaling is only possible when raw processing power is the bottleneck. Since the Barcelona 9900 can easily get 10GB/s of memory bandwidth, I see no problem with the L3 cache system. AMD claims the following: up to 27.2GB/s total delivered processor-to-system bandwidth (HyperTransport bus + memory bus). This is no joke.

In my opinion (but prove me wrong), it's simply maturity. 2.8 -> 1 : 2.2 -> 0.785: theoretical scaling.

If you look at the scores, you see the Athlon performing at more than 1:1 as clock speeds go up. This of course is impossible; it's an indication that other tasks take less overhead and a larger percentage is available for the benchmark, which indicates brilliant scaling!
The Phenom scores a little less on average as clocks increase. This is just a wild guess, but I think it's not the memory bus or the HyperTransport bus. I think it's bad BIOS, drivers and, lastly, software optimization. Software optimization should not be counted, since we're comparing the same silicon under the same conditions running the same benchmark; drivers should be counted, since we've changed platform, and early drivers can in many cases be released in a safe, non-final state, unable to unlock the full potential of the processing power.

You can't expect any motherboard supplier to do in a week what would take months for Sun, HP, IBM or Dell. For example, look at ATI Radeon vs. Nvidia GeForce: most of those cards were released with very poor (or even non-functional) drivers. Tom's Hardware at some point even refused to test those cards, as they didn't have decent drivers, and Tom's Hardware was neither able nor willing to test unfinished products with faulty or non-functional drivers. I can't find the article in Tom's archive, but I know it's there, titled something like "Game Over, ATI and Nvidia", with the author clearly stating they would not let new, unfinished products get a good review and sell ATI's or Nvidia's cards just because they hold the performance crown.

You get my drift. This article to me is an indication that things will improve, and the scores in it were quite in line with my expectations. I don't expect a child to be better optimized than an adult. And I don't care about the crown, if that's what you're asking; I'm no fanboy. Neither should Tom's be interested in these strategic and economic schemes from Intel or AMD.
 
Perhaps the cost of adding more cache far outweighs the cost of re-engineering the chip. After all, Intel is a business trying to make profits, and AMD is a business trying to innovate. Which would you invest in?
 


1st: Job well done with wrong benchmarking? To commit such mistakes, the author certainly doesn't understand much of what he is doing with threads, so how can he successfully measure scaling? Job WELL done?? Wow....

2nd: Did I really mention Barcelona or Xeon? Don't you know Barcelona is the same core as Phenom, and their behaviour is the same? Hmm, maybe you share the same ideas as the author...



Did you read the posts? Did you understand them? It doesn't seem so... No comment.
 


Well, preliminary results are in. I won't be able to do actual stability testing on an overclock until probably next week.

My Windows install has gone flaky after the processor upgrade, even when it's set at its standard speeds. Not a surprise, since the X2 4200+ and the Phenom 9600 BE use different HTT frequencies. Not to mention, AMD OverDrive has its issues too. I've been playing around with that a bit as well; I had to find the latest version for it to have Black Edition support. That being a beta version... well, you can figure out on your own how well that goes.

The northbridge speed multiplier on the 9600 BE is indeed unlocked. It can be adjusted through the BIOS on the K9A2 Platinum, BIOS 1.1, from a x1 to a x13 multiplier, so 200-2600MHz on the IMC clock. The core multiplier can also be set in the BIOS, but I for the life of me can't figure out how to use the setup on the 1.1 BIOS.
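The multiplier range above maps onto IMC clocks like this; the 200MHz reference clock is the stock value assumed here:

```python
# IMC (northbridge) clock steps implied by the unlocked x1-x13
# multiplier, assuming the stock 200 MHz reference clock.
REF_CLOCK_MHZ = 200
imc_clocks_mhz = [REF_CLOCK_MHZ * mult for mult in range(1, 14)]
print(imc_clocks_mhz[0], imc_clocks_mhz[-1])  # 200 2600
```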


If you have a mobo with a BIOS that allows regular changing of the multiplier for OCing, use that. AOD is an interesting utility and gives you a soft way to OC, but it has its issues. One being that if you use it to OC, the OC won't stick unless you tell AOD to start up with Windows, which leads to the problem that, if the OC doesn't work, your computer will crash every time you enter Windows. I don't know if I'll be able to play around with OCing much until MSI releases a newer BIOS that has multipliers and regular OCing options instead of what it currently has.

On an interesting note, I did manage to use AOD to OC the cores to 2.8GHz. It would run benchies like SiSoft and PCMark/3DMark/PCWizard 2008, but if I tried to run two instances of Orthos it wouldn't keep one of the instances going.

The main problem I had with changing the IMC speed through the BIOS seems to be basic Windows instability after the proc/NB speed change, rather than processor instability (part of the reason I haven't overclocked since the P2 300 SL2YK-to-450 days). Back then, every time you'd change the FSB speed it would make Windows really unstable, needing a reinstall; the same kinds of things appear to be happening here. I believe I should be able to get it to run stable at 2.8GHz core speed after a reinstall with a suitable BIOS. I was having absolutely no heat issues while playing with that in AOD; it never went above 30C per core under full load according to CoreTemp, so probably around 40C under load for the whole proc, which isn't bad for an OC on one of these.

I also had it running at 2.5GHz core / 2.4GHz NB-IMC, and it ran through Orthos x2 for 15 hours without any hiccups as far as I can tell. I only had problems with Windows bugging out on me after that.

And thankfully after reading the review on Anandtech I'm glad I chose the Nirvana 120 for my cpu cooler.
 


More cache will always be faster than less cache (assuming it doesn't increase latency), but the rate of improvement drops as the cache size grows... increasing cache size from 4MB to 8MB probably won't be as significant as increasing from 128KB to 256KB.

Current access times for off-CPU memory are horrible compared to cache access, and if Intel can afford the silicon real estate required to put more cache on there, why not use it? There's no reason why "512KB-1MB per core" should magically give the best performance.
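The diminishing-returns point above can be illustrated with a toy model; the square-root miss-rate rule of thumb and all the numbers here are assumptions for illustration, not measurements:

```python
import math

# Toy model: cache miss rate often falls roughly with the square root
# of cache size (a common rule of thumb, not measured data).
def miss_rate(cache_kb, base_rate=0.10, base_kb=128):
    return base_rate * math.sqrt(base_kb / cache_kb)

# Doubling a small cache removes far more misses than doubling a big one.
small_gain = miss_rate(128) - miss_rate(256)    # 128KB -> 256KB
large_gain = miss_rate(4096) - miss_rate(8192)  # 4MB -> 8MB
print(round(small_gain, 4), round(large_gain, 4))
```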

And your 'bad example' is a very bad example.
 
I have found an interesting article which has a comparison between a retail Phenom with the NB at 1.8GHz and an ES Phenom at 2GHz. The results are really odd, though. The article is actually a review of a Phenom BE, but unlike THG they took the NB into account.

From article:

Since the processor was retail specification, we decided to test it to provide some accurate results in our CPU benchmarks, so you can see exactly what effect the extra 200MHz northbridge frequency has on the results, and how a retail Phenom 9600 CPU will actually perform.

http://www.bit-tech.net/hardware/2008/01/17/amd_phenom_9600_black_edition/2
 


Cheers mathos

As a general rule, with the exception of RAM upgrades, I always format and reinstall Windows. Sounds like it's being a right pain. Read the review I posted on here; they were having most of the same issues.