hafijur :
Low end compared to what a 22nm Ivy Bridge CPU is capable of. 45W is basically what the i7 3770K consumes in electricity, less than probably an E8400 Core 2 Duo system. Anyway, it's got high-end performance, but Intel could easily release CPUs 50% faster with 6 cores as the mainstream parts if they wanted to.
To put this into perspective, an AMD A6 APU under a CPU test will draw similar power to an i7 3770K, and we all know which CPU will win.
Intel really is probably near the top of what they can do with desktop performance with the i7-3770K and i7-4770K. The top Turbo speeds of those i7s are 3.9 GHz, and overclockers are topping out only somewhere around 500 MHz faster. That's a fuzz over 10%, roughly how much you could overclock a top-end chip before the early 2000s, when chips' top speeds were determined by the clock-speed potential of the process/arch rather than by thermal reasons. I think that is exactly what is happening today with Intel.
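To put numbers on that headroom (the 3.9 GHz stock Turbo is real; the roughly +500 MHz ceiling is the typical overclocking result I'm assuming):

```python
# Back-of-the-envelope overclocking headroom for a top-end i7.
# 3.9 GHz is the actual max Turbo of the 3770K/4770K; the +500 MHz
# ceiling is an assumed typical overclocking result, not a spec.
stock_turbo_ghz = 3.9
oc_ceiling_ghz = stock_turbo_ghz + 0.5

headroom = (oc_ceiling_ghz - stock_turbo_ghz) / stock_turbo_ghz
print(f"Headroom: {headroom:.1%}")  # ~12.8%, a fuzz over 10%
```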
Power consumption and clock speeds are determined at the architecture, process, and platform level today, rather than, as formerly, pretty much just the architecture level. Current manufacturing processes have gotten small enough that gate leakage can be a problem and really drives up power consumption. This problem just gets worse with each gate shrink, and Intel is one shrink ahead of AMD. However, leaky gates have a low capacitance and clock to high heaven (witness the Phenom II X4 TWKR). If you want lower power consumption, you give up clock speed. Intel obviously optimized the 22 nm process for low power consumption, as they believe mobile, especially low-TDP mobile like tablets and "ultrabooks," is the future. As a result, maximum clock speeds are quite a bit lower than with the older, less-power-optimized 32 nm process.
Intel also has chosen to go for minimum power consumption with the chip and platform architecture as well. All of us old-time overclockers around before bus locks (hell, before multipliers for that matter) know that the "non-CPU" parts of the system generally don't like to overclock nearly as much as the CPU. Intel has rolled the entire northbridge, most of the southbridge, and even the VRMs onto Haswell's die in an effort to save power. There are bus locks to keep the "non-core" parts at normal speed (which is why you can only overclock with a K or X series unit). But these parts simply being on the chip, and designed to run in a very narrow and much lower frequency range, has an effect on the complexity and floorplan of the chip. AMD, on the other hand, uses the same platform architecture in the FXes that they used back in 2003 with the very first Athlon 64 FX. The only thing on-die besides the caches is the IMC: no VRMs, no GPU, no PCIe controller, no southbridge. The layout of the chip can be much more conducive to maximum performance.
So in short, Intel could very well make a 22 nm chip that burned 220 watts and it could be much faster than what they are currently offering. They just can't do it with anything resembling their current 22 nm process or platforms.
I can see in 2015 AMD releasing 220W TDP chips to compete with Intel's 35W TDP chips. That's the way the two companies are going: AMD cranks the TDP up, while Intel asks "can we go even lower?" and starts focusing on the ARM CPU power envelope.
I don't really care about clock speed, as it could be like the 1.3 GHz Pentium M that destroyed pretty much any P4 out at the time. The funny thing is the Intel Pentium M was the best CPU on the market then, then the Athlon CPUs, then the crap P4 CPUs. The Pentium M CPUs had something like 6x better performance per watt than the P4 CPUs.
The PIII Tualatin and the Pentium Ms were good chips, but they weren't faster than the competing desktop chips. First of all, the Athlon XP was roughly as fast per clock as the Tualatin, considerably better in FPU-heavy tasks (any K7/K8's FPU was a ton better than any P6/P6+ chip's), and also clocked a lot higher. Ditto with the A64 vs. any Pentium M: roughly as fast per clock, except a lot faster in FPU tasks, and clocked a lot higher. And that's only in 32-bit mode; the A64 was considerably faster per clock running 64-bit code than the 32-bit-only P-M running similar 32-bit code. The PIIIs and P-Ms were faster per clock than the P4, but the P4 from Northwood on was simply clocked so much higher that it ended up being considerably faster at the end of the day. The short-pipeline Tualatin and P-Ms didn't overclock that well either: you could get to around 1.8 GHz on a good 1.3-1.4 Tualatin and around 2.5 GHz on a P-M Dothan. That didn't hold a candle to a 4 GHz P4 or a close-to-3 GHz A64.
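The per-clock vs. clock-speed tradeoff is just effective performance ≈ per-clock throughput × frequency. A toy sketch (the IPC ratios here are my illustrative assumptions, not measurements):

```python
# Toy model: relative performance = relative per-clock throughput (IPC) * clock.
# The IPC figures are illustrative assumptions, not benchmark results.
chips = {
    "Pentium M Dothan (OC)": {"ipc": 1.00, "ghz": 2.5},
    "Athlon 64 (OC)":        {"ipc": 1.00, "ghz": 2.9},
    "P4 Northwood (OC)":     {"ipc": 0.70, "ghz": 4.0},
}
for name, c in chips.items():
    print(f"{name}: relative perf ~{c['ipc'] * c['ghz']:.2f}")
# Even at ~30% lower IPC, a 4 GHz P4 edges out a 2.5 GHz P-M.
```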
Anyway, AMD Piledriver currently is like an Intel P4 Netburst CPU: very inefficient, needing a lot more power to compete with the competitor's CPU. I am pretty sure the Athlon 64 took something like 70W less than a single-core P4 and basically outperformed it. Intel CPUs currently outperform AMD chips while taking 90-100W less at peak load. That's what innovation and amazing CPUs are: pushing the boundaries of Moore's law.
That's why I hope Steamroller gets at least 60% better performance per watt than the current Piledriver CPUs. I find it hard to believe how they could actually go backwards on a new 32nm process while Intel made a gigantic leap on 32nm with Sandy Bridge.
Piledriver is not like a Netburst CPU. Bulldozer/Piledriver is actually much closer in theory to something like the UltraSPARC T series, as its aim was to maximize core count by sharing some formerly unshared core resources. Netburst's goal was to try to increase performance by massively goosing clock speed, and it did it with a pipeline that was very long for the time period (and in Prescott, very long, period). The pipeline of BD/PD isn't that much longer than Haswell's. Clock speeds, with the exception of the limited-quantity FX-9xxx series, aren't that much higher than those of the Phenom IIs. Remember, those got well into the high 3 GHz range, and BD/PD are on a new, smaller process that should allow for better clocks. Bulldozer's problem was that it erred on the side of prioritizing multithreaded over single-threaded performance and shared a little too much between cores. At least AMD realized this, and Steamroller is "un-sharing" some of the resources to tip the balance back the other way some. If they were like Intel with Netburst, they'd double down and hang four int cores off of one decoder and FPU, like Intel stretching Northwood's pipeline to a ridiculous 31 stages in Prescott.
If you are looking at maximum performance per watt, AMD's chip for that is not BD/PD/Steamroller but Jaguar. You don't have to design your top-line chip for minimum power consumption just to get a decently-performing low-power chip when you actually have a decent low-power chip design to begin with.
This is just hilarious. Intel's main aim is innovation, like low-power CPUs for ultrabooks. Heck, they even put a lot of money into this. AMD is just releasing junk on 32nm that's barely any better than the 45nm Phenom CPUs, and worse than the old Phenom II series on a lot of tasks, like GFLOPs on Intel Burn Test I believe.
The "ultrabook" is a joke. It's essentially a non-Apple version of the MacBook Air, saddled with the same limitations: very high price, mediocre performance, and few expansion ports. The MacBook Air sells slowly, and that should have given Intel a clue as to the "ultrabook's" sales prospects. But I suppose the appeal of something that could push ASPs way up and make $300+ CPUs "make sense" was too much for Intel to resist. Meanwhile, people are buying boatloads of cheap made-in-China convertible tablets running non-x86 cell-phone CPUs and costing 20% of what an "ultrabook" costs.
AMD's 32 nm is a better process than their 45 nm one. You seem to be very concerned with power consumption for mobile CPUs. The latest 32 nm APUs do a whole lot better for power consumption and performance than the 45 nm Champlain Phenom IIs ever did. Their peak GFLOPs are actually quite a bit higher as they have a built-in GPU which has a pretty high FP throughput, particularly in a purely theoretical benchmark like a GFLOPs benchmark. BTW, Intel Burn Test isn't a benchmark, it's a system validation tool to make sure the system has sufficient cooling to not fail when severely stressed.
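On the GFLOPs point: theoretical peak is just execution units × FLOPs per unit per cycle × clock, which is why a built-in GPU inflates the number so easily. The figures below are illustrative assumptions loosely resembling a Trinity-class APU, not spec-sheet values:

```python
# Theoretical peak GFLOPs = units * FLOPs-per-unit-per-cycle * clock (GHz).
# All unit counts, FLOPs/cycle, and clocks here are illustrative assumptions.
def peak_gflops(units, flops_per_cycle, ghz):
    return units * flops_per_cycle * ghz

cpu_side  = peak_gflops(units=4,   flops_per_cycle=8, ghz=3.8)  # 4 cores, SP SIMD
igpu_side = peak_gflops(units=384, flops_per_cycle=2, ghz=0.8)  # 384 shaders, FMA
print(f"CPU cores: ~{cpu_side:.0f} GFLOPs, iGPU: ~{igpu_side:.0f} GFLOPs")
```

Purely theoretical throughput, exactly the kind of number a synthetic GFLOPs benchmark reports; sustained real-world throughput is far lower on both sides.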
AMD are a good 2-3 years behind. If it weren't for Intel, we would not have efficient computers, and Haswell is the next big step. AMD's Bulldozer series CPUs were so crap on power that they had to delay them. It still is crap.
We would have efficient computers even if Intel weren't around, because the big driver of very low-power computing isn't Intel; it's TI, Samsung, Qualcomm, et al. with their ARM-based chips in tablets and such. Intel is a follower, not a leader, as evidenced by their laughably bad attempt at a low-power chip in the Atom and the creation of not one but two new categories of computers to try to shoehorn their chips into. The netbook is already dead, and the "ultrabook" soon will be.
AMD had to delay Bulldozer as their 32 nm fab process was not ready, not because of power or the chip being "crap." The best example of a chip being delayed because it stank was the first Itanium. Boy did that ever suck and no, it did not get any better in the three years it sat around during the "we'll release it when we get our compiler working well, which will be Real Soon Now" phase.
Think of using AMD or Intel for a company's servers. Let's say the AMD systems for the same total performance take 10,000W, while the Intel systems for the same performance take 4,000-5,000W. That's a big difference, isn't it?
First of all, the percentage difference in the amount of total power an Opteron vs. a Xeon draws is fairly small. The E7 Xeons actually are much worse than the Opterons in total power draw, because the E7s are old Westmeres using FB-DIMM MTH-type devices on their motherboards, which just chew through the power. Most servers also have a lot of RAM and disks that draw a considerable amount of power, and those would be similar between the two. The cost difference between Xeon and Opteron servers is quite large, especially for 4P units. If you wanted to make a cost-of-ownership argument for Xeons, you won't make it on power or performance. Licensing, with ridiculously expensive per-core fees like Oracle's, would be a better case to make: you want as few cores as possible to avoid getting reamed, and Xeons can deliver identical performance with fewer cores. But then all that really says is that some vendors have stupid licensing schemes and massively overcharge.
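The per-core licensing math, with made-up dollar figures (per-core fees like Oracle's are real; the specific numbers here are purely hypothetical):

```python
# Hypothetical per-core licensing cost for two setups with equal total throughput.
# The fee and core counts are made-up for illustration only.
per_core_fee = 10_000                 # hypothetical license cost per core
xeon_cores, opteron_cores = 8, 16     # assumed: same performance, fewer Xeon cores

print(f"Xeon licenses:    ${xeon_cores * per_core_fee:,}")
print(f"Opteron licenses: ${opteron_cores * per_core_fee:,}")
```

Same performance, half the licensing bill, which says more about the licensing scheme than about the silicon.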
8350rocks :
AMD may very well be ahead after Steamroller...it's that dramatic an improvement.
I'll believe it when I see it. Steamroller should improve single-thread performance somewhat but Bulldozer/Piledriver isn't bottlenecked when it's only running one thread in total. It's when it runs two threads on one module where it gets bottlenecked and this is where Steamroller will help a lot. But you won't see that unless you are using decently multithreaded tasks anyway (which BD/PD already do well at) or are using a completely brain-dead OS thread scheduler like Windows'.