AMD CPU speculation... and expert conjecture

juanrga · Oct 30, 2013

8350rocks :

My understanding is enough to detect your nonsense.

8350rocks :

This is the second time that I read your nonsense. The question is: do you read? I don't know what is more funny, that you write "GPU" or that you don't understand that the OS runs on the jaguar cores in the APU and that the secondary chip in the PS4 runs background tasks to the game, such as decoding, downloading, social stuff...

8350rocks :

Hum, let us see you claimed the PS4 runs at 2.75GHz, but that was plain nonsense. Sony did claim 1.6Ghz; therefore, you are off by more than 1GHz or about 72% of error.

8350rocks :

No. You said us it was SOI. Someone wrote:

I believe that the 28nm process used for SR and Kaveri will be bulk but I'm not sure.

And your answer was:

One thing is that you are wrong, but another is that you pretend now to hide what you wrote during months.

8350rocks :

No. You said us that Steamroller FX is AM3+ socket. Someone asked "8350 or wait for Steamroller?" and your answer was:

I wonder how many newbies followed your advice, believing that you know what you are writing.

8350rocks :

There is no AM4 sockect mobo on Foxconn. I already explained there is a mistake in their indexing of an ancient mobo, but please ignore facts and continue fantasizing. It is much more funny.

8350rocks :

At least now you make clear that you are guessing.... lol

256-bit FMAC units don't make any sense for me, but of course AMD can prove me wrong.

juanrga · Oct 30, 2013

-Fran- :

I don't think so. Both consoles have been designed to offload the intensive computations to the GPU. Games developed for them will run the physics, AI, and other tasks on the GPU. Porting to the PC you want a APU with a strong GPU at compute, no one with a strong unused FPU in the CPU.

Look at the PS4: the GPU has 4x more compute queues than a ordinary 7970/7990 and can perform both rendering and compute without a big performance penalty. I would hope something similar for Kaveri.

gamerk316 · Oct 30, 2013

con635 :

Draw call overhead is a major problem with both OGL and DX. However, that's the price of abstracting everything out to easily support multiple hardware vendors. DX11 helped somewhat, but the overhead still exists. That being said, lets see how the console GPU's look 5 years down the road before proclaiming the PC/DX/OGL dead.

And I was amused by the instance where he talked about threading too much and starving out the main NVIDIA driver thread, a cautionary tale about over-threading if there ever was one.

gamerk316 · Oct 30, 2013

Interesting read here:

http://spectrum.ieee.org/semiconductors/memory/software-controls-cache-memory-to-speed-cpus

Letting the OS manage the CPU cache doubles performance and reduces energy consumption by 72%.

gamerk316 · Oct 30, 2013

I'll just post this:

http://www.techspot.com/review/733-batman-arkham-origins-benchmarks/page5.html

de5_Roy · Oct 30, 2013

^^ "the game is paid by nvidia, ofc it will not favor amd cpus." said a c.a.l.f.

designasaurus · Oct 30, 2013

gamerk316 :

So if you run a Titan at 1080p, probably any 2+ core cpu from the last 6 years will run Batman at ~60fps average at those settings. Great? Or is the point that any Piledriver cpu with 2M+ will max out a 120Hz monitor? My point is that it'd be nice if you could elaborate how this relates to Steamroller (or any of the many off-topic topics in this thread) rather than just posting a weird benchmark.

blackkstar · Oct 30, 2013

juanrga :

That was me who (kind of) confirmed it with a different username in a different forum.

I took it apart with IDA disassembler. I didn't look into it too much but it didn't ask for libguide40.dll, which is an Intel performance library for handling OpenMP.

Basically, the important parts of CB 11.5 that dictated how well threaded things were going to be were not only compiled with Intel Compiler, but also was a library written for Intel CPUs. CB 11.5 is far more biased against Intel than people realize.

You can also look at some numbers and notice how the gap between FX and IB gets smaller as you move to r15 from r11.5.

http://cdn.overclock.net/0/0e/0ea1dc89_iy377a.jpeg

Also, rumors of SR not clocking well at all and being on BULK instead of SOI.

PD 40% higher clockspeed, SR 30% higher IPC.

If this is how it is going to be I understand why there are no HEDT SR parts, AMD has no where to make them that will work. I simply don't understand though why they would do this and not stay on GloFo. I would imagine 32nm would be better if they could get 40% higher clocks out of it. I don't see how it will be a 100w chip just like A10 6800k while having much lower clocks and still be 100w. The GPU?

Regardless I'm somewhat prepared for something bad to happen from AMD in regards to SR. It will be a kind of "SR is a really great architecture but it's on Bulk so it doesn't clock well."

Perhaps it will at least serve as a wake up call to all the "MUH IPC!!!" Intel guys who think IPC is the only important measurement (besides power consumption) of HEDT CPUs.

I just really, really hope I'm wrong about this and that there's something wrong with the information we're getting.

8350rocks · Oct 30, 2013

juanrga :

1.

•Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

"The reason so many sources of compute work are needed is that it isn’t just game systems that will be using compute -- middleware will have a need for compute as well. And the middleware requests for work on the GPU will need to be properly blended with game requests, and then finally properly prioritized relative to the graphics on a moment-by-moment basis."

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

Maybe now you understand what GPU means...because the GPU is not an additional ARM core...

2.

RISC runs differently, and yes x86 CPUs have had underlying RISC architecture characteristics for a long time. Since the Pentium days really; however, as was pointed out by Palladin earlier, x86 allows MUCH larger instructions to be converted into bytecode to be run on the CPU itself.

The added complexity of x86 was my point. That is easily it's greatest strength while also being an inherent weakness.

ARM's strength and weakness both lie in it's simplicity. That's why it excels for low power devices and simple things like Micro Servers. It's also the primary reason it's not a terribly valid Desktop uarch option.

In order to make ARM a serious x86 competitor, you would have to add several layers of complexity. That complexity would drive up transistor counts and die complexity, requiring quad channel memory controllers, and many other things that draw more power. Once you've done that, you have an x86 competitor...that no longer consumes power like a mobile/low power solution. Because the added complexity draws more power, your consumption numbers spike upward dramatically.

x86 ISA has been dealing with this in it's architecture since x586 (K6-2 days). ARM has not had the benefit of time spent tweaking their architecture for such added complexities, and it takes a *LONG* time to get that stuff right. Which is part of what we see in the current AMD and Intel uarch's, Intel has a far more *REFINED* uarch, because it hasn't changed dramatically since P4 days. Where as AMD's uarch is only 2-3 years old at this point, and not nearly as well refined and tuned.

So, what you're talking about with ARM taking over DT, or even a large share of notebooks, etc., will not come about for quite some time.

The reasons for this are simple:

1.) Most of the consumer DT world runs on Windows, like it or not, M$ still has some clout. They don't want to redesign their OS entirely for ARM, and WART is a terrible execution.

2.) With no major OS player taking ARM seriously any time soon, hardware advances will come slowly because it's not in high demand. Only way ARM becomes a big player is if some large player in the PC world backs it and pushes hard. AMD making Micro Servers using ARM is not that push into the consumer sector you expect. It's a gimmick to say "see, we can do low power better than Intel", nothing more.

3.) Without a Major OS player backing ARM, the development of consumer software will be slow. Open source groups may do something, but how has that worked out for Linux so far? Outside of Android, it's still not terribly popular as an OS in the PC world considering roughly 3% of the world is likely running it on a DT PC. I think Linux should see more use than it does, but in the consumer space, M$ is still king.

4.) As ARM adds complexity, it will add power consumption, and the more you need to be able to ask the ISA to do, the more convoluted the hardware, middleware, software becomes to do those things. So, as the hardware adds complexity, the power draw increases. Once you get ARM running at a 50W+ TDP, x86 becomes a clear winner. Which is what would happen with a billion transistor ARM chip running at 4 GHz.

So, you may not like what I am saying, and you may disagree entirely. However, your declaration that ARM will rule desktop any time soon is a mere pipe dream...much like Acorn was back when it first started. That's why it's a non-profit organization that designs cores and licenses them, not an organization making chips and selling PCs.

That was my CISC vs. RISC rant...If you don't understand what I am talking about then that is your own fault for not knowing the difference between the 2.

3. PS4 runs @ ~2.0 GHz, which is 25% higher than the 1.6 GHz initially claimed. I also stated that there were ES's of the custom chip run @ 2.6 GHz, and there were.

(6) PS4 APU ES's have been benchmarked in the 2.4-2.6 GHz range already. I don't anticipate they will necessarily end up at 2.6 GHz, though I think 2.2-2.4 GHz with a turbo core for less threaded situations is entirely feasible.

http://www.tomshardware.com/forum/id-1721986/amd-cpu-performance-increase-nextgen-consoles-release.html

4. Look at how old that thread is...seriously? We had less information then, than we do now...and we don't know much now. That was purely speculation based on expectations of current trends continuing from FX. You had some posts from 6 months ago saying things that we now know to be different from reality as well. I am not going to waste time digging them up...but they are there for sure.

5. Are you dense? Read what I wrote...it was determined that it was not, in fact, AM4 MB before you ever wrote me back. Blackkstar and I came to that conclusion before you ever replied to me.

6. Now that I have corrected all the BS that you keep saying I said, what have you got left to accuse me of saying? Nothing.

tracker45 · Oct 30, 2013

""Relatively speaking if AMD FX Bulldozer is 100% (Piledriver 103% )than the Kaveri APU has an integer score of approx 135%.""

Is this true ?

8350rocks · Oct 30, 2013

juanrga :

-Fran- :

I don't think so. Both consoles have been designed to offload the intensive computations to the GPU. Games developed for them will run the physics, AI, and other tasks on the GPU. Porting to the PC you want a APU with a strong GPU at compute, no one with a strong unused FPU in the CPU.

Look at the PS4: the GPU has 4x more compute queues than a ordinary 7970/7990 and can perform both rendering and compute without a big performance penalty. I would hope something similar for Kaveri.

I just talked about this as a valid point, and you told me exactly what you're saying is wrong.

Try reading my posts...instead of proverbially putting your foot in your mouth and blindly replying next time.

de5_Roy · Oct 30, 2013

moar market shaer
http://www.techpowerup.com/193532/amd-to-reach-40-percent-of-global-gpu-market-share-in-6-months.html

LOL data within the cloud is unencrypted
http://news.cnet.com/8301-13578_3-57610061-38/nsa-taps-into-google-yahoo-data-clouds-can-collect-data-at-will-says-post/
..
....
whoops...

jdwii · Oct 30, 2013

youcanDUit :

de5_Roy :

NSA is going to far, they really need a reality check

noob2222 · Oct 30, 2013

blackkstar :

juanrga :

That was me who (kind of) confirmed it with a different username in a different forum.

I took it apart with IDA disassembler. I didn't look into it too much but it didn't ask for libguide40.dll, which is an Intel performance library for handling OpenMP.

Basically, the important parts of CB 11.5 that dictated how well threaded things were going to be were not only compiled with Intel Compiler, but also was a library written for Intel CPUs. CB 11.5 is far more biased against Intel than people realize.

You can also look at some numbers and notice how the gap between FX and IB gets smaller as you move to r15 from r11.5.

http://cdn.overclock.net/0/0e/0ea1dc89_iy377a.jpeg

Also, rumors of SR not clocking well at all and being on BULK instead of SOI.

PD 40% higher clockspeed, SR 30% higher IPC.

If this is how it is going to be I understand why there are no HEDT SR parts, AMD has no where to make them that will work. I simply don't understand though why they would do this and not stay on GloFo. I would imagine 32nm would be better if they could get 40% higher clocks out of it. I don't see how it will be a 100w chip just like A10 6800k while having much lower clocks and still be 100w. The GPU?

Regardless I'm somewhat prepared for something bad to happen from AMD in regards to SR. It will be a kind of "SR is a really great architecture but it's on Bulk so it doesn't clock well."

Perhaps it will at least serve as a wake up call to all the "MUH IPC!!!" Intel guys who think IPC is the only important measurement (besides power consumption) of HEDT CPUs.

I just really, really hope I'm wrong about this and that there's something wrong with the information we're getting.

CB 15 made the gap on IB vanish all together since the 8350 is faster than the I7 3770k.

As for that kaveri clock speed ... fk that. I would hope thats the laptop silicon and not the DT part. If it is DT, no way is kaveri going to catch any I5 cpu, would be lucky to get past the I3 530. The modular approach was based on high clock speed, if you remove that, you have nothing.

This is why bulk would be a stupid move.

Lower TDP and temperatures was one of the advantages of SOI.

juanrga · Oct 30, 2013

de5_Roy :

youcanDUit :

wouldn't be speculation if we knew for sure, would it?

the first page of this thread has a few links and more info scattered throughout the thread if you have time and effort(Lots of it) to look for them.

what we know for sure:
steamroller microarchitecture will be available in apu form first. kaveri apu (new a10 and a8 with new model numbers) will be the first amd product to feature SR cores. revealed kaveri apus have 2 modules capable of executing 4 threads with new amd integrated gpu based on gcn uarch. it will also have amd's hsa enhancements that are supposed to be helpful in certain workloads. if you're seeing unfamiliar acronyms, look them up in amd website or google.
desktop kaveri will fit on socket fm2+, a socket specific for kaveri due to it's redesign. socket fm2+ supports older trinity and richland apus. however, older socket fm2(notice the lack of '+') motherboards won't be able to run kaveri apus. amd has introduced a new motherboard chipset for kaveri - a88x.
kaveri will feature pcie gen 3.0 support with it's integrated pcie controller (on die).
amd's foundry partner global foundries will make these apus on their 28nm high performance node (forgot exact designation).
amd has revealed opterons based on berlin apu which is eerily similar to kaveri specs, except with server-specific features.

what we don't know:
cpu, igpu clockrates, type of substrate, real world performance, hsa's effectiveness, price, skus, global foundries' yields etc.

Excellent resume. A pair of remarks. Kaveri has been announced also in 1M (2-cores) configuration. Berlin has been announced as both APU and CPU versions. It is highly likely we will see also a CPU version of Kaveri a la Athlon.

juanrga · Oct 30, 2013

gamerk316 :

You found another technologically outdated game that ignores extra cores: FX 8-core, 6 core, and 4-core perform very similarly.

And not to forget this is a Nvidia sponsored game.

-Fran- · Oct 30, 2013

juanrga :

Looks like they just re-compiled the engine to support AVX now, haha.

Take a look at the FX 4100 and the Phenom II 980 and the big jump the FX4320 is over both of them.

Cheers!

8350rocks · Oct 30, 2013

gamerk316 :

Looks just like any other game that runs on Unreal 3 Engine...like Bioshock Infinite for example....

I am sure there was some sort of point you were trying to make by posting a benchmark to an NVidia sponsored new game, made on an 11 year old engine. I am not sure what that point was...but...I am certain you were trying to prove something.

Perhaps that popular engines from the days when K8 architecture was around, and quad cores didn't exist, still run on 2 cores mostly...like they always have, and always will...?

-Fran- · Oct 30, 2013

Ah, forgot to put this.

http://www.phoronix.com/scan.php?page=news_item&px=MTQ5OTA

There's the influence of AMD in LibreOffice, finally.

Cheers!

gamerk316 · Oct 30, 2013

designasaurus :

Shows a point I've been making for about 4 years now: More cores != more performance. Also shows relative scaling between various chips, and the continuing trend of Intel outperforming AMD. FX-8350 trading blows...with an i3 3220. Heck, even games like BF4 have higher tier i3s being competitive with AMD's mid-range chips.

I'll say it again: If AMD doesn't get its IPC problem fixed, they are going to go away.

noob2222 · Oct 30, 2013

tracker45 :

This is all more calculations based on the cosmology "leak". The thing is we don't even know what cpu it actually is, what clock speed it actually is, or anything else about it.

http://cosmologyathome.org/show_host_detail.php?hostid=187215

http://citavia.blog.de/2013/07/02/amd-kaveri-engineering-sample-sighted-in-the-wild-16196102/

based on guessing at what this engineering sample might be.

The thing is, we don't even know how this blog even came up with their 1.8 ghz BD & PD figures since the cpus don't clock that low. They did nothing but post a pretty little chart with some numbers in it that may or may not be accurate. (kinda like juan's website)

juanrga · Oct 30, 2013

blackkstar :

juanrga :

That was me who (kind of) confirmed it with a different username in a different forum.

I took it apart with IDA disassembler. I didn't look into it too much but it didn't ask for libguide40.dll, which is an Intel performance library for handling OpenMP.

Basically, the important parts of CB 11.5 that dictated how well threaded things were going to be were not only compiled with Intel Compiler, but also was a library written for Intel CPUs. CB 11.5 is far more biased against Intel than people realize.

You can also look at some numbers and notice how the gap between FX and IB gets smaller as you move to r15 from r11.5.

http://cdn.overclock.net/0/0e/0ea1dc89_iy377a.jpeg

Also, rumors of SR not clocking well at all and being on BULK instead of SOI.

PD 40% higher clockspeed, SR 30% higher IPC.

If this is how it is going to be I understand why there are no HEDT SR parts, AMD has no where to make them that will work. I simply don't understand though why they would do this and not stay on GloFo. I would imagine 32nm would be better if they could get 40% higher clocks out of it. I don't see how it will be a 100w chip just like A10 6800k while having much lower clocks and still be 100w. The GPU?

Regardless I'm somewhat prepared for something bad to happen from AMD in regards to SR. It will be a kind of "SR is a really great architecture but it's on Bulk so it doesn't clock well."

Perhaps it will at least serve as a wake up call to all the "MUH IPC!!!" Intel guys who think IPC is the only important measurement (besides power consumption) of HEDT CPUs.

I just really, really hope I'm wrong about this and that there's something wrong with the information we're getting.

Interesting info about CB.

Any bet that now that CB is not so biased against AMD, several famous sites will stop using it in reviews and comparatives?

I also did read some of those SR rumors, I didn't reproduced them here, because I find them difficult to accept. I already expected Kaveri CPU clocks to be slower than Richland CPU (see my Kaveri article) and still corrected the benchmarks by an extra 5% penalty for safety (it could manage lower clocks and turbo). However, the leaked frequencies I saw don't make many sense, because don't match with the 1050 GFLOP given by AMD labs. Other people is also skeptic about the frequencies leaked. In fact the source said that the frequencies are not decided!!!

8350rocks · Oct 30, 2013

gamerk316 :

The problem children you point at mostly are games running on long in the tooth engines that are nearing the end of their life cycles, but hung on because there are lots of less powerful PCs in the world still.

Plus, it's quite a bit cheaper to develop on UDK3 versus UDK4.

Arkham series started on UDK3, and stayed with it to reuse the source code as much as possible, and maintain continuity. They've basically done as much as they can on UDK3 at this point. I expect that UDK3 will not produce any more AAA titles as UDK4 is becoming more proliferated now, and it runs on multiple cores well.

UDK3 was great when we were talking about Athlon 2 6400+ processors...it's outdated now though. That's why modern GPUs can run it at 120+ FPS with a modern processor...which is entirely overkill considering better than 80% of the monitors out there won't even support better than 60 FPS @ 1080p.

You're pointing at archaic software and saying, "see IPC is most important". Reality is, software still lags behind hardware, and the world turns another revolution. Efficiency of instruction processing is moderately important, but handling multiple threads at once will become more important as the paradigm shifts with this generation of consoles.

juanrga · Oct 30, 2013

8350rocks :

Unsurprisingly that is unrelated to what you wrote before about GPU and unrelated to your nonsense that the OS is run in a secondary chip and 8 jaguar cores are for games exclusively.

8350rocks :

That only covers a fraction of the nonsense that you have said about RISC and about ARM. Both that fraction and what you don't quote was corrected before.

8350rocks :

Sony said 1.6Ghz for the PS4. The 2.0Ghz is only a rumor. Moreover both are far from your previous pretension that it runs a 2.75Ghz. The funny part is that hapidupi was right and you wrong. LOL

8350rocks :

LOL. A pair of posts ago you affirmed that you never said such thing. Now the excuse change to ups I said but that is old

8350rocks :

Seeing as you pretended above to hide what you wrote, do you really pretend I to trust what you claim you were thinking one day? LOL

noob2222 · Oct 30, 2013

@8350 ^^ copy and paste programming is the fastest and cheapest way to release new titles.

Some people can't get past "its new so the programming is new"

AMD CPU speculation... and expert conjecture

Distinguished

Distinguished

Glorious

Glorious

Glorious

Splendid

Honorable

Honorable

Distinguished

Distinguished

Distinguished

Splendid

Splendid

Distinguished

Distinguished

Distinguished

Glorious

Distinguished

Glorious

Glorious

Distinguished

Distinguished

Distinguished

Distinguished

Distinguished

Share this page