AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Your funny question was "how these apus would be able to store the game data need for processing by igpu ? would amd use the crystals of some broken crystal ball?" and I am asking you to post here the DRAM capacity of those APUs. How much game data can they "store"? In GB, please.



I already gave you the three reasons why Sony and Microsoft selected APUs. Moreover, a traditional dCPU+dGPU architecture doesn't require "swappable GPUs". The PS3 used a separate dGPU, but it was not "swappable".

On the other hand, are you aware that desktop APUs are also "swappable"?
 
Ironically, ALL the developers working on both PS4 and XB1 will tell you the decision to use ESRAM on the XB1 is its Achilles' heel and is actually a hindrance rather than a help. The ability to address all memory as one coherent block on PS4 is far more useful, and the split memory design will only hinder XB1 performance moving forward (as we are currently seeing with lower FPS on cross-platform games versus what PS4 can provide).

Ding. ESRAM is basically acting as an L4 cache, trying to hide the relatively slow memory access times. This forces devs to pay attention to what is in the ESRAM at all times for performance reasons. GDDR5 is fast enough that this isn't a major concern, though optimized ESRAM use is faster overall, just a pain to manage.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790




Both of you miss that the part that I quoted starts with "What is recognized for sure", whereas the part that you quote starts with "If the company pursues this opportunity and goes for".

My quote is about what AMD is doing for the future. Your quote is about unfounded (crazy) speculation by the author of the article. He is referring to a former document from AMD that mentions the interconnect capacities of a 16-core Bulldozer-like system. This isn't the first time that he has speculated about that: on another occasion he predicted as "likely" that the 16-core Steamroller Opteron CPU would be released in 2016, and he even predicted as "highly likely" that AMD would release a later 32-core version of it to compete against Xeon Broadwell. I see he has now moved his prediction to 2017. It is all in his imagination.

This is the same guy who also took an old leaked diagram in a forum and wrote an entire article about it, pretending the diagram was a 16-core Kaveri-like APU that AMD was preparing for the 28nm node, or the APU of the next Xbox, or some Opteron chip, or fake or...

I have an old doc from AMD that details a Bulldozer-like module but with a completely different FPU and memory subsystem than found in Bulldozer/Piledriver. I already knew that it could be neither Steamroller nor Excavator, because the front-end/decoder is shared. AMD will never release that hypothetical module.

However, if he had got his hands on that doc I am convinced that he would be writing an article about how AMD plans to release that module soon in a PC or an Opteron or a console or a tablet or...

AMD: We are transforming into an APU/SoC company
8350rocks (1st try): No. AMD will continue designing dCPUs forever.
8350rocks (2nd try): But what do we find? PLANS FOR A NEW HEDT dCPU!?! Holy cow...what's next chicken little? The sky is falling?
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Thanks for mentioning the obvious while ignoring the main point: AMD plans to use APUs from tablets up to supercomputers.

http://www.hpcwire.com/2014/02/13/amd-refreshes-vision-hpc-future/



buttkick.gif

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Yes, that was the point. Without the help of the 102GB/s ESRAM, the APU would be seriously bottlenecked by the slow 68.3GB/s provided by the 2133MHz DDR3 memory on board.
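
For reference, the 68.3GB/s figure follows from the usual peak-bandwidth formula: transfers per second times bus width in bytes. A minimal Python sketch, assuming a 256-bit DDR3 interface on the Xbox One (the bus width is not stated in this thread, and the function name is only illustrative):

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9

# DDR3-2133 on an assumed 256-bit bus (assumption, see above)
xb1_ddr3 = peak_bandwidth_gbs(2133, 256)
print(f"{xb1_ddr3:.1f} GB/s")
```

This is a theoretical peak only; sustained bandwidth in a real workload will be lower.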
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790
Interesting article at SA

http://semiaccurate.com/2014/05/08/intel-goes-vdi-crystall-well-xeons/

Intel will start adding graphics to a new family of Xeon CPUs. Intel is just following AMD's lead in using APUs for servers and workstations. HP has already shown interest in this kind of product.
 


But you still have to manage it well, which brings its own problems. You need to ensure all the data you need fast access to is in the ESRAM when needed, so you now have to pre-load certain objects ahead of time, manage the resources, and so on. You also have a LOT more size constraints, so that will push down texture sizes. So the GDDR5, despite being slower, is the MUCH better solution.
 
Also, found this interesting:

http://www.pcper.com/news/Processors/AMD-Shows-ARM-Based-Opteron-A1100-Server-Processor-And-Reference-Motherboard

The Opteron A1100 features eight ARM Cortex-A57 cores clocked at 2.0 GHz (or higher).

AMD has stated that the upcoming Opteron A1100 processor delivers between two and four times the performance of the existing Opteron X series (which uses four x86 Jaguar cores clocked at 1.9 GHz).

So...double the cores, double the performance? Sounds about right for servers. And assuming "best case benchmarking" is in effect here, I'd wager the new ARM core falls somewhere between 90% and 115% of AMD's x86 core, IPC wise.
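
As a rough cross-check of that kind of estimate, the claimed chip-level speedup can be divided by the core-count ratio and the clock ratio. A minimal Python sketch using only the figures quoted above (8 cores at 2.0 GHz versus 4 cores at 1.9 GHz, 2x to 4x claimed); it assumes performance scales perfectly with both core count and clock, which real server workloads rarely do, and the function name is only illustrative:

```python
def per_core_per_clock_ratio(speedup: float,
                             new_cores: int, old_cores: int,
                             new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Implied per-core, per-clock performance ratio under ideal scaling."""
    core_ratio = new_cores / old_cores
    clock_ratio = new_clock_ghz / old_clock_ghz
    return speedup / (core_ratio * clock_ratio)

# Low and high ends of the "two to four times" claim
low = per_core_per_clock_ratio(2.0, 8, 4, 2.0, 1.9)
high = per_core_per_clock_ratio(4.0, 8, 4, 2.0, 1.9)
print(f"implied per-clock ratio: {low:.2f}x to {high:.2f}x")
```

Under these idealized assumptions the implied range is wide, so the vendor claim alone does not pin down a single IPC figure.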
 

truegenius

Distinguished
BANNED

thats simple ( i mean that bold line in english is simple :lang: )
GTA4 needs 2GB for 1080p so maybe we need 20GB VRAM for GTA8 at 8k single monitor :whistle:


:chaudar:
thanks for the precious info :mmmfff:



ipc wise, even qualcomm's krait has performance similar to or ahead of amd x86, more or less
amd is doing very bad in ipc :pfff:
 
amd to launch new single gpu gfx card to compete with gtx 780ti
http://www.techpowerup.com/200721/amd-to-launch-new-single-gpu-card-this-summer-to-take-on-gtx-780-ti.html
will have stacked high bandwidth memory (hbm). matter of time before it trickles down to apus!!!.....hopefully.



are you pointing to the bandwidth numbers? you have never explained where they come from or how they were measured/calculated or how they work. comparing GB/s numbers is near-meaningless without other factors explained. for example (and example only, don't try pulling an argument out of this), sram can go into TB/s bandwidth, yet we don't see sram displacing dram.

it's the same as comparing fx 43xx cpus with core i5 and claiming fx43xx is faster because of 4 GHz clockrate and core i5 having 3-3.4 GHz and stating absolutely nothing else. or claiming fx9590 is faster than core i7 4960x because it has 5 GHz clockrate. i was expecting a more detailed explanation.
i didn't quite get what "GDDR5 allows data to be transmitted on both peaks of the signal" means, since ddr ram sends data on both edges of clock signal, not just gddr5.
 

8350rocks

Distinguished


This +1

The ESRAM's inherent management issues actually outweigh what little benefit it brought in terms of bandwidth. If bandwidth were the only issue in the hardware, XB1 would do 60 FPS @ 1080p. It does not, though. Ultimately, time will show that they should have done what Sony did and made all the RAM the same type and easily addressable.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


But I am not discussing which approach is simpler to program or which is more elegant. I never said anything about that. I am simply mentioning that, without the help of the ESRAM, the Xbox1 APU graphics performance would be cut by about half, back to roughly Kaveri level.

As a general rule of thumb you need about 1/15 byte/flop for gaming. Any bandwidth much below that and the graphics unit cannot work at full potential. That is the reason why, when some 'tech' sites speculated about a 768sp version of Kaveri, I said that was plain nonsense: dual-channel 2133MHz DDR3 couldn't feed that iGPU.
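
The argument can be checked with a short calculation. A minimal Python sketch, assuming two 64-bit DDR3 channels, 2 FLOPs per shader per clock, and a hypothetical 720 MHz GPU clock for the speculated 768sp part (the clock is an assumption, not a figure from this thread); the 1/15 byte/flop value is the rule of thumb quoted above, not a measured constant:

```python
BYTES_PER_FLOP = 1 / 15  # rule of thumb from the post, not a measured value

def dual_channel_ddr3_gbs(transfer_rate_mts: float) -> float:
    """Peak bandwidth of two 64-bit channels (16 bytes per transfer), in GB/s."""
    return transfer_rate_mts * 1e6 * 16 / 1e9

def required_bandwidth_gbs(shaders: int, clock_ghz: float) -> float:
    """Bandwidth the rule of thumb says is needed to feed the GPU, in GB/s."""
    gflops = shaders * 2 * clock_ghz  # assumes 2 FLOPs per shader per clock (FMA)
    return gflops * BYTES_PER_FLOP

available = dual_channel_ddr3_gbs(2133)
needed = required_bandwidth_gbs(768, 0.72)  # 0.72 GHz is a hypothetical clock
print(f"available {available:.1f} GB/s, needed {needed:.1f} GB/s")
```

With these assumed inputs the required bandwidth is roughly double what dual-channel DDR3-2133 can supply at peak.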

Broken crystal ball rulez! :sarcastic:

Moreover the GDDR5 approach is not slower, but 73% faster: the PS4 peaks at 176GB/s, Xbox1 ESRAM peaks at 102GB/s.
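
The 73% figure is simply the ratio of the two peak numbers quoted above. A quick check in Python, using only the figures from this post:

```python
ps4_gddr5_gbs = 176.0  # peak figure quoted in the post
xb1_esram_gbs = 102.0  # peak figure quoted in the post

advantage_pct = (ps4_gddr5_gbs / xb1_esram_gbs - 1) * 100
print(f"{advantage_pct:.0f}% higher peak bandwidth")
```

Both inputs are theoretical peaks, so the comparison says nothing about capacity or about sustained throughput in a real game.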
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


You found something interesting that is months old and that was discussed in this thread before. Here is the January link that I gave then (and that was ignored):

http://www.anandtech.com/show/7724/it-begins-amd-announces-its-first-arm-based-server-soc-64bit8core-opteron-a1100

The performance of the new Opteron is about 3x that of the one it replaces, which falls in the "between two and four times" range mentioned. You don't need to estimate the performance of each core, it is already in my original link: the new A57 ARM core falls at about 136% of AMD's x86 core, IPC wise :D
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


And unsurprisingly you avoid the same question for the third time. I was not asking you how many GB you believe the game will require. I was asking you how many GB those APUs have.

You claimed that those APUs couldn't be used to play your imaginary game, because (you believed) they couldn't "store the game data need for processing by igpu", but you cannot even answer how much data those APUs can store. :sarcastic:

In short, you were criticizing what you don't understand.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


XB1 only has 12 CU so 60fps was likely a stretch either way, compared to the PS4 18 CU.

I think others said it best when Sony wanted the best game machine and Microsoft wanted a media hub that can game too. Different priorities.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


Well a 3B company competing with a 130B company. No doubt AMD's progress has slowed.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


As I mentioned before AMD plans to offer HBM for K12 APUs.

It is funny how, only a couple of months ago, when I mentioned that AMD was preparing APUs with stacked RAM for 2018, someone here (not you) claimed that stacked RAM would not be ready before the year "2030".



First you asked "what" and I replied. Then you asked "why" and I replied again. Now you are asking me the "how". You change the question with each new answer.

Your argument about SRAM is irrelevant to the facts discussed here: GDDR5 has replaced the slow DDR3 in any high-end graphics card; Sony uses GDDR5 instead of slow DDR3 as system memory for the PS4; I already explained "why".



DDR3 only handles an input or output but not both on the same cycle. GDDR handles input and output on the same cycle.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780
Yeah, ARM caught up to AMD IPC. You also don't see 5ghz ARM chips, either. IPC is just half the story. The fact that ARM is competing in IPC with an x86 design that is aimed for low IPC, high frequency speaks volumes of how far behind ARM still is. If we could see ARM beating Intel IPC while clocking as high, I would agree that ARM is winning. But it's not.

AMD is talking about 4ghz for K12, and if we get a 4ghz ARM core, it might compete. But the thermal properties of ARM at that speed will be far different from the 1.5ghz chip in your phone. Meaning that it's going to need a lot more voltage and TDP increases with the square of input voltage. Meaning ARM's advantage can dry up very quickly.
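
To put rough numbers on that scaling: dynamic power is commonly approximated as proportional to C·V²·f, so it grows with the square of voltage and also linearly with frequency. A minimal Python sketch of the relative increase when going from a 1.5 GHz phone-class clock to 4 GHz; the two voltages are hypothetical values chosen only for illustration, and leakage power is ignored:

```python
def relative_dynamic_power(v_old: float, v_new: float,
                           f_old_ghz: float, f_new_ghz: float) -> float:
    """Ratio of dynamic power (P ~ C * V^2 * f) at the new vs old operating point.

    Switched capacitance C is assumed unchanged, so it cancels out.
    """
    return (v_new / v_old) ** 2 * (f_new_ghz / f_old_ghz)

# Hypothetical operating points: 0.9 V @ 1.5 GHz -> 1.3 V @ 4.0 GHz
ratio = relative_dynamic_power(0.9, 1.3, 1.5, 4.0)
print(f"dynamic power rises by a factor of about {ratio:.1f}")
```

The same relation applies to any CMOS design regardless of instruction set, so the sketch only illustrates how quickly power grows with clock and voltage.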

DDR3 is nearly 5 years old now. Of course it's not giving us enough bandwidth. Do you all not remember the transition from EDO -> SDR -> DDR1 -> DDR2 -> DDR3? At the end of life for each one of those technologies people were screaming about how they weren't providing enough bandwidth for something.

DDR4's problem is that it was designed to solve the problem of traditional CPUs not having enough bandwidth, but traditional CPUs do have enough bandwidth; we just need more for GPUs.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Wrong. Jaguar and especially Puma+ are very competitive against Intel cores, even though Intel is much bigger than AMD.

The real reason why Krait is better is that it is based on the ARM architecture. As mentioned by Keller, ARM was designed for efficiency, and one can extract more performance from it than from an x86 core with a similar power budget.

AMD replaced Jaguar servers with ARM servers because x86 cannot compete. AMD was very clear with its "ARM will win" slide. AMD is designing a custom ARM core for this reason. Their K12 core will compete with any core from Apple, Qualcomm, and others.
 

jdwii

Splendid


That is interesting, but AMD claims 2-4 times the performance compared to their current 4-core part. It's important to look at the 5% higher clock rate and then realize it has double the cores. It's also important to look at the fact that Jaguar only delivers 80% of the performance per clock of Piledriver, let alone Steamroller. After we figure that out, it's important to remember that a 4-core Piledriver CPU competes with a dual-core i3 in terms of performance, and in some cases the 6-core PD competes with an i3. AMD is really far behind in IPC, to the point where I'm switching over to Intel until AMD comes out with a newer design.
 

jdwii

Splendid


Yeah it does ok
http://www.anandtech.com/show/7314/intel-baytrail-preview-intel-atom-z3770-tested/2
However, it is being built on 22nm, which means that, as always (for quite some time anyway), Intel has the lead in fabrication processes, the true killer that AMD has to deal with.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Wait, are you sure that ARM caught up? Mr. rocks predicted that ARM could never catch x86 because x86 is Complex and ARM is Simplex; he also predicted that trillions of ARM cores would be needed to match a single x86 core.

/sarcasm

Now seriously, older 32-bit ARM chips caught up with both AMD and Intel IPC. Modern 64-bit ARM has surpassed both AMD and Intel IPC.

The new A57 core has about 36% more IPC than the Jaguar core. The Jaguar core has more IPC than Piledriver. Thus the first 64-bit ARM core has surpassed the IPC of a mature Piledriver core.

Apple's Cyclone core IPC is so high that two of them @ 1.3GHz match quad x86 cores @ 1.6GHz from Intel.

ARM chips have already broken the 3GHz barrier.



Sure that TDP increases with the square of voltage, but this applies to both x86 and ARM chips, not only to the latter.

Once again...

Keller was very complimentary about the ARMv8 ISA in his talk, saying it has more registers and "a proper three-operand instruction set." He noted that ARMv8 doesn't require the same instruction decoding hardware as an x86 processor, leaving more room to concentrate on performance. Keller even outright said that "the way we built ARM is a little different from x86" because it "has a bigger engine."

A 100W ARM chip will provide more performance than a 100W x86 chip.



DDR4 will appear first in servers, whose "traditional CPUs" will benefit from the improved bandwidth. The problem with DDR4 is that it's touching the limits of the DDR design and doesn't really aim to solve the CPU memory wall problem.

A solution to the CPU memory wall problem can only be achieved by a revolutionary approach to memory architecture. This is what Intel, ARM, and other members of the HMC consortium are doing. The goal is to provide about 16x more bandwidth than the slow DDR memory. JEDEC attempts the same with the new HBM specification.

Of course GPUs will also benefit from the new ultra-fast memory.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


AMD admits that the standard A57 core has about 36% more IPC than Jaguar. Evidently the K12 core will have higher IPC than A57. How much more?

The Jaguar core has about 90% of the IPC of the Steamroller core.

Cyclone core is near the complexity and IPC of Haswell core.

The 4 core Piledriver CPU is behind an i3 in FP workloads, but ahead in integer workloads. Don't forget that the CMT architecture uses a shared FPU per module.



http://www.anandtech.com/show/7974/amd-beema-mullins-architecture-a10-micro-6700t-performance-preview/4

Despite no significant changes to the architecture or manufacturing process, AMD’s 2014 updates to its entry level and low power silicon are substantial. We finally have AMD silicon, built around a non-Bulldozer architecture, that seem to have turbo capabilities comparable to Intel’s. The result is a completely different performance profile. While AMD’s Jaguar cores in Kabini and Temash were easily outperformed by Intel’s Bay Trail, Puma+ pulls ahead. AMD continues to hold a substantial GPU performance advantage as well.
 

jdwii

Splendid
Jaguar core has about a 90% of the IPC of Steamroller core.
Based on every single benchmark I've seen, Jaguar doesn't get anywhere near that level. Please show me a real-world benchmark stating otherwise; since it's your claim, I should not have to go search for that. I'm not claiming you're wrong, just stating that the comment sounds like something AMD would say, not something benchmarks actually show.
 

heh, did you really think that by dragging out long you'll be able to dodge the questions? i never changed the questions, the latter ones came on because you kept providing near-baseless explanations and vague answers. so i'll repeat, for your convenience:
how is gddr5 "beyond" the "slow" ddr memory in pcs?
and
by that i mean that you have never provided any technical explanation on that matter. my query is sorta two-parter. part 1 - how is gddr5 beyond ddr3 i.e. what makes gddr5 so much more advanced than ddr3 in terms of technology and performance - assuming that's what you mean by "beyond". part 2 - how is ddr (i guess you mean ddr3) slower than gddr5?

you didn't explain. you just posted some bandwidth numbers instead of explaining where they come from and how they work. i've started to wonder if you really know how gddr5 and ddr3 work and why they're being used the way they're being used in pcs and consoles. because, if you did understand, you'd be able to provide a detailed explanation a long time ago.


so..? what does that result in? how does that relate to "GDDR5 allows data to be transmitted on both peaks of the signal"? i can't find anywhere that mentions "DDR3 only handles an input or output but not both on the same cycle. GDDR handles input and output on the same cycle." i was pretty sure that input and output could never occur simultaneously, gddr5 or not. because that'd mean total zero latency and 100% efficiency. - ideal scenario. forgive me for being skeptical.

at this point i open the questions to Everyone, since juanrga is having trouble explaining.
 