AMD CPU speculation... and expert conjecture

Page 519


But remember, that's a lot of extra die that won't be used in native applications. It's fantastically inefficient.

At the end of the day, the chip will run EITHER x86 code or ARM code, not both. So that extra ARM logic goes unused when running x86 tasks, at which point it's just eating die area that could be used for something else to improve performance (L3 cache, for example).

The ONLY benefit is that you can support two markets with one chip, which for a fabless company is attractive. But you have to sacrifice performance from both parts to make it work. AMD is not beating Qualcomm in ARM and not beating Intel in x86, so this will backfire in spectacular fashion.

One thing I have noted: AMD's high-core-count CPUs have been aging pretty well. I'm running on a Phenom II X6 which I got years ago, and it's still fine now. I think part of this 'the PC market is doomed' idea comes from the fact that PC sales have dropped, but a large part of that is that PCs last so long now.

Pretty much. I mean, an i7 920 is still a viable gaming CPU, and that's what, five years old?
 
the only bad aspect of having both arm and x86 cores is the time, money and human resources needed to develop, maintain and provide support for each. custom uarch development is hard and expensive. arm will be with amd, sure. but how long and how integrated will they be for the long term? then there's the li'l part about royalties. arm charges less, so amd has incentive. in reality, arm won't be charging so little for much longer.
 

logainofhades

Titan
Moderator


Someone is already working on that one.

http://www.carcraft.com/featuredvehicles/1404_big_block_chevy_smart_car_at_drag_strip/
 

8350rocks

Distinguished


Actually, Cazalan is right and you are wrong. DOUBLE Data Rate means that a DDR3-2133 RAM stick actually transfers data twice per clock cycle, i.e. 2x1066 MHz. The clock speed it operates at is 1066 MHz.

You say you can do math, right? Take 2133/2 and tell me the answer.
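Spelled out, that arithmetic looks like this; a minimal Python sketch, the only numbers assumed are the DDR3-2133 case under discussion:

# Minimal sketch: "double data rate" means two transfers per I/O clock,
# so the module's actual clock is half its advertised transfer rate.

advertised_rate_mts = 2133        # a "DDR3-2133" stick, rate in MT/s
transfers_per_clock = 2           # data moves on the rising AND falling edges
io_clock_mhz = advertised_rate_mts / transfers_per_clock

print(io_clock_mhz)               # ~1066 MHz: the clock the module actually runs at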

Juan, what is that spot on your face? Oh, it must be your ignorance showing again. Please stop talking about things you do not know about.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


It means exactly what I wrote just before the part that you bolded. It is in English :sarcastic:



It is funny that you still believe that exascale APUs are for gaming consoles :lol:



The answer to the same question has not changed :sarcastic:



In short: you started by ignoring everything I have said about those APUs currently under development by Nvidia, AMD, and Intel; ignoring what the engineers say and what gaming experts say; ignoring quotes, links, and technical details (die size, process node, memory, flops, TDPs...); and after ignoring all that you started spreading FUD about the APUs, pretending that those APUs don't have enough RAM to "store game data" (:lol:), yet when pressed you are unable to say how much RAM those APUs have.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


That's my concern for the long run: how they'll be able to develop 4 cores at the same time. You could consider the A57 core a "freebie" since the work is already all done for them, but 3 cores is still more than 2, and I haven't heard of much growth as far as head count goes.

One just has to hope that the design automation tools they've added to make the pieces of the SoC more "plug-n-play" have bought back some man-hours to use for the core designs.
 

pressure will start to mount after amd releases project skybridge and paves the way for the k12 launch. that's when they'll be left with bd's successor core, k12, possibly puma's successor, and the legacy stuff (bd, mainly). they've taken a smart path by first polishing the uncore and getting it ready for core-swaps. it sounds good on paper.. but it'd be tough to execute. oh, then there's the gpus.
if they execute well.. competition will heat up a lot.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I don't think they're planning to put both on the same part, other than the TrustZone core, for the reason you mentioned (silicon waste). It's more like they're designing a car to take 2 different engines, and they slap the engine (x86 or ARM) in depending on the buyer. As they'll use the same process, AMD can just tweak their wafer orders per month depending on which one gets more traction. They moved more production to GF, so that should help reduce costs and get back some control over inventory levels.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I am not criticizing the final results, which agree with values that I gave first. I was criticizing his convoluted and misleading derivation of the same results.



The two forum links with people trying to understand the origin of the myth were given only for entertainment. The important links were the other two, debunking the old myth about latencies.

GDDR5 and DDR3 having about the same latencies doesn't change a bit of what I said about DDR3 being slow, because GDDR5 still provides double the throughput of DDR3. This is not some mathematical subtlety: GDDR5 simply has more throughput. This is why high-end GPUs use GDDR5 instead of DDR3. This is why the PS4 uses GDDR5 instead of DDR3. This is why AMD initially planned to offer a GDDR5 version of Kaveri...
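As a rough illustration of that gap, here is a back-of-the-envelope sketch; the GDDR5 side uses the widely reported PS4 figures (5500 MT/s on a 256-bit bus), and the DDR3 side assumes an ordinary dual-channel desktop at 2133 MT/s:

# Peak theoretical bandwidth of a parallel DRAM interface in GB/s.
def peak_bandwidth_gbs(rate_mts: float, bus_width_bits: int) -> float:
    return rate_mts * bus_width_bits / 8 / 1000

print(peak_bandwidth_gbs(5500, 256))   # ~176 GB/s (PS4-class GDDR5)
print(peak_bandwidth_gbs(2133, 128))   # ~34 GB/s (dual-channel DDR3-2133)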



You can reduce vertical size with a PCIe card riser.

The goal of integration is not only to save space and reduce costs, but also to improve performance and simplify programming. E.g. AMD's plan to integrate the CPU and GPU on the same die looks like this:

[image: evolving2.jpg]


The last step is not possible with two memory pools, one for the CPU and another for the GPU. Here you have the words of two leading game developers about unified memory:

Digital Foundry: In his QuakeCon keynote, John Carmack talked about the potential of giving the GPU access to the complete address space of the consoles' memory and suggested that something needed to be done on PC. What's your view on this?

Cevat Yerli: In an ideal world, the GPU and CPU would be accessing and sharing the complete memory address space to really close the gap between PC and console gaming. Besides closing the gap, it would simplify many developments in graphics, in compute and also generally speeding up many operations which are data heavy. However, on PC we are restricted to DX or OpenGL standards for many of the aforementioned operations. As long as MS or the OpenGL ARB don't update these standards, developers will be limited to what they have. We see that OpenGL is gaining traction, but DX11 has received very little update and traction on PC. I think DX11's use has been boosted only by the appearance of next-gen consoles. So what we all want, essentially, is a unified architecture for PC and console gaming, coupled with unified development tools, or, even better, access below abstraction layers in a unified way.

I can say that their dream is becoming a reality soon.



Current console games are already using HSA-like techniques to compute on the GPU. Sony is a contributing member of the HSA Foundation. Next I give you a link about the PS4 APU and what its hUMA capabilities mean for games:

http://www.coconutdaily.com/content/why-ps4-will-be-performance-beast



What about me living in Europe? Since last year, prices for 2133MHz modules have been on par with, or even cheaper than, 1600MHz modules.



Yes, it is my choice, and it agrees with the usual choice made by the industry, including memory makers.

Thanks for respecting it; I only hope that caza and others follow your example.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


The funny thing about all this ambidextrousness is that it was part of Dirk Meyer's strategy, and he was dropped like a hot potato.

They can get some wins if the focus is more on the uncore.

AMD's SeaMicro SM15000(TM) Server Sets Industry Benchmark Record for Hyperscale OpenStack Clouds
http://finance.yahoo.com/news/amds-seamicro-sm15000-tm-server-143000653.html

(couldn't determine which CPU was used to get those numbers though)
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


SkyBridge will use either ARM cores or x86 cores, not both on the same die with one half of them unused. People can replace their x86 SoC with an ARM SoC and vice versa.

[image: amd-project-skybridge-arm-x86-640x360.jpg]
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I am convinced that caza will be able to get the fine irony in the underlined part of your post. :sarcastic:
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


What four cores do they have to develop? A57 is ready. Puma+ is ready and only needs to be tweaked a bit to adapt it to ARMv8. The only two cores that remain are K12 and its x86 sister. But the x86 sister is being developed from ideas used for the K12 core, thus it is not the same as designing an entire core from scratch.

 

truegenius

Distinguished
BANNED


dude
did you even read?
didn't i tell you to repost the specs, since i missed the posts where you posted links and specs?

and to the mods:
can i report this troll's posts? he never posted any links in reply to me in the previous 3 pages, yet he keeps saying that he did.
he never posted tech specs even after being asked many times, while claiming that he already did.
isn't he abusing the spirit of a proper discussion? every other poster is getting frustrated with his trolling.
i would like to report his posts for trolling, can i do that?
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


As I mentioned some time ago, AMD is transforming itself into a custom SoC company. The new SoCs will not only offer the option to change the core, ARM <--> x86, but will also give the option to change caches (including L3), memory (HBM, DDR4...), graphics, and so on.

Some relevant parts from Charlie's analysis:

This new ARM core called K12 is due in 2016 after two generations of SoCs/CPUs that use vanilla A57s.

Don’t think of this as two chips that share a common pinout, think of it as an entire CPU or SoC with everything in common but the cores. Those cores are far less important than the rest of the chip, they are just another swappable component. I would say we exclusively told you about this over three years ago, but well, we did just that. I think the naysayers are laughing at this one because they still don’t understand what is going on, much less its significance.

So the transition goes distinct ARM A57 CPU and x86 platforms in 2014, common socket and more importantly common uncore for the CPU/SoC in 2015, and then custom AMD 64-bit ARM core in 2016. That makes sense and the stepwise transition is a good one to minimize the potential for showstopping bugs and delays.

Back to the greater question of why anyone would bother with an ARM core vs an x86 core, that is a bit thornier. The most important issue on this front is software, x86 has it, the others don’t. x86 is the gold standard for datacenter ISAs with the entire infrastructure built around it. In the past few months however, ARM is catching up fast with the entire Ubuntu and Red Hat package base now ported over and running on ARM with only small fractions of a percentage of packages not compatible. In short the software is there now.

What this means is that ARM and x86 are now interchangeable on almost every level. It doesn’t matter what you deploy, the software should just work. This is of course true for everything but Windows, but Microsoft is no longer relevant in the datacenter; Linux runs the overwhelming majority of sockets in this space, with Microsoft itself and its Azure product being the only notable exception. ARM should be seen as interchangeable with x86.

Going back to AMD, their ‘ambidextrous’ strategy is simple: offer what buyers want with the same tools, management stack, boards, and everything else surrounding it. If you want big x86 cores, swap them in for a few smaller ARM cores. If you want lots of small independent cores, swap those in too. The rest of the stack doesn’t change, one set of boards, one set of software, and one management console; this is a killer app for the big datacenter buyers.

In doing so AMD relegates its traditional core product, that would be a CPU core, to another swappable component from a menu of SoC offerings. This is the future of compute and everyone will be doing it before long, AMD is just the first. Just like SemiAccurate said in 2011.

The bold parts coincide with what I have been saying here for months. I'm sorry that my broken crystal ball worked once again. :sarcastic:
 

looks like you're further along the way to refuting your own claim than i previously thought... right above, you're explicitly specifying gddr5's advantage over ddr3 - for gpus, i.e. application-specific, not overall. you were comparing it to "slow" ddr3 in pcs - which acts as an extension of the cpu and an intermediary between the cpu and gpu, while gddr5 on gfx cards acts as the gpu's local memory. gddr5's throughput only helps gpus because gpus are latency-tolerant. if the processor, cpu or gpu, doesn't take advantage of the bus width, then gddr5's throughput advantage becomes moot. and the latency claim wasn't debunked; if you discount the forum links, the others don't directly argue that gddr5's latency is similar to ddr3's.

true, then you end up with a box that's too wide. not really small form factor. gigabyte did something atypical by using a riser slot and an mxm card in their amd brix gaming pc, but the box got too hot and throttled the cpu clocks under load to keep temperatures in check.

amd could put in a dual imc-dual phy setup. i don't know if that's possible to design though. the gddr5 memory controller would be connected to the igpu.

yea, i didn't even bother reading that link; my speculation is about entry level pc gaming using kaveri. hsa is practically non-existent in pcs at present. that's why the imaginary kaveri takes this sort of crude approach of using separate types of ram for separate processors. ideally, what hsa set out to do would be the right way.


 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I gave die size, process node, memory size, gflops, TDPs, number of latency cores, bandwidths, interconnects... and I did it five or six times in this same thread.

You ignore what I say about hardware X, then you make silly claims like "hardware X cannot do Y", and later you ask me to give you the specs of hardware X again. LOL!

Second, when contacting them, don't forget to mention that you started this charade with a direct personal attack against me

http://www.tomshardware.co.uk/forum/352312-28-steamroller-speculation-expert-conjecture/page-261#13260783
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


It is more like repeating the same thing. I mentioned that GDDR5 also provides advantages to APUs such as the one used in the PS4. And I mentioned AMD's original plan to offer GDDR5 support on Kaveri.

Throughput is also important for CPUs. This is why CPUs use caches with 10x the throughput of the "slow" DDR3 memory. This is why servers will be migrating to DDR4 when it is ready. The new HMC architecture has been explicitly designed to eliminate the memory wall for CPUs. Of course it also benefits GPUs.

I already gave you latencies for GDDR5 and DDR3. I guess you missed those. I also guess you didn't read the links that I gave, because they mentioned explicitly that GDDR5 has about the same latency as DDR3.

I repeat the link and repeat the quote:

But what about the actual latency of DDR3 vs GDDR5?

I’ll save you the trouble and say that it’s between 10 – 12 for both DDR3 and GDDR5 varying heavily on the clock speed of either the DDR3 or GDDR5 AND in the timings they’ve chosen.

What does it all mean?

It means that the Xbox One doesn’t have a latency advantage really with the DDR3 memory.

http://www.redgamingtech.com/ps4-vs-xbox-one-gddr5-vs-ddr3-latency/

I will not repeat it again.
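For anyone who wants to sanity-check that ~10 ns ballpark, here is a rough sketch of the DDR3 side of the conversion; the CL values below are assumed, commonly sold timings rather than figures from the article, and GDDR5 timings are counted against its own command clock, landing in a similar range per the link above:

# First-word CAS latency in nanoseconds = CAS cycles / bus clock, where the
# DDR3 bus clock is half the effective transfer rate.
def cas_latency_ns(rate_mts: float, cas_cycles: int) -> float:
    bus_clock_mhz = rate_mts / 2
    return cas_cycles / bus_clock_mhz * 1000

print(cas_latency_ns(1600, 9))    # DDR3-1600 CL9  -> ~11.3 ns
print(cas_latency_ns(2133, 11))   # DDR3-2133 CL11 -> ~10.3 ns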



It doesn't need to be a wide box. The Brix's heat problems are not from using a riser card, but from putting too much power in a too-tiny 10' box.


As I mentioned before, your speculation goes against AMD's goal of integration (step 3 in the figure given before). I note that you also ignored the quotes from the two leading game developers (Carmack and the Crytek dev) explaining the advantages of using a single memory pool for gaming.



It seems to be the rule in this thread. ;)
 

i see that you don't need any further technology-related input; you have refuted your own claim. gddr5 support isn't implemented in kaveri. and the ps4 using gddr5 is application-specific. by.your.own.account.
edit: you mentioned this as what makes gddr5 "beyond" the "slow" ddr3 in pcs:

i am still waiting for clarifications, after multiple inquiries. i hope others verify what you state.
i am adding this again so that you don't miss it like the previous times. or maybe you lack reading comprehension.

:lol: are you really this contradictory? i used sram as a reference earlier and

right back atcha. why? the ps4 soc's cpu clusters already have large and improved L2 cache. almost all cpus right now have L2 cache, some even have L3 cache or LLC. this is why i specifically didn't use cache/sram in the argument.

you mean the napkin calculations in the forum threads, which, by your own account, you posted for entertainment. if i go along with your statement, i gotta ignore those.

you better not. xbone(R) was not even part of this.

yeah... it increases width and creates yet another hotspot inside the case. the main problem with the dgfx is the cooling. most gfx cards, even low-profile single/dual-slotters, use open-air coolers that depend on the case to exhaust hot air. among current cards, very few... if any, use blower-type coolers. the gigabyte brix gaming didn't put too much power inside a tiny box.. i'd go as far as to say they put too little, especially in the cpu dept. their cooling was inadequate despite the concept. i'd use my imaginary kaveri in a brix box if i used an mxm slot + for the mini projector. otherwise it'd be inside a nuc-class case.


i did ignore them. because the hsa machinations inside kaveri are incomplete, software (games) is non-existent. i use the integration to eliminate the need for dgfx in a small box and to open up the igpu bandwidth using gddr5 - the whole thing, in principle, acts like a regular pc with cpu, system ram, and dgfx with its own vram, except the cpu is actually an apu with the igpu, pcie and imc integrated, and the vram is soldered onto the motherboard. oh, and it'd use so-dimms instead of standard dimms. i wanted to clarify that even though i've mentioned nuc and brix already.

you said it! ;)
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810

The accurate terminology for the data rate or transfer rate is T/s or bps, but that is your choice.

SK Hynix here, showing the true clock frequency and data rate for some DDR3 and GDDR5 parts.

[image: Graphics_DDR3_H5TQ4G63MFR.jpg]


[image: H5GC(Q)4H24MFR_ordering_information_resize.jpg]


Samsung, the #1 DRAM provider, also uses Mbps or MT/s, and Micron, the #2 DRAM provider, also uses MT/s.

[image: img_products0301_ddr4_1.png]


In general you're better off using MT/s, as it is much more commonly used across both parallel (DDR3/GDDR5/HBM/WideIO2) and serial interfaces (HMC/PCIe/SATA/HT/QPI).
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Nope.



You used an argument about cost, which was irrelevant to discussing throughput.



They are not forum threads. I gave you the link again in the last post. Not only did you ignore it once again, but you deleted it.



It was a link about latencies of DDR3 and GDDR5.



What the leading game developers said applies beyond HSA. Moreover, games that use the unified memory pool already exist. In fact, they discussed them a bit.


Beautiful example of how you take a phrase originally from you, "yea i didn't even bother reading that link", cut it to "yea i didn't even bother reading", and then attribute it to me. Nice manipulation!
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810


I expect them to continue making improvements to all the platforms each year or they will stagnate. None of AMD's competitors are going to slow down on the ARM side and Intel certainly isn't slowing down on the x86 side.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The choice was about using frequencies instead of data transfer rate.



And in the table that I gave, Hynix uses effective frequencies. Again: one can use either.

Besides explaining this to you, I also gave you a link to the GSkill FAQ explaining the same thing I am telling you. I repeat the link and the relevant part in case it was missed:

http://www.gskill.com/en/faq/Memory

Q:
What is the difference between “DDR3-1600” and “PC3-12800”?

A:
There are two naming conventions for DDR memory, so there are two names for the same thing. When starting with “DDR3-“, it will list the memory frequency. When starting with “PC3-“, it will list the memory bandwidth.

DDR3-1333 = PC3-10666 or PC3-10600
DDR3-1600 = PC3-12800
DDR3-1866 = PC3-14900
DDR3-2000 = PC3-16000
DDR3-2133 = PC3-17000
DDR3-2400 = PC3-19200
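
In code form, the mapping in that table is just a factor of eight; a small sketch assuming the standard 64-bit module width:

# "DDR3-xxxx" is the transfer rate in MT/s; "PC3-yyyy" is the peak bandwidth
# of a single 64-bit module in MB/s (rate x 8 bytes), which vendors then round.
for rate in (1333, 1600, 1866, 2000, 2133, 2400):
    print(f"DDR3-{rate} ~= PC3-{rate * 8}")

# DDR3-1600 -> PC3-12800, DDR3-2133 -> PC3-17064 (marketed as PC3-17000), etc.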

I already told you that I will continue referring to memory modules by their effective frequencies. I will continue calling DDR3-1600 memory 1600MHz memory and DDR3-2133 memory 2133MHz memory, like GSkill and Hynix do and as is usual in the field:

http://extremespec.net/crucial-shown-memory-ddr4-3000-ces-2014/

You can continue posting about this, but I will ignore it.
 