AMD CPU speculation... and expert conjecture

Status
Not open for further replies.

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I only see a repetition of stuff that I already said, plus some old myths, and lots of unneeded and redundant details.

Both the PS4 and XB1 have 32-byte-wide memory subsystems. The PS4 uses 2.750 GHz GDDR5 and the XB1 uses 2.133 GHz DDR3 memory (= 2133 MHz). Thus, in concise form:

XB1: 2.133 GHz x 32 B = 68 GB/s
PS4: 2.750 GHz x 32 B x 2 = 176 GB/s

The extra 2x in the second line is because GDDR5 has double the throughput per cycle of DDR3. I already explained why. The above are the bandwidth values that I gave you before. I also gave you the bandwidth of a Kaveri PC with 2133 MHz DDR3 memory for comparison: 34 GB/s.
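The multiplication described above can be sketched in a few lines; the `bandwidth_gbs` helper is illustrative, and the clock rates and 32-byte bus width are the figures quoted in this post:

```python
# Peak memory bandwidth = rate (GHz) x bus width (bytes) x transfers per cycle.
# Figures quoted above: XB1 DDR3-2133 (2.133 is already the effective rate),
# PS4 GDDR5 at 2.750 GHz with double the per-cycle throughput of DDR3.

def bandwidth_gbs(rate_ghz, bus_bytes, transfers_per_cycle=1):
    """Peak bandwidth in GB/s."""
    return rate_ghz * bus_bytes * transfers_per_cycle

xb1 = bandwidth_gbs(2.133, 32)       # DDR3 effective rate, no extra factor
ps4 = bandwidth_gbs(2.750, 32, 2)    # GDDR5: 2x throughput per cycle

print(f"XB1: {xb1:.0f} GB/s")   # 68 GB/s
print(f"PS4: {ps4:.0f} GB/s")   # 176 GB/s
```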

Of course, you can repeat my concise computations doing all kinds of weird things, like taking the CK in GHz (1.066), doubling it to 2.133 to account for this being double data rate memory, obtaining hypothetical GT/pin, transforming that to Gbit/s, and then transforming that to GB/s... instead of just taking the nominal 2.133 GHz and the byte width and performing a simple multiplication: 2.133 x 32 = 68.

I see that by "explanation" you mean something weird and convoluted. I take note. The next time that you ask me for an explanation I will fill 50 lines with useless equations, four or five changes of units, and lots of unneeded technical babbling, instead of giving you the essentials and the main points.
 


I note you forgot to factor the ESRAM into those calculations: 32MB @ 102GB/s.

But Sony and MSFT clearly optimized for different things: MSFT went for low latency, low power RAM tech, and Sony went for pure bandwidth at the cost of everything else. Given MSFT's approach to the XB1 [media center], its choices make sense, even if they did flub the thing.

And remember people: in any case, the CPUs on these things are going to be the backbone that PC games are ported from 8 years from now. Based on 1.75GHz Jaguar cores. So I think we'll all come around to my initial prognosis that the CPUs on these things are crippled VERY quickly. And it WILL show in ports (which are going to be significantly easier this gen).
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I don't know where this myth that GDDR5 has much higher latency originated, but there are people in forums speculating about the origin of the myth:

http://www.reddit.com/r/PS4/comments/1i7qm6/mark_cerny_on_ps4_gddr5_latency_not_much_of_a/

http://systemwars.com/forums/topic/116470-done-my-research-reports-back-with-verdict-dat-gddr5-ps4/

I have memory timing parameters for both DDR3 and GDDR5 modules, and the datasheets list very close CAS latencies (in ns): 10.3 vs 10.6 respectively. Yes, the GDDR5 CAS latency is 3% higher. WOW!

For the sake of comparison

DDR3-1600 10-10-10 has a latency of 12.5 ns
DDR3-1866 11-11-11 has a latency of 11.79 ns

The difference is a 6% higher CAS latency, and both DDR3 modules have higher CAS latency than the GDDR5 modules mentioned.
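The nanosecond figures above follow from dividing CAS cycles by the memory clock (half the effective DDR data rate); a quick sketch, with an illustrative `cas_ns` helper:

```python
# CAS latency in nanoseconds = CAS cycles / memory clock (MHz) x 1000.
# The memory clock is half the effective DDR data rate in MT/s.

def cas_ns(cas_cycles, data_rate_mts):
    clock_mhz = data_rate_mts / 2      # DDR: two transfers per clock
    return cas_cycles / clock_mhz * 1000

print(f"DDR3-1600 CL10: {cas_ns(10, 1600):.2f} ns")  # 12.50 ns
print(f"DDR3-1866 CL11: {cas_ns(11, 1866):.2f} ns")  # 11.79 ns
```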

The data that I have agrees with the PS4 architect's claim

http://www.dualshockers.com/2013/07/13/mark-cerny-on-ps4-gddr5-latency-not-much-of-a-factor/

And with the analysis made here

http://www.redgamingtech.com/ps4-vs-xbox-one-gddr5-vs-ddr3-latency/

What does it all mean?

It means that the Xbox One doesn't really have a latency advantage with its DDR3 memory.

 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I note that you forget I am comparing the bandwidth of GDDR5 to the bandwidth of DDR3.

I note that you forget cazalan's post.

I note that you forget that I already mentioned the 102GB/s of the ESRAM in a previous reply to you:

http://www.tomshardware.co.uk/forum/352312-28-steamroller-speculation-expert-conjecture/page-262#13279115

:sarcastic:
 

not even close. some of cazalan's explanations are indeed repeated from the discussions when the console specs first came out. combined with palladin's post, they make much more credible sense. in reality, none of those came from you. not now, not before.

er.. no, you didn't. others did. i wonder though, why you failed to explain.

you did post some bandwidth numbers without explaining how they came to be, or how they work.

that's just....sad.
detailed explanation isn't "weird", it's believable and verifiable. the vague numbers you posted can be dismissed instantly. the main reason i asked You for an explanation was to see if you really knew why you were claiming gddr5 (in the ps4) was "beyond" the "slow" ddr3 in pcs. the way you inadequately replied with borderline misinformation, dodged answering, then attacked the detailed explanations and calculations reveals that you didn't know. the real advantage in using gddr5 in the ps4 was application and usage specific, not technological. gddr5 is not much more advanced than ddr3.
the less important reason was to improve my understanding about different memory types. i did gain some valuable information from this (not from you unfortunately). thanks anyway.
 

truegenius

Distinguished
BANNED

think about it again
do you know the hardware you are commenting on, or are you just copy-pasting the things your friend and crystal ball told you?
your first claim, that your apu will annihilate dgpus ( even the extreme ones ), was enough to conclude that you don't know some basic facts and are ignoring physics

lets suppose we are in 2020 and your hypothetical apus are real. then what would you see in tom's system section?
you will see a question like this
"what apu do i need to play gta8 @8k @60fps , i will need apu with more ram to create ramdisk as i will record the game too, budget is 200$, should i go for 9900k"
and answer should be like this
"9900k is overkill, you will need the a20-9800k because it has 16 cpu cores, 8192 gpu cores and 100GB of on-die ram, which is suitable for your needs, and it consumes only 200w"
but here you are trying to dodge my question because you don't have any answer
and asking me to give shape to your imaginary apus
and just by using plain common sense we can say that we will need dgpus and dram modules
you may think that my question was insane, but it fits an apu with far better graphics than current hardware and far more on-die ram, one which kills extreme dgpus and ram modules

You have shown you have no idea of what the memory size of that hardware is, but this didn't stop you from spreading FUD about the hardware being unable (in your imagination) to store game data for 8k. :lol:
it is your apu, so you tell me what the configuration of your apu is
how come i have any idea of the exact ram and gpu cores when you never told anything
you just said it will kill dgpus
and to back this imagination of yours, you said that it will overcome the bandwidth problem by using ram on apus, thus killing ram too
what's next?
will it kill hdds too?

i am not saying that it won't have powerful gpus, or that it can't have some on-die ram, or that it won't kill cheap dgpus; it can do all this
i am just saying that it will have these things ( like current apus have ), but you will also have dedicated parts which provide 5x the performance and upgradability, with countless possible combinations of specs. everybody agrees that you can get 4-5x more performance than what an apu of that time offers

you are just creating FUD like the bulldozer team did by saying that bulldozer will kill the i7, and now you are saying that apus will kill gaming dgpus


you can provide simple answers to my questions instead of avoiding them, so that i can back my claims of the existence of such an apu if someone asks ( instead of just avoiding their questions or saying "my friend at tom's, whose friend works at amd and has a crystal ball, told me so" )

so would you provide me some specs, or will you quote me again to ask the amount of hdd your apus will have?
 
http://www.xbitlabs.com/articles/cpu/display/amd-a10-7850k.html
xbitlabs reviews kaveri a10 7850k in xbitlabs fashion.

using my newfound mis/understandings of video memory and ddr3, i now dare to speculate about kaveri's gddr5 prospects and possible ways to improve ddr3 performance. *feels confident* please correct me where i'm wrong.
if kaveri had used gddr5 for the igpu, it would have to implement a 128-bit bus with at least 1GB capacity. although i am not sure if the same pin-outs could be used to send data to both ddr3 and gddr5 using some kind of hybrid mechanism. i assume motherboards would have to lay down 128 more wires for vram from the apu socket to the chips. amd could use the gddr5 as side-port memory and use the driver to manage vram and system ram. they'd need two memory buses.. or put the gddr5 on the same package as the south bridge/fch and use the umi bus (i forgot the actual name) to access the vram. basically, the igpu directly communicates with vram over umi instead of going through the fch.

imo the best way to improve memory bw without getting gddr5 would be using quad channel mode for system memory to increase overall memory bw. if users fill up all 4 dimm slots, they could run in quad channel mode. it still wouldn't go anywhere near gddr5 bandwidth, but it could double the bw, up to 40GB/s roughly.
 

8350rocks

Distinguished


NDA, which I am actually already under... let me just say, YOU are spreading FUD about ARM. :)

AMD does not envision an ARM DT platform at all... in fact, that is to simplify hardware SKUs so they can use multiple parts for multiple markets that serve entirely different purposes.

Thanks for playing how is Juan wrong this time! :)
 

8350rocks

Distinguished


102 GB/s that might as well not even be in the system... the ESRAM is such a small amount that you cannot reliably ensure you are using all of its potential at any given time, and the micromanagement of that small block of RAM has made developers really frustrated with M$'s new console. Just saying...
 


Do recall that MOST datasets are relatively small (on the order of KB); it's the shuffling of data into and out of that 32MB that's the killer. Understand that, textures aside, you don't need much more than a few MB worth of cache, provided you can preload what you need into it ahead of time.
 

jdwii

Splendid
Still waiting on that Intel GPU killer; wanting some details on that and how it handles games compared to a 295X or dual Titans.
Also waiting on that i5 2500K performance from an AMD A10 Kaveri too.
 

blackkstar

Honorable
Sep 30, 2012
468
0
10,780


How good is 32MB of video memory when you're targeting 1080p at 60fps?

But it doesn't matter, Xbone eSRAM is going to be stone age trash once we get memory stacking.

I don't know how many times I have to repeat myself about x86 dying. It's not happening. Dell, Gateway, Alienware, and all the other OEMs report their sales every quarter and some analyst who doesn't know anything about computers looks at those numbers and goes "wow x86 is dying, Dell isn't shipping as many PCs!"

AMD's approach is to be flexible and fit into many markets. AMD staying purely x86 makes as much sense as AMD going ARM only. They want that flexibility after the Bulldozer failure.

Look at how inflexible Bulldozer was as an architecture. It was designed for servers that were meant to run tons of weak threads at once (not necessarily from the same program, I'm talking along the lines of spawning 500 apache threads a second for each request or something along those lines).

So AMD designed an architecture around that and then got completely screwed for the entire duration of Bulldozer's lifespan because they were locked into a single ideal and they had no backup plans.

ARM + x86 ambidextrous strategy is about having a backup plan. If ARM doesn't take off like they say, they still have a good x86 core. In fact, they probably have two, cat core and bigger x86 K12 sister core. If x86 dies, they have ARM.

If this was Hector Ruiz's AMD, they would be betting entirely on one or the other. They're not. They're playing it safe for once, and they are taking advantage of the fact that they're the only company in the world that can provide a strong GPU, strong x86 CPU, low power x86 CPU, low power ARM cores, and strong ARM CPUs. And not only can they provide all of those solutions, but they can mix and match them for specific customers.

Ergo, someone wants to build a giant HPC that is HSA enabled and will be running something that is a great fit for GPGPU? Great, here's your custom solution with some weak ARM cores to run your OS and most of the die as GCN cores.

Have some tasks that don't scale on GPUs? Have this giant traditional CPU!

That's AMD's end goal, and they've stated this over and over and over again. They are creating a set of building blocks for semi-custom products. And meanwhile, they end up going "well, we can give you 8 weak CPU cores with some GCN cores in this thermal envelope. We're the only company that can do this for you. But you'll need to spend some money on R&D for it first, we've never made an 8 core Jaguar before :)"

AMD has looked back at the last few years, seen how the GPU elevated their one very weak product, and realized that "if we diversify and split up our CPU section, we can still do fine if one of them underperforms"

That's what this whole ARM thing is about. They will use whatever products are best for the job in each segment. And the fact that K12 ARM and x86 are considered "sister cores" is making me consider that the two will be very similar in the end.
 

Cazalan

Distinguished
Sep 4, 2011
2,672
0
20,810



If you want to be concise you don't confuse GHz with GT. One is the frequency of a clock pin, the other is a transfer rate. It's not a hypothetical, it's the term the industry uses for the raw transfer rate, as all interfaces have overhead in the protocols used for transferring data. It's clear you still don't understand the difference so I'll fix your simplified table.

XB1: 1.066 GHz x 32 B x 2 = 68 GB/s
PS4: 2.750 GHz x 32 B x 2 = 176 GB/s

My apologies for showing all that yucky math. ;)
And yes gamer this is ignoring the eSRAM as the question was about DDR3 vs GDDR5.
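The disagreement in this exchange is only about notation: both routes, starting from the 1.066 GHz clock pin or from the 2.133 GT/s effective transfer rate, land on the same number. A minimal check (variable names are illustrative):

```python
# Two equivalent ways to compute DDR3-2133 bandwidth on a 32-byte bus:
# via the 1.066 GHz clock pin (x2 for double data rate), or directly
# via the 2.133 GT/s effective transfer rate.

clock_route    = 1.066 * 32 * 2   # GHz clock x bus bytes x 2 transfers/clock
transfer_route = 2.133 * 32       # GT/s x bus bytes

print(round(clock_route), round(transfer_route))  # 68 68
```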
 

truegenius

Distinguished
BANNED


a side port won't fit hsa, as it will have different memory for the igpu and the cpu
but for current apus it could be a choice, as current apus are not using hsa's capability much
and it will be limited too, will probably be non-upgradable, will consume board area and will produce heat on the board

i am in favour of the second option. quad channel could be a choice, but it increases the overall build cost by some amount. it is up to the user to limit ram modules; the user can populate the other 2-3 channels later, so cost won't be that much north of current builds. win win for the user, and for amd too, as it will help them increase performance by a huge amount. even if it needs more power for more gpu cores and memory channels, a 125w apu won't hurt

and since we will have ddr4 soon, i would like to see quad channel ddr4 support in new apus ( after carrizo ) (up to 102.4GB/s bandwidth with quad channel ddr4-3200, or 154GB/s if we use ddr4-4800; an hd7790 on a 125w chip anyone ( or even higher if they can create something as efficient as the gtx750ti ))
and to minimize board cost, board manufacturers can offer boards with 1, 2, 3 or 4 channel options in their different price range products, so it will meet users' requirements and be a win win for the user, amd, and the board manufacturer
 
i set aside hsa due to its immaturity in terms of development and focussed my speculations on improving mainstream, entry level laptop/desktop pc gaming performance only. hsa can help when it's mature and has good market penetration. when hsa becomes fully realized, we'll have embedded memory working its way down, ddr4 etc.

yes, quad channel will be costly. but look at how the a10 7850k was priced initially. it's barely moved from that since launch. for $180~, only 20%~ or so gaming perf. improvement over richland isn't worth paying so much at apus' typical price range. kaveri's spectre igpu is very powerful. you could think of it as a $90 2M/4C sr cpu combined with a $90 cape verde gpu and some other bits like trueaudio and the arm security processor. amd has put together something worthy of mainstream high end, only to have the igpu power restrained. quad channel would have been worth implementing.

the apus are aimed at replaceable pcs with very little need for upgradability. it's one of those areas where integration is advantageous. for upgradable, customizable pcs, there's always the am3+ class platform. with desktop apus, amd can address both. intel is actually playing at something like this with dt broadwell. they'll snip pcie 3.0 support to pcie 2.0 using power use as an excuse, in reality to push iris pro for mainstream pcs and restrict discrete gfx users to hedt. they're hyping integration but their real aim is to sell a constricted platform to Almost Everyone (for moniez, nothing to do with tech. progress).

carrizo has fewer issues to deal with. ddr4 removes the bandwidth limitations of ddr3.. amd might still be latency bound unless they improve the soc. i think TLB misses increase latency in gpus. maybe amd can work on improving that in carrizo.

 

noob2222

Distinguished
Nov 19, 2007
2,722
0
20,860
smartcar1.png


Take this highly efficient design and build me a 10 second car.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


The bandwidths for each memory, and the explanation of why one offers double the throughput of the other, were already in my posts. The only difference is that I did not repeat myths about latency and did not fill in unneeded redundant details that add nothing to the topic. Starting with GHz, then transforming to GT, then transforming to Gbit, and then transforming again to GB adds nothing new to a discussion that already starts from GHz and transforms to GB directly.



Nope.



Details are good. I said that "redundant and unneeded details" aren't.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I already explained why "some basic facts" and physics show that GPUs will be killed.



This proves again that you don't have any idea about the hardware that was being discussed. You are pulling random numbers from your...



Are you admitting that your 8K comment was FUD and that you don't have any idea of the hardware that you are commenting on?

A genius doesn't make claims about hardware and later ask what the hardware configuration is. :sarcastic:



This is all in your imagination. I never said that it kills RAM or HDDs.



The 5x myth was debunked before. The upgradability issue was also replied before...



Except that you confound Bulldozer marketing with an academic claim based on the laws of physics. I am not reproducing the claims of a particular company that wants to sell its products.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


GDDR5 was not going to be used for the igpu alone but for the whole Kaveri APU.



GDDR5 and DDR3 are not pin compatible.



What vram? what 128 more wires?



Nope. GDDR5 would be used as system ram.



Nope. Only bus to system ram.



The problem is that quad channel is expensive, occupies a lot of space, requires quad tested dimms...

Moreover, quad-channel DDR3-2133MHz offers 68GB/s, not 40GB/s. The 68GB/s number was given to you before. I simply note it.
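The 68GB/s correction follows from each DIMM channel being 8 bytes (64 bits) wide: bandwidth scales with data rate times channel count. A short sketch, with an illustrative `channel_bw_gbs` helper, reproducing the dual- and quad-channel DDR3-2133 figures from this thread and the quad-channel DDR4-3200 figure quoted earlier:

```python
# Multi-channel DIMM bandwidth: each channel is 8 bytes (64 bits) wide.
# Peak GB/s = data rate (GT/s) x 8 bytes x number of channels.

def channel_bw_gbs(data_rate_gts, channels):
    return data_rate_gts * 8 * channels

print(f"dual DDR3-2133: {channel_bw_gbs(2.133, 2):.0f} GB/s")   # 34 GB/s
print(f"quad DDR3-2133: {channel_bw_gbs(2.133, 4):.0f} GB/s")   # 68 GB/s
print(f"quad DDR4-3200: {channel_bw_gbs(3.200, 4):.1f} GB/s")   # 102.4 GB/s
```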
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Wait a moment! ARM is not even mentioned in the part of my message that you are replying to. In fact, Steamroller is an x86 architecture. :sarcastic:

Are you trying to tell me that your supposed 'NDA' restrictions were not an impediment to your posting FUD about AMD's new products? I recall your recent "K12 != ARM", but I didn't forget your former pretense that Seattle was only an "experiment", because "AMD was not really interested in ARM".
 

8350rocks

Distinguished


Note the space between those 2 sections. I was replying directly to your statement, then elaborating on your lunacy in a separate portion. They have grammar in Spain, yes?
 


The one thing about ARM that no one seems to look at is how it handles multiple active processes, like in a Hyper-V server role. I have not seen anything with any mentionable data, and the only experience I have with ARM is in smartphones/tablets. I can tell you that even though they have come a long way, they slow down over time. My Droid Bionic was the first dual core LTE 4G smartphone, and by the time I was up for a new one, it was just slow. My S4 is faster, but it is a newer ARM design. Still, if I have too much open it seems to get sluggish.

As for memory, the memory itself was not the issue. It was the bus more than anything. Of course, AMD went to a high speed IMC HTT bus back in the DDR era, but Intel made the jump at DDR3, so it looked like a much larger bandwidth leap compared to when AMD moved. Intel went from about 4-6GB/s to 25GB/s (LGA 1366).



I have been saying this for quite a while. In the UMID market, power usage matters more than performance. Sure, you can have an i7 class performance CPU, but if it sucks the same power, it is pointless.


Are you still hanging on to those JS benchmarks of Apple's A7 SoC vs Intel's Bay Trail CPU? I have a few things to say about that: for one, they are JS based benchmarks, whose results can be determined by the browser itself. Considering they used different OSes with different browsers, there is no way to determine if there is a browser bottleneck. I think I explained this before.

Second, it is Atom. Atom, while much more advanced than its original iteration, is nowhere near as complex or as powerful as a full Haswell core. Compare that Apple A7 SoC to an i5 4670 (both quad cores) at the same clock speed on the same OS using the same benchmarks. I guarantee that Haswell will eat it up like it does current ARM SoCs.

As for DDR4, it is going to actually help a lot. It is by default going to push 52GB/s to start, the same as current quad channel DDR3 does in LGA2011, but this will be on normal desktops. CPUs do not need this massive bandwidth you seem to think they do, not traditional desktops anyway. APUs will benefit, but only to a point. The main problem is that since this is system RAM, it is always geared to have the lowest possible latency with the best bandwidth. That is why we now have kits of DDR3 1866MHz at CAS9. Due to that, the GPU portion of an APU will never benefit from it in the same way. The best solution? On package or on die stacked RAM, much like Iris Pro has. That is the best way to give an APU the bandwidth it craves while still providing what a CPU needs.



Yet Jim Keller, lead tech on AMD's K8, either left or was let go. I wonder why.

While a lead designer is great, there is also upper management and marketing. Why did Phenom I fall so hard? The architecture was flawed for sure, but one of the biggest issues, I would say, was marketing pushing the CPU where it never would reach. The leadership at the time was just horrible as well.



They may be gas efficient (due to small size and low weight), but they are death traps, and sports cars are much more efficiently designed. They are designed to take advantage of every benefit available to them. Hell, the new Corvette's V8 gets just as good gas mileage as my Fusion's 3.0L V6 yet puts out more than double the power. Of course, I can carry 5 people and a trunk full, but still, it is impressive.

I wouldn't be caught dead in one of those small cars either. Mainly because I am very tall and even a Ford Focus class car is too small for me.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


I criticized the use of your invented "GT/pin" unit, not your use of "GT".

Telling a newbie that to compute the bandwidth of 2133MHz DDR3 memory he has to take the 2133, divide it by two to obtain 1066, and then multiply by two again at the end looks more weird than just using the 2133 from the first minute:

XB1: 2.133 GHz x 32 B = 68 GB/s

It is not about confounding frequency with transfer rate, because what is being used above is 2.133 GHz, not 2.133 GT/s. You are the one that remains confused about both concepts.

A final note. I know that many sites claim that GDDR5 uses a write clock, but to be more precise, it uses two write clocks, each of them assigned to two bytes.
 

juanrga

Distinguished
BANNED
Mar 19, 2013
5,278
0
17,790


Waiting for future Intel products seems reasonable. But waiting for benchmarks that were promised some time ago doesn't.
 