AMD Piledriver rumours ... and expert conjecture

We have had several requests for a sticky on AMD's yet-to-be-released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic; anything else will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame-baiting comments about the blue, red and green teams and they will be deleted.

Enjoy ...
 
@triny: in the '113% improvement' link you posted, the research was funded (jointly) by amd.
http://news.ncsu.edu/releases/wmszhougpucpu/
while the research is just research, amd will use this for marketing hype.
in the experiment, the (simulated) llano needed l3 cache. trinity won't have l3 cache, so the 'improvement' might not happen until the apus get l3 cache.
covering all the perspectives would have been better than simply putting in '113% improvement'.
@chad: my biggest issue with the '7760' igp is amd's misleading naming. afaik the 7750 will be a gcn gpu with at least 1 gb of gddr5 vram, while this '7760' seems to be an older vliw4 gpu with some tweaks. having a higher model number does not make it a better performing gpu, and it's well known that the igp will be much less powerful (amd will make sure of that) than the discrete gpu. imo amd didn't pull this crap with llano - the 6550 and others were properly numbered (compare the 6550 with the 6750).
 
:lol: I don't know why this should be so hard. :pfff:

How about this, then: what is this "7760 IGP" going to perform like?

How much better will it be than X or Y or Z?

I realize you're an Intel fanboi and therefore don't do much research.

Considering the 7690M is just a die-shrunk 6750M and the 7690M-XT is a die-shrunk 6770M, a 7760M would sit below the 7690M. The biggest performance impact would be the memory subsystem, as the APU uses DDR3 UMA vs the discrete GPU's dedicated GDDR5.
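To put rough numbers on that UMA-vs-GDDR5 gap, here's a quick peak-bandwidth back-of-envelope in C; the clocks and per-pin rate are illustrative assumptions, not official APU or GPU specs:

/* Peak bandwidth = channels * (bus width in bytes) * effective transfer rate.
   All figures below are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    /* Dual-channel DDR3-1333 UMA, shared between CPU and IGP */
    double ddr3_gbs  = 2 * (64.0 / 8) * 1.333e9 / 1e9;  /* ~21.3 GB/s */
    /* Dedicated 128-bit GDDR5 at an assumed 4.0 Gbps per pin */
    double gddr5_gbs = (128.0 / 8) * 4.0e9 / 1e9;       /* ~64.0 GB/s */

    printf("DDR3-1333 dual channel: ~%.1f GB/s (shared)\n", ddr3_gbs);
    printf("128-bit GDDR5 @ 4Gbps:  ~%.1f GB/s (GPU only)\n", gddr5_gbs);
    return 0;
}

And the UMA pool isn't even dedicated; the CPU is eating into that smaller number at the same time.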

It also shouldn't be hard to see that the current 6620G absolutely trashes the HD3000 in ~everything~, even considering the CPU handicap of a cut-down Stars core vs a SB. The 6620G (and its bigger APU brother) does gaming quite well, as I demonstrated in those posted benchmarks. Thus any improvement in the GPU department is a solid win, especially if it CFs with the discrete mobile GPU.

This is all based on the mobile sector, which is where the vast majority of APUs make sense. The low-profile mini-ITX HTPC boxes have no need for a discrete GPU right ~now~, much less if the next iteration has a more powerful fusion IGP.

If you're going to heckle people about the performance level of the 7760, then you, me and everyone else here know that nobody has exact numbers. AMD hasn't released them, nor have they released kits for Tom & friends to review. When they do, we'll know. Trying to use the non-existence of exact numbers as some sort of debate point to put Trinity in a negative light is both dishonest and underhanded.

We know your agenda is to push pro-Intel, anti-AMD feelings. Trinity's PD cores might suck, or they might not; we don't know. BD sucked, we all know that. The one thing AMD is doing right is graphics, mostly thanks to their purchase of ATI. So if you want to sling mud or heckle, then sling mud and heckle the crappy BD-based CPUs.
 
I realize you're an Intel fanboi and therefore don't do much research. [...] So if you want to sling mud or heckle, then sling mud and heckle the crappy BD-based CPUs.

That's a perfect explanation, except I have a 2500K in my box but I don't let that blind me. Research into new and better ways of doing things almost never comes from those on top; it almost always comes from the underdog.
I pity the fanboys of any company: they're blind as a bat, dumb as a doorknob, and least likely to see change coming.
 
@chad: my biggest issue with the '7760' igp is amd's misleading naming. [...]
I'm guessing the 7760G will compare to the 7750M, which in turn is a rebadge of the 6750M, so it should reflect that in performance.
 
How about this, then: what is this "7760 IGP" going to perform like? How much better will it be than X or Y or Z?

Well, if AMD is naming based on performance, then I would say it should perform like a 7750 discrete video card in a laptop (not a desktop). I'm more than willing to say the desktop version of Trinity will probably perform like a 6670 does today.
 
i was comparing desktop igpus... i don't have enough info on the discrete amd mobile gpus. reason: i haven't seen them much in (newer) laptops. afaik amd's graphics switching technology is inferior to nvidia's.
i also haven't seen any llano laptops with discrete mobile gpus. i thought most llano laptops carried the apu only - that's what i saw whenever i searched for one.
i'll have to educate myself more on mobile gpus.... :)
 
@triny: in the '113% improvement' link you posted, the research was funded (jointly) by amd. amd will use this for marketing hype. [...]

Had you taken the time to understand the idea, you would know that both companies will benefit, so the marketing hype can be used by both companies. The 7760 is a mix of old and new designs.

 
i was comparing desktop igpus... i don't have enough info on the discrete amd mobile gpus. reason: i haven't seen them much in (newer) laptops. [...]

Considering I have one right behind my desk, I would say you're wrong about them not being common in "newer" laptops. HP, Lenovo, Sony and Samsung all make APU-based laptops, and those are just the manufacturers I found today while searching for one with a 3550MX.

My current favorite is the HP DV6 series (DV6zqe). It's a 15-inch-screen laptop with a ton of options and scales from the sub-$600 USD segment to the $1000 USD segment. Earlier I posted benchmarks for the 6620G, which comes standard with all the A8 35x-series APUs. These are offered either as-is, or coupled with the 6750M / 6770M (7690 / 7690-XT) discrete GPUs.

Here is a repost from someone asking for a $1000 ~ $1400 USD laptop for applications that can also do gaming.

Easy, HP DV6zqe

$1069.99 USD
-$75.00 Promo code
----------
$994.99 cost.

AMD 3530MX 2.0 ~ 2.7 GHZ CPU
8GB DDR3-1333 dual channel
AMD 7690M 1GB (die-shrunk 6750M); this will do CF with the APU's own 6620G for a performance boost
750GB 7200RPM HDD
1920x1080 LED Screen, 15 Inch.
BluRay Drive
9-cell Li battery

http://www.shopping.hp.com/webapp/shopping/load_configuration.do?destination=review&config_id=7011597#

Color: dark umber
Operating system: Genuine Windows 7 Home Premium 64-bit
Processor: AMD Quad-Core A8-3550MX Accelerated Processor (2.7GHz/2.0GHz, 4MB L2 Cache)
Graphics card: 1GB AMD Radeon(TM) HD 7690M GDDR5 Discrete Graphics [HDMI, VGA]
Memory: 8GB DDR3 System Memory (2 DIMM)
Hard drive: 750GB 7200 rpm Hard Drive with HP ProtectSmart Hard Drive Protection
Office software: Microsoft(R) Office Starter: reduced-functionality Word/Excel(R) only, no PowerPoint(R)/Outlook(R)
Security software: No additional security software
Primary battery: 9 Cell Lithium Ion Battery
Display: 15.6" Full HD HP Anti-glare LED (1920 x 1080)
Primary optical drive: Blu-ray player & SuperMulti DVD burner
Personalization: HP TrueVision HD Webcam with Integrated Digital Microphone and HP SimplePass Fingerprint Reader
Networking: 802.11b/g/n WLAN
Keyboard: Standard Keyboard with numeric keypad

Changes from the default were going to the 3550MX CPU, 8GB memory, the discrete GPU add-on, the faster HDD, the 9-cell battery and the 1920x1080 screen. Removing some of these will net you a cheaper system; the screen upgrade alone is $150. They also offer a 160GB SSD for +$130 USD. I recommend the battery option, but if you feel it's too much you can go with the $30-cheaper default 6-cell battery. The model scales from $600 to $1000 depending on options and accessories.

There you go, sub $1000 15 inch notebook with 1920x1080 screen, 7690M dGPU, 3550mx (2.0 ~ 2.7 GHZ) and 8 GB DDR3-1333 dual channel memory.

It also scales down: you can drop the screen resolution to 1366x768 and ditch the dGPU for a reduction of ~$220 USD, and you can scale down the CPU and HDD to get even lower.

They're extremely popular lately and have been selling quite well. OEMs love the APUs.
 
@triny: i do understand the idea. but the reality is different (amd's decision to not use l3 cache with llano and trinity).
if you have some info on 7760 specifications and/or performance, please share. thanks in advance.
edit:
@palladin9479: thanks for the info. i didn't know mobile llanos could cfx with 7000 series gpus. i'll look into that more later.
 
Well, AMD still sells the Radeon mobile GPUs for Intel chipsets and also offers them as dual graphics for the APU laptops. The 7000 line of AMD mobile GPUs are mostly 6000 GPUs with new names, and some of them have slight clock differences.

The 6620G was positioned where it is because its performance was just below that of the 6650M, which was still used in AMD laptops before Llano launched. The 7760 sounds like they pushed its performance above the 6770M, which was about 30% faster than the 6620G. Only time will tell.
 
i do understand the idea. but the reality is different (amd's decision to not use l3 cache with llano and trinity). [...]


Earlier I explained about L3; me and Mu had a back-and-forth discussion on it.

They removed it because, of all the components on a die, the L3 takes up a disproportionate amount of space for the little performance it gives (in desktop / mobile applications). If you're going to hack something off to put a GPU on, the most logical thing to remove is the L3. Their only other options were to remove cores (worse impact than removing L3) or make the die bigger (more expensive). The first option is self-defeating and the second is counterproductive to making a low-price integrated chip.
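For a rough sense of the die-budget math behind that choice, here's a sketch in C; every number in it is an illustrative assumption, not a measured AMD figure (the 30~50% L3 share comes up again later in the thread):

/* Die-budget sketch: what does cutting L3 vs cutting cores free up?
   All areas and ratios below are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double die_mm2  = 228.0;   /* assumed quad-core die size            */
    double l3_share = 0.40;    /* assume L3 takes ~30-50% of the die    */
    double core_mm2 = 20.0;    /* assumed area of one core plus its L2  */
    double igp_mm2  = 80.0;    /* assumed area needed by the on-die IGP */

    printf("Cut the L3:  frees ~%.0f mm^2\n", die_mm2 * l3_share);
    printf("Cut 2 cores: frees ~%.0f mm^2\n", 2 * core_mm2);
    printf("IGP needs:         ~%.0f mm^2\n", igp_mm2);
    /* Under these assumptions, only the L3 cut frees enough room for the
       GPU without shedding cores or growing (and pricing up) the die.   */
    return 0;
}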

In all your engineering experience, what would you have done?
 
i do understand the idea. but the reality is different (amd's decision to not use l3 cache with llano and trinity). [...]


I can't say for sure Trinity won't have L3, since they pushed back its release to June; you have more info than I do.
 
Earlier I explained about L3 [...] In all your engineering experience, what would you have done?

I would not discount L3, unless they plan on making Vishera on a whole new die and making it an APU as well. My reasoning is that, without it, the gains from that "up to 113%" would be minuscule.
 
they could make a dedicated gddr5 memory pool for the apus, integrated into the motherboard; shouldn't be too hard. just something i think could be nice.

Well, it would need to be connected to the CPU directly for the GPU to use it.

A better idea is, in the next big socket revision, to add a back-side bus for 128-bit GDDR5 that can be used for dedicated GPU memory. On a board you could introduce a small socket for a 512MB ~ 2GB GDDR5 memory add-on. If no add-on is detected it uses UMA; if an add-on is detected then it uses the much faster add-on.
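A minimal sketch of that fallback logic in C; the probe function and structure here are hypothetical placeholders, not any real firmware API:

/* Hypothetical boot-time selection between a socketed GDDR5 "sideport"
   pool and a UMA carve-out, as described above. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool     present;
    unsigned size_mb;   /* 512 ~ 2048 in the scheme described above */
} sideport_t;

/* Assumption: the platform could probe the add-on socket at boot. */
static sideport_t probe_sideport(void) {
    return (sideport_t){ .present = true, .size_mb = 1024 };
}

int main(void) {
    sideport_t sp = probe_sideport();
    if (sp.present)
        printf("IGP memory: %u MB dedicated GDDR5 on the back-side bus\n",
               sp.size_mb);
    else
        printf("IGP memory: UMA carve-out from shared system DDR3\n");
    return 0;
}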

As GPU power starts to ramp up, there will come a time when the system memory interface severely handicaps the GPU; it's already acting as a bottleneck right now. Laptops would just have the memory soldered to the board, like they do with discrete GPUs.

Basically they would have to add a bunch more pins to support a separate memory channel for the GDDR5. It wouldn't really be hard to implement; the hardest part would be getting the OEMs to sign off on putting it on their boards and mobile devices. As they move forward with this style of integration, this is something they're going to have to look at doing; main memory is simply too slow for GPUs to use without being bottlenecked.
 
Earlier I explained about L3 [...] In all your engineering experience, what would you have done?
er.. i read your (and mu_engineer's) posts on l3 cache. i read this thread frequently.
i think the ncsu experiment's results were posted in a way that might be deemed misleading - that's what i was talking about.
as for the apu design, i'd go with the most cost-effective way, considering they're entry-level products. if cutting off l3 is more cost effective (it paid off in the mobile sector) then that'd be the way to go.
no, i don't have cpu/apu design experience besides studying some vlsi/16bit instructionsomethingitotallyblockedoutbychoice. my opinion was based on the real-world effects (the result of no l3 cache). well.. they were mostly based on gaming performance and some casual-use benchmarks (some apps i use every day).
I can't say for sure Trinity won't have L3, since they pushed back its release to June; you have more info than I do.
what?!? i was totally under a very different impression...
 
I would not discount L3 [...] My reasoning is that the gains from that "up to 113%" would be minuscule.


Well, you need space ~somewhere~ to put the GPU. If you look at the layouts of modern-day CPUs you see that 30~50% of the die space is used for L3 cache; that is an insane amount of room that rarely gets fully utilized in desktop / mobile applications. L3's job is to act as a fallback for an L2 cache miss; it's really only effective when your predictor is sh!tty or when your application's memory space is so large that the predictor can't fit the hits inside L2. You can see this design in action when comparing the amounts of L2 cache on various uArchs. Llano / Sabine (Stars APUs) have 1MB of L2 cache per core, with the Phenom II X4s having 512KB of L2 and 6MB of shared L3 (1.5MB per core). Intel SBs have 256KB of L2 cache but a very low-latency L3 cache of 6~8MB (3MB on the mobile platform). BDs have 2MB of shared L2 cache per module (1MB per ~AMD core~) and 4~8MB of shared L3 cache.

What you see is that you can reduce the amount of L2 cache per core to reduce latency, but only if you have a good amount of L3. If you take away the L3 then you'd better have decent L2 cache performance or you'll stall out your CPU. The Phenom IIs (Stars K10 core) had pretty good L2 cache performance, in both latency and hit rate. To further buffer the Llano / Sabines from cache issues they increased their cache from the standard 512KB to 1MB; this has a greater impact than introducing 2~3MB of L3.

Now, BD has serious issues with L2 / L3 cache latency and, from what I can tell, branch prediction. If they haven't fixed that prior to Trinity being released, then the PD-based Trinity APU without L3 is going to have serious CPU-side performance issues. If they have fixed it then we'll have a solidly performing product to look forward to.
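One way to see the "fat L2 can stand in for L3" point is average memory access time. A quick C sketch; the latencies and miss rates below are illustrative assumptions, not measured numbers for any of these chips:

/* AMAT = L1 + miss(L1) * (L2 + miss(L2) * (L3 + miss(L3) * DRAM)).
   All latencies (cycles) and miss rates are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double l1 = 4, l2 = 20, l3 = 45, dram = 200;  /* assumed latencies    */
    double l1_miss = 0.10;                        /* assumed L1 miss rate */

    /* Small 512KB L2 (more misses), backed by an L3 that mops up spill */
    double with_l3 = l1 + l1_miss * (l2 + 0.40 * (l3 + 0.20 * dram));
    /* No L3, but a fat 1MB L2 (fewer misses), Llano-style */
    double no_l3   = l1 + l1_miss * (l2 + 0.25 * dram);

    printf("With L3, small L2: %.1f cycles\n", with_l3);  /*  9.4 */
    printf("No L3, fat L2:     %.1f cycles\n", no_l3);    /* 11.0 */
    /* With a good L2 hit rate the no-L3 penalty stays modest, which is
       the bet Llano made; with BD's poor L2 behavior it would not be.  */
    return 0;
}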
 
Well, you need space ~somewhere~ to put the GPU. [...]

Wow, I was not aware it used so much space, though I knew that L3 was kind of redundant.
Thanks for taking the time to explain it.

 
That's been AMD's plan all along with separating the FPU. Doing it at the hardware level takes more time and baby steps to ensure it's maturing properly.

That's when AMD plans to release a pair of APUs, code-named Kaveri and Kabini, that will sport a CPU-GPU combo sharing a unified memory cache. Those chips will also feature a unified address space for the CPU and GPU components, the latter of which will use pageable system memory via CPU pointers.

In 2014, AMD intends to take HSA from the architectural integration stage to the system integration stage. In simple terms, that means computers that will know how to throttle up the CPU portion of the APU that runs them and dial down the graphics component for scalar processing tasks, while doing the opposite for parallel processing work that's more suited to the GPU.

http://www.pcmag.com/article2/0,2817,2399789,00.asp
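A sketch in C of what "pageable system memory via CPU pointers" would mean in practice; the hsa_launch_kernel name is a hypothetical placeholder standing in for GPU dispatch, not the real runtime API:

/* Unified address space sketch: the GPU works on the same pageable
   allocation the CPU uses, with no staging copy into a GPU-only pool. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU kernel launch: on real HSA hardware
   the GPU would walk the same page tables and touch `data` directly. */
static void hsa_launch_kernel(const char *name, float *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] *= 2.0f;
    printf("kernel '%s' ran over %zu elements in place\n", name, n);
}

int main(void) {
    size_t n = 1u << 20;
    float *buf = malloc(n * sizeof *buf);   /* ordinary pageable memory */
    if (!buf) return 1;
    for (size_t i = 0; i < n; i++)
        buf[i] = (float)i;

    /* Pre-HSA flow: copy buf into dedicated GPU memory first.
       HSA flow: hand the GPU the very same CPU pointer.        */
    hsa_launch_kernel("scale_by_two", buf, n);

    free(buf);
    return 0;
}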



The wife wants a laptop to play her RPGs, and it's easy to see that AMD will provide the best visual experience.
She was impressed with the Llano A6, but I've convinced her to wait for Trinity.
Looks like 2014-2015 will be very big years for AMD.
 