AMD Piledriver rumours ... and expert conjecture

Status
Not open for further replies.
We have had several requests for a sticky on AMD's yet to be released Piledriver architecture ... so here it is.

I want to make a few things clear though.

Post a question relevant to the topic, or information about the topic, or it will be deleted.

Post any negative personal comments about another user ... and they will be deleted.

Post flame baiting comments about the blue, red and green team and they will be deleted.

Enjoy ...
 
I have already deleted any posts ( 10 ) without some connection to Bulldozer's successor ... or suggestions of a shrink of a previous K series CPU.

Swing the topic back to Piledriver please?

How about putting up some roadmaps?

Anyone popped over to see what Charlie D or Anand or Fuad has found?

Let's pool resources and display them here.

Hopefully JF-AMD will return soon - as he had heavy work commitments with the Interlagos release.

 
GPU performance increase touted as around 30% as the new 7 series GPU will be used ... as opposed to the current 6 series (read 5 series) GPU.

CPU performance would surely increase ... how much is unknown.

Just getting a decrease in cache latency and a VCE will probably yield a good return though an increase in L1 Cache size ...


http://www.forum-3dcenter.org/vbulletin/showthread.php?p=8935242#post8935242

Engineering samples are likely out of the oven even now.

I didn't troll for more info on the desktop variants.

I would imagine a better stepping for the desktops might be out shortly.

 
Increased L1 DTLB size from 32 entries to 64 entries

http://support.amd.com/us/Processor_TechDocs/47414.pdf (Bulldozer)

AMD Family 15h processors have multiple compute units, each containing its own L2 cache and two cores. The cores share their compute unit’s L2 cache. Each core incorporates the complete x86 instruction set logic and L1 data cache. Compute units share the processor’s L3 cache and Northbridge (see Chapter 2, Microarchitecture of AMD Family 15h Processors).

16KB per cluster L1 caches

http://www.donanimhaber.com/islemci/haberleri/Dunyada-ilk-defa-AMDnin-ikinci-nesil-Bulldozer-mimarisi-icin-ilk-detaylar.htm
 
Piledriver really needs a triple or quad channel memory controller. Unfortunately that doesn't look like it will happen. As we saw with the Llano APU, DDR3 speed has a significant effect on graphics performance.

Beefing up the GPU 50% and adding Bulldozer cores is really going to squeeze the memory bus. DDR3 2133 will help but it's expensive.
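
To put rough numbers on the memory bus squeeze, here is a quick back-of-the-envelope sketch. It assumes the standard 64-bit (8 byte) DDR3 channel and computes theoretical peak bandwidth only; real sustained bandwidth will be lower. The function name and the list of speeds are just illustrative choices, not anything from AMD.

```python
def peak_bandwidth_gbs(transfers_mt_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    Assumes each channel is 64 bits (8 bytes) wide, as on standard DDR3.
    """
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    # Compare dual channel (what the APUs have) with hypothetical
    # triple and quad channel controllers at a few DDR3 speeds.
    for speed in (1333, 1866, 2133):
        for channels in (2, 3, 4):
            bw = peak_bandwidth_gbs(speed, channels)
            print(f"DDR3-{speed} x{channels} channels: {bw:.1f} GB/s peak")
```

On paper, dual channel DDR3-2133 comes out around 34 GB/s versus about 21 GB/s for DDR3-1333, and that whole budget is shared between the CPU cores and the GPU.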
 
The CPU side would not benefit much but the graphics performance of Llano is fairly sensitive to bandwidth.

I think some are mixing discussion of the CPU vs the APU which is understandable since the next APU will be using PD cores as will 'FX-next'.
 
To my knowledge, Trinity has always been set for PD cores.
Charlie saw one a while back, and it's unconfirmed whether it'll be VLIW4 only, with no GCN added.
It's possible I suppose, but adding several designs, besides just dumb shrinks, stretches their already stretched capacity, unless it works in their favor?
 
Interesting. Since BD IPC was actually lower than the prior core used on Llano, they must be banking on a significant core frequency bump on Trinity, either from Turbo 3 or just better power/thermals.
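
A rough way to see how big that frequency bump would need to be, assuming the usual first-order model that single-thread performance scales with IPC times clock. This ignores memory, turbo behaviour and everything else, and the 0.9 IPC ratio below is a made-up placeholder, not a measured figure for BD or PD. The 2.9 GHz base is just the A8 clock mentioned in this thread.

```python
def clock_needed_ghz(base_clock_ghz: float, ipc_ratio: float,
                     target_speedup: float = 1.0) -> float:
    """Clock the new core needs to reach target_speedup over the old core.

    First-order model: performance ~ IPC * clock.
    ipc_ratio is new-core IPC divided by old-core IPC (hypothetical input).
    """
    if ipc_ratio <= 0:
        raise ValueError("ipc_ratio must be positive")
    return base_clock_ghz * target_speedup / ipc_ratio


if __name__ == "__main__":
    base = 2.9        # GHz, old core clock
    ipc_ratio = 0.9   # hypothetical: new core has 90% of the old core's IPC
    print(f"Break even: {clock_needed_ghz(base, ipc_ratio):.2f} GHz")
    print(f"10% faster: {clock_needed_ghz(base, ipc_ratio, 1.10):.2f} GHz")
```

So under that assumed 10% IPC deficit, you need roughly 11% more clock just to break even before you see any gain at all.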
 
http://vr-zone.com/articles/report-amd-trinity-details-revealed/13807.html

http://news.softpedia.com/news/AMD-Confirms-Trinity-APU-Will-Launch-in-Early-2012-230779.shtml

It seems Piledriver is not getting much attention (another BD?). However, with 2012 around the corner, Trinity will begin to appear. This is odd because Llano has only been out for a short time.

I have a feeling AMD will try to push out APUs at nearly the same rate that they do GPUs since the two are probably sharing some of the design cycle. I think the ex-AMD engineer claimed that is why they moved to machine generated layouts so they could get the GPU and CPU sides on similar development schedules.
 
That makes sense as AMD's mantra is "The Future is Fusion." The mainstream market is the APU and it's where they are making the most money.
 
Uh-oh...

http://www.pcmag.com/article2/0,2817,2395447,00.asp

I so called it. AMD is having 32nm yield issues across the board, not just Llano. Llano has a bit more mainly due to the GPU, probably not the CPU.

Looks like low supply for BD until GF gets 32nm mature.

As I said before, I would assume a 10% increase if Trinity doesn't follow BD's suit. The 30% GPU I would imagine would be due to higher clocks.

Wish AMD would put more info out there. Kinda annoying to announce something with very little info on it.
 
On the L3 approach to it...

I got an A8 and I can tell you guys it performs very very well for its low speed. Been experimenting a lot with it and it can max out my 4890 (which would be near a 6850) running at 2.9GHz. Hey, the Athlon II's are a sample for them too 😛

There are few apps that actually need L3 cache that bad, so it's not a bad trade off at all for more space to get GPU muscle in.

On the arch itself, I haven't seen any diagram with tweaks so far that make it differ from Zambezi, so the performance should be in the same ballpark if not the actual same.

Cheers!

EDIT: Deleted a word.
 
Wow, interesting posts I guess. From what I've seen, PD\Trinity will pick up where B3 leaves off. PD is set to get FMA3 and some bit manip instructions.

I've been in search of an errata list because it's been said that two threads sharing a module tend to get the wrong prefetch data which definitely causes thrashing and will really test the OoO engine for mispredictions. You could lose 5% efficiency which will translate to much more perf.

Also, the affinity tests are showing more perf than it should by not sharing L2. I do believe that a fix in PD\Trinity will be to tweak the L2 WCC (write coalescing cache) to align better for each INT unit.

That, I believe, is causing the low L2 bandwidth. More to come.
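
For anyone who wants to try the affinity test themselves, here is a minimal sketch of pinning a process to one core per module so that no two threads share an L2. Big assumption: logical CPUs 2k and 2k+1 are the two cores of module k, which depends on how your OS enumerates them, so check your topology first. The helper name is my own, and os.sched_setaffinity is Linux only.

```python
import os


def one_core_per_module(cpus, cores_per_module=2):
    """Return the first core of each module from the given CPU ids.

    ASSUMPTION: cores are numbered so that consecutive ids
    (0,1), (2,3), ... belong to the same module.
    """
    return {c for c in cpus if c % cores_per_module == 0}


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    available = os.sched_getaffinity(0)       # CPUs this process may use
    chosen = one_core_per_module(available)
    if chosen:
        os.sched_setaffinity(0, chosen)       # pin the current process
    print("Running on CPUs:", sorted(os.sched_getaffinity(0)))
```

Run a benchmark from a process pinned this way, then again pinned to both cores of the same modules, and compare. That should show how much the shared front end and L2 are costing.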
 
That article doesn't say there are BD yield issues. Everyone KEEPS saying it's not BD but Llano. If they are wasting wafers there are fewer for FX.

As far as Trinity:

[Image: amdtrinitydetay_2_dh_fx57.jpg]



Also the word is that UVD is getting a new feature to challenge QuickSync. Searching for the reference. It's also possible that Trinity is better because the GPU is already 28nm so it's actually the opposite of a shrink and has the simpler VLIW4 arrangement. Also, they can optimize compilers more for LWP (Lightweight Profiling) and XOP\AVX.

Again, the problems with FX were caused by a perfect storm of SW optimizations, or lack thereof, and Win 7 not understanding modules. Of course PD\Trinity can't fix that, but I'm pretty certain of the issues I believe are occurring in L1 and L2. FX is the first shared L2 arch. PD should tweak the write-through and eviction scheme to increase bandwidth.


It's really difficult to get too in-depth with what they may have done with PD, but I don't think they will get 20% more clock speed at this point. I would think that if they got the first Trinity chips back in June, they have probably finalized max clocks.

BTW, I see everyone started right back in with delete-worthy posts. Perhaps I'll come to a SB post and set a good example.
 