Collection of AMD K10 data

Barcelona pipeline:
k8lexecutionpipelineye3.jpg

This image is totally insignificant in terms of showing changes in the "Barcelona pipeline." The exact same graph was shown in AMD's 2003 presentation on Opteron.

I really appreciate your input about K8L (K10) in this thread. Thank you for your corrections and suggestions. It would be nice if you could find or draw a graph of the "Barcelona pipeline" or any useful data and share it with us. I'll update this thread with info and data. Any ideas or suggestions would be welcome.

Thanks.
 
actually, just because it supports 8GB of memory has nothing to do with the capacity of the modules it is compatible with. well, not literally nothing, but quite little
 
someone needs to explain the map to me. I got as far as understanding the path is RAM to decoder to ALU to RAM. From there I am confused as to what data goes where.
you're right, it's the basic pipeline: instruction fetch (RAM), decode (decoder), execute (ALU), commit (RAM). everything else is just detail. :wink:
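To make that fetch → decode → execute → commit flow concrete, here is a minimal toy sketch (purely illustrative, not AMD's actual pipeline, which has far more stages) of one instruction moving through the four stages:

```python
# Toy 4-stage pipeline: fetch (RAM) -> decode -> execute (ALU) -> commit (RAM).
# Hypothetical illustration only; real K8/K10 pipelines are much deeper.

ram = {"r0": 5, "r1": 7, "r2": 0}          # data lives in memory
program = ["ADD r2, r0, r1"]               # one toy instruction

def fetch(pc):
    return program[pc]                     # stage 1: read instruction from RAM

def decode(insn):
    op, dst, src1, src2 = insn.replace(",", "").split()
    return op, dst, src1, src2             # stage 2: split into fields

def execute(op, a, b):
    return a + b if op == "ADD" else None  # stage 3: the ALU does the math

def commit(dst, value):
    ram[dst] = value                       # stage 4: result written back to RAM

op, dst, s1, s2 = decode(fetch(0))
commit(dst, execute(op, ram[s1], ram[s2]))
print(ram["r2"])                           # -> 12
```

The data path is exactly the loop described above: operands come out of RAM, pass through the ALU, and the result lands back in RAM.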
 
From the specifications, I am quite sure K8 (at least rev. F) and later CPUs can handle 8GB per socket. :wink:

This is my happy dance:

happydance.gif


That's great! So I can go off and get my 2xQuad and 4x2GB sticks and live happily ever after! 😀

yeah AMD is the only way you can go with that... 965/975 boards don't even support 2GB modules

explain how i am wrong again? i said it doesn't support 2GB MODULES... i didn't say it couldn't support 8GB

you are wrong here, the P965 supports up to 8GB of memory
that's 4x2GB

http://www.intel.com/products/chipsets/p965/index.htm


so if there are 4 slots for modules and the board supports 8GB,
what is 8GB divided by 4 slots? it's 2GB, so yes, you are wrong

let's pretend there are only 2 slots

what's 8GB divided by 2 slots?

stfu your posts are annoying. know what you are talking about before you talk about it

Rammestein wrote: "actually, just because it supports 8GB of memory has nothing to do with the capacity of the modules it is compatible with. well, not literally nothing, but quite little"

now tell him he is annoying too... and post me a link that shows an Intel 965/975 running with 2GB modules.
 
Actually, the diagram for K10 wouldn't be much different from K8. The FP execution units are wider which wouldn't show. And, the new stack hardware wouldn't show nor would the doubled cache buses or doubled prefetch.

The speed that AMD gives for SSE is 3.6X. Since there are four cores instead of two you could divide by 2 and get 1.8 or 80% faster. However, I suppose it would be more accurate to say that widening the SSE units gives a 100% increase in speed while the second set of cores gives the 80%. So, the per core change in SSE should be 100% and this should be the same as C2D. However, because of the additional operations, you can get C2D to run faster if the data requires the right kind of operations and the SSE instructions are tuned correctly. In other words, in general operation K10 and C2D should be about the same at the same clock but specialized SSE operations could be faster on C2D. It is possible that some operations could be a little faster on K10 because of less cache latency but this would only be a small difference.
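The arithmetic above can be laid out explicitly. These numbers follow the post's own reasoning (AMD's 3.6x figure for quad-core K10 vs. the previous dual-core part; the 2.0x per-core factor from widening the SSE units is the post's assumption, not an AMD-published number):

```python
# Decompose AMD's claimed 3.6x SSE speedup into a core-count part
# and a per-core part, per the reasoning in the post above.
total_speedup = 3.6
core_ratio = 4 / 2                           # four cores instead of two

per_pair = total_speedup / core_ratio        # 1.8x, i.e. "80% faster"
print(per_pair)                              # -> 1.8

# Alternative split: widening the SSE units from 64-bit to 128-bit
# gives ~2.0x per core, leaving the rest to the doubled core count.
sse_widening = 2.0
core_scaling = total_speedup / sse_widening  # 1.8x from going 2 -> 4 cores
print(core_scaling)                          # -> 1.8
```

Either way you slice it, the per-core SSE gain comes out at roughly 100%, matching the post's conclusion that K10 and C2D should be about even per clock on generic SSE code.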

I was initially thinking about a 10% improvement for integer on K10, which would still have been slower than C2D but would have closed most of the gap. This would be due to additional FastPath instructions, elimination of instruction breaking which would require a second fetch, the sideband stack optimizer, and elimination of fetch delay. However, we also have improvements in branch prediction and out-of-order loads, so I'm now thinking it will be possible to match C2D's integer IPC. I would guess that the great majority of integer tests would fall within ±5% between C2D and K10. At 3.0GHz a 3% difference is only about 100MHz. Differences smaller than this are beyond the accuracy of the tests. For example, you can get a 2% difference just by changing motherboards.

I have to say though that a bit of editing in the quotes would be nice. It is a bit annoying when someone quotes ten or twenty lines of text and then adds a one line comment. This is doubly annoying when the poster also has a large sig. The jokes about newbie posters who quote large blocks of text and then say, "I agree!" or newbie posters who try to impress people with a large sig have been around for more than 15 years. These jokes actually predate web forums and go back to the old email lists and proprietary dial up bulletin boards. Show everyone that you are actually smart enough to post something worth reading and edit down to the last comment instead of including seven layers of nested quotes. Chain letters are banned for a good reason; the same thing applies to quotes.
 
Well, if AMD said K10 will perform ~40% better than Core 2 quads, it HAS GOT to be somewhere around that, because if not, there'll be a lot of people willing to set their farm on fire :lol:
 
Here are the latest die shots from Hans de Vries at Chip Architect. Looks like they're scaling the cache compared to Brisbane too.

http://www.chip-architect.com/news/2007_02_19_Various_Images.html
Is that table correct? If so, AMD scaled the cache on Barcelona better than Intel did with Merom. I thought AMD was having issues with scaling in the transition from 90nm to 65nm.
 
we dropped this issue so as not to taint this thread... there are no Intel systems running 2GB modules for 8GB, ok...

let's keep this thread about the K10
 
what about simple math do you people not get? the system supports up to 8GB of memory
there are 4 memory slots
8/4 = 2
it supports 2GB per slot
i don't know how i can make this any more clear
Quit hijacking this thread. :?

You would think that someone with well over 1000 posts would know better. :lol: But, I could be wrong.
 
i wasn't hijacking, just correcting inaccurate data someone posted earlier.
if you guys want to talk about this stuff, at least be accurate.

May I be the one to shoot this dead horse? YES YOU CAN RUN 2GB MEMORY MODULES WITH P965 CHIPSETS. (I'm with beerandcandy on this one)

Now back to Barcelona: I recall a concern Jack once brought up about the separate power planes of the Barcelona chip. It had something to do with the frequency regulation circuits consuming a significant amount of power, and the idea of having multiples of these circuits to be able to slow cores independently may be counter-productive. Is this the case?
 
Anyone hear if K10 solves the problem of having to run 2T when 4 DIMMs are used?
Both the independence of the two memory controllers and a more flexible configuration of them might help here. We'll wait and see.
 
does anyone know if they will release a dual-core processor with the same improvements as Barcelona? aka better power efficiency, better stepping, less heat dissipation, etc.?
 
Yes, there will be a desktop dual-core CPU (codename Kuma) with all the K10 improvements and features. It will be clocked higher than the quad-core and will have more resources per core: more shared L2 per core, the ODMC shared between 2 cores, etc.
 
That is a GREAT article..

Finally got a chance to give it a read.

Looks like the rumor of a power plane per core is false, likely started by the fact that the Northbridge and the cores (all 4) will have separate planes.

This is shaping up to be a great volley back at Intel...

If the implementation and price are right, I think we are looking at a new performance champion. Just a little worried about pricing given the 11 layers and the inclusion of the L3 cache. It's sounding again like binning will be interesting.
 
Looks like the rumor of a power plane per core is false, likely started by the fact that the Northbridge and the cores (all 4) will have separate planes.

Can't remember if anything actually said power plane per core (maybe it did somewhere) but at least each core not used will have less power draw. That will help some anyways.

Does anyone have any "not so leaked" info??? I'm sure someone does haha.
 
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939

Aside from the relatively large size (albeit a quad core), Anand is quoting 11 metallization layers...

This is a record, making it much more difficult to yield than the 9-layer K8.
I think the most negative effect would be the additional processing steps required. But the designers wouldn't have chosen the additional layers if there weren't any benefit.

But if you say "much more difficult to yield" you imply that they added more of the finest metallization layers, which are indeed more prone to defects than the upper layers, whose structures grow in size starting from about M5.

Already the smallest metal-layer structures (vias) are a multiple of the size of, e.g., the transistor gate. And the bigger the structures, the smaller the effect of variations in processing them. A somewhat too thin oxide layer in the transistors can render more chips unusable than a via that is thinner by the same number of atomic layers.
 
The speed that AMD gives for SSE is 3.6X. Since there are four cores instead of two you could divide by 2 and get 1.8 or 80% faster
Am I wrong, or when you go from 4P to 2P won't this number be more like 50%? Meaning that the software utilization, crosstalk, etc. that don't exist in a dual core mean the actual increases will be over 40%? Or simply put, instead of 80% is it going to be 100%?
 
Typically, designers will increase the number of metallization layers to mitigate the RC delay that is slowly becoming the limiter. This, frankly, makes some sense, as the 65nm process showed slightly larger L2 latency... hinting that the 65nm process design did not eliminate all the speed paths, and RC delay usually goes up with a shrink.
But there are other possible reasons for additional layers. Do you remember the Stanford presentations held by Kevin McGrath and Jerry Moench? I think it was the latter who mentioned metal layers dedicated to automated routing of the wires, which sounded like a novelty for AMD's CPU design. Intel started to use automated routing heavily with Prescott.

The problem is not adding metallization layers; that is routine as the nodes advance, as stated above. What is problematic is that they added a whole 2 layers within the same node for a new architecture, which is not uncommon but not common either. This means it requires at least 2 more masking steps (more expense, more masks, more litho tools), but it also means the die are exposed to 2 more cycles of dielectric and metallization defects, increasing the probability of a killer defect in the backend by roughly 2 out of 9, or 22%, over the current probability (which is unknown to the world, as neither AMD nor Intel publish such numbers).
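A rough way to put numbers on that, assuming (purely for illustration; neither AMD nor Intel publish per-layer defect data) that every metallization layer carries the same independent per-layer yield:

```python
# Toy backend-yield model: each metallization layer is one more chance
# for a killer defect. Per-layer yield of 0.99 is an arbitrary
# illustrative value, not a real fab number.
def backend_yield(per_layer_yield, layers):
    return per_layer_yield ** layers

# Defect *exposure* scales with layer count: 11 layers vs. 9 layers.
extra_exposure = (11 - 9) / 9
print(f"{extra_exposure:.0%}")        # -> 22% more backend defect exposure

# Effect on yield for a hypothetical 99% per-layer yield:
y9 = backend_yield(0.99, 9)
y11 = backend_yield(0.99, 11)
print(f"{y9:.3f} -> {y11:.3f}")       # yield drops as layers are added
```

The exact numbers are made up, but the shape of the argument holds: yield falls multiplicatively with every added layer, which is why two extra layers within the same node is a notable cost decision.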

Yield goes down as the number of layers increases; this is a given.
Additional processing steps surely cost, but as I wrote earlier, it depends on the feature size how much they cost and how prone to defects these steps are. The most critical dimensions are in the processing of the transistors and the wiring at that level, while the metallization layers become larger and simpler in structure the higher they are. This also allows the use of cheaper mask sets and tools. You can see nice examples of these layers in 2 presentations linked on this page:
http://www.malab.com/