AMD And Nvidia Future Chip Secrets

hella-d

Distinguished
Jan 14, 2006
1,019
0
19,310
PLEASE NOTE: All Of These Below Are Rumors, Very Possible But Not Proven Yet.

AMD (In Late 2007): Semprons Will Become Dual-Cores, The Athlon 64 Single-Cores Will Be Phased Out And The X2s And FXs Are Rumored To Be Renamed Accordingly..... Athlon 64 X2 To Athlon X2, Athlon 64 FX To Athlon FX, Quad-Core Athlon To Athlon QFX. The QFXs Are Rumored To Be Manufactured For Socket F As Well As Socket AM2 (That Would Be Interesting Because Then Quad-FX Owners Could Have A Virtual 8-Core "Octo-FX")

Architectural Changes: Rumored To Be "Souped Up" K8 Cores. The Changes.... 8 Layer 65nm w/SOI Construction, Doubled L1s (256KB Data + 256KB Instruction Per Core), Doubled L2s (512KB Per Core For Semprons, 1MB Per Core For Low-End "Athlon X2s", 2MB Per Core For High-End "Athlon X2s" & "Athlon FXs" And 1MB Per Core For "Athlon QFXs"). All Chips Are Rumored To Have Shared L3s Of 1MB For Semprons, 2MB For Low-End "Athlon X2s" And 4MB For High-End "Athlon X2s", "Athlon FXs" And "Athlon QFXs". There Are Also Several Rumored Core Changes To The Chips....
Doubled L1, L2 And L3 Associativity, 16 Stages Per Pipeline VS. 14 Stages For The Current K8s, Single Pass, SSE, SSE2, SSE3, SSE4 And 3DNow! Professional+, 128-Bit Code Cache Pre-Fetching, 1024-Bit Cache Pathways, Virtualization Technology, HyperTransport 3.0 (2x 32-Bit Serial HyperTransport Pathways @ 1133MHz) And Possible DDR3 Support (The Models With DDR3 And HT3.0 Would Require Yet Another New Socket, However With A BIOS Update They May Take The HT1.0 On Socket AM2 Versions To 1133MHz). These Chips Are Rumored To Be In These Speed Ranges...

Sempron: 1.8GHz - 2.8GHz, Low-End "Athlon X2": 2.0 - 2.6GHz, High-End "Athlon X2": 2.6 - 3.0GHz, "Athlon FX": 3.0 - 3.4GHz, And "Athlon QFX": 2.4 - 3.0GHz

Nvidia: It Is Rumored That The 8800GTX And GTSs Will Switch To GDDR4 Graphics Memory And There Will Be A Re-Order Of The Naming Scheme. G82 And G84... G82 Will Most Likely Be A 64 Shader-Unit 8800 Series GPU With A Rumored 450MHz Core And 500MHz (1GHz Effective) 384MB GDDR3, The G84... 32 Shader-Unit 8800 Series GPU With A Rumored 450MHz Core And 400MHz (800MHz Effective) 192MB GDDR3, Rumored Names....

G84: 8700GTX, 8700GTS. G82: 7600GTX And 7600GTS (There Are Rumored To Be GTX And GTS Versions Of Both The G84 And G82, Named According To Core And Memory Clocks)

Until Then I'll Just Stick With What I Have (Not Including My New Fully-Custom Mod Case That Isn't Completed Yet And Is Coming Soon, And My Copy Of Windows Vista). My Current Specs Are In My Sig Below.
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
I'll tell you now, G82 will not have 500 MHz GDDR3 (effective 1 GHz), because that speed is so low it is actually under the specification of GDDR3 chips.

(It is like underclocking so far that the GDDR3 IC will simply not work.)

It will most likely have 1.2 GHz GDDR4 (2.4 GHz effective) on either a 256 bit, 320 bit, or 384 bit wide bus.

For 76.8, 96 and/or 115.2 GB/sec of peak throughput from VRAM.
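The throughput figures above follow from simple arithmetic: effective transfer rate times bus width in bytes. A minimal sketch, assuming the 2.4 GHz effective rate and the three bus widths mentioned (the function name is just for illustration):

```python
def peak_bandwidth_gb_s(effective_clock_ghz: float, bus_width_bits: int) -> float:
    """Peak VRAM throughput in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_clock_ghz * bytes_per_transfer


# 1.2 GHz GDDR4 is double data rate, so 2.4 GHz effective.
for width in (256, 320, 384):
    print(f"{width}-bit bus: {peak_bandwidth_gb_s(2.4, width):.1f} GB/s")
```

This is a theoretical peak only; sustained throughput on a real card will be lower.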

PS: Never trust "The INQ"; most of their articles are total tripe, mate. Some of these rumours would actually make these products slower than stuff I was buying over a year ago. - :oops:
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
PLEASE NOTE:

AMD
Architectural Changes: Rumored To Be "Souped Up" Barton Cores The Changes....
Barton???
Barton core is K7... :lol:

8 Layer 65nm w/SOI Construction, Doubled L1s (256KB DATA+ 256KB Instruction Per Core)
Did you just make this up?
The L1 cache will be halved, not doubled (32KB Data + 32KB Instruction)
However, its bus width will be doubled, so both will be able to provide 256bit per clock (2x128 Loads for the data cache, 32x8 instruction cache)

All Chips Are Rumored To Have Shared L3s Of 1MB For Semprons, 2MB For Low-End "Athlon X2s" And 4MB For High-End "Athlon X2s", "Athlon FXs" And "Athlon QFXs",
L3 for Semprons???
I have serious doubts that there will be any L3 even for X2s, maybe only the FX.
The only chips which should have L3 are the quad cores, and at most up to 2MB.


Doubled L1, L2 And L3 Associativity,
How can you double L3 associativity, when there's no L3 currently? :lol:
I didn't hear anything about L2 associativity, but they could double L1 associativity, to make up for the halved size.

16 Stages Per Pipeline VS. 14 Stages For The Current K8s,
No way!!
And I've never heard about changes to the pipeline stages of the new architecture (K8L) either!
My guess is that they could lengthen the pipeline with the next core revision, somewhere in 2008.

1024Bit Cache Pathways,
???
The L2 to L1 bus is supposed to go to 256 bits, and so are the buses which connect the L1s to the core.

Seriously.. don't come here and post this stuff without any link or evidence.
You're even less accurate than the Inquirer.. :lol:
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
And cache set-associativity will be doubled, so L1 cache hit rate on instructions and data will be similar, perhaps better.

:oops: - Doh, he actually mentioned it in the original post.

As for longer pipelines, that hurts performance normally, unless the extra 2 stages actually do something more useful than counter-useful.
 

hella-d

Distinguished
Jan 14, 2006
1,019
0
19,310
Well, Some Is Just Chat-Room Babble, Others Are From Pepole I Know, And Some Is Just Junk Floating Around On The Net I Simply Posted It Because I Found Them Interesting Not That I Think They Are True (They Could Be) But Who Knows As Of Yet
 

lordaardvark2

Distinguished
Nov 15, 2005
975
0
18,980
so there IS some revision taking place within the amd procs, albeit it is quite debated? my question is, how much more performance would you speculate? yeah, it'd be speculation, i know everyone likes to base these assumptions on numbers, which i totally agree with; but could someone imagine up some speculative numbers just this once?
 

TabrisDarkPeace

Distinguished
Jan 11, 2006
1,378
0
19,280
About +20% more performance per clock, depending on the workload.
(eg: SSE stuff will speed up a lot, Standard apps around +20% per clock, games maybe less depends on level of SSE).

Unfortunately DICE / EA either:
- Haven't figured out how to code a game engine yet (and/or)
- Haven't figured out how to compile so SSE is actually used - at all.

So don't expect a huge jump in Battlefield 2 or the one after it, which I will not be purchasing.

FEAR will most likely get a large jump from K8L, even if on a dual-core.

Still, a 2.67 GHz Core 2 Quad will likely beat out an Athlon 64 X2/X4 clocked at 3.40 GHz. But only just.

However if the Athlon 64 X2/X4 costs 10% less than the Core 2 Quad at 2.67 GHz, it'll be more 'cost effective'.

Once Intel get 45nm out, if they keep supporting 1333 MHz FSB on i965 and i975 (later revisions thereof) with a higher core count and 6 - 8 MB of shared L2 cache, they'll have a real winner on their hands.

45nm will enable very cheap 3.33 GHz to 3.67 GHz Core 2 Quads, which will compete with 3.40 - 3.467 GHz Athlon 64 X2/X4 (K8L derived) systems.

Point is, a 3.33 GHz Core 2 Quad, and an Athlon 64 X4 at 3.467 GHz will provide almost identical performance. - No Joke.
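To put a number on that claim: under the crude model that performance is per-clock throughput times clock speed (an assumption, since real workloads don't scale that cleanly), equal performance at those two clocks implies a per-clock advantage of roughly 4% for the Intel part. A small sketch with a made-up helper name:

```python
def implied_per_clock_ratio(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """If chip A and chip B perform equally, return IPC_A / IPC_B.

    Assumes performance = IPC * clock, so IPC_A * clock_A = IPC_B * clock_B.
    """
    return clock_b_ghz / clock_a_ghz


# Chip A: 3.33 GHz Core 2 Quad, chip B: 3.467 GHz K8L-derived quad (speculated clocks).
ratio = implied_per_clock_ratio(3.33, 3.467)
print(f"Implied per-clock advantage for the 3.33 GHz part: {(ratio - 1) * 100:.1f}%")
```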

Question is, which one will be cheaper to make, and which will sell for the least? (TCO of systems and upgrades, not just the CPUs themselves.)

If AMD are only offering Athlon 64 X2 at 3.467 GHz instead, manufactured on a 65nm SOI process, then Intel will provide far better performance potential per dollar. (2 extra cores, made on 45nm, and thus cheaper to produce = better for profit margin = leads to better R&D budget = leads to better chips [rinse, wash, repeat]).

AMD need to get 65nm ramped up really quickly, because you just know Intel is going to pounce with 45nm 2 weeks after AMD roll out 3.40 GHz 65nm parts - and say they could've done it 2 months ago but they didn't want to (not cost justified to do until AMD catch up).

AMD either need a 6 issue processor core, with 2 cores (just 2 more advanced cores, vs 4 typical cores, as most software still doesn't truly benefit from 4 cores yet, esp if SSE performance per core can be doubled, or more than doubled), or they need 4 core processors on 65nm SOI economy of scale working in their favour ASAP.

Intel currently have 4 issue cores.
AMD only have 3 issue cores.
Intel used to have 2 - 2.5 issue cores (effectively), but Core 2 Duo changed all that.

If AMD had a 65nm SOI dual-core, with advanced 6-issue cores, they'd only need to clock it at 2.2 GHz to beat Intel in IPC (again). But the cores would be so large having 4 of them may not be possible with good AMD fab yields, However each of the more advanced cores could be 2-way SMT. Ironically this would permit them to best Intel, using a system that Intel used in the past, but that AMD didn't. (Which isn't all that different from Intel using better IPC on AMD, which was AMD's idea to counter Intel originally).

This 'large' 65nm SOI dual-core, 2-way SMT per core, 4 threads per die could perform on par, or better than, the Intel Core 2 Quad at 4.0 GHz while clocked substantially lower.

However, I do not think AMD are leaning in this direction (although they should be).
 

lordaardvark2

Distinguished
Nov 15, 2005
975
0
18,980
wow, man, thanks for such a comprehensive response. that was great! :trophy: :trophy:

so then, as you referred to it in your post, would these revised procs be k8l? or would they be a go-between, a patch for the hole in the levee that is c2d?
 

Pippero

Distinguished
May 26, 2006
594
0
18,980
AMD either need a 6 issue processor core, with 2 cores (just 2 more advanced cores, vs 4 typical cores, as most software still doesn't truly benefit from 4 cores yet, esp if SSE performance per core can be doubled, or more than doubled), or they need 4 core processors on 65nm SOI economy of scale working in their favour ASAP.
Sorry, but it doesn't work that way.
First, people confuse "n-issue" with "n-scalar".
Being able to issue "n" instructions per clock does not mean being able to execute and retire "n" instructions per clock (that would be "n-scalar"); it just means that the CPU can begin the execution of "n" instructions per clock by putting them into the queues of the hw scheduler.
Second and most important, there are good reasons why a 6-scalar CPU does not exist, nor is planned by AMD or Intel.
The problem is the amount of parallelism available in the code.
What this means is that, in order to take advantage of a 6-scalar CPU, you must always find 6 instructions which are independent from each other, plus there are no dependencies from branches (control dependencies) and no stalls due to memory access (cache miss).
This is generally not possible with current code, except in a few specific cases, like when manipulating vector data, but this is processed much more efficiently with SIMD instructions (Single Instruction, Multiple Data) such as SSE1-2-3-4.
That's why the industry is moving away from ILP (Instruction Level Parallelism) towards Thread Level Parallelism.
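To illustrate the parallelism limit described above, here is a toy scheduler, not a model of any real CPU. It assumes every instruction takes one cycle, ignores branches and cache misses, and lets at most `width` ready instructions start per cycle. The function name and the dependency lists are invented for the example.

```python
def cycles_needed(deps: list[list[int]], width: int) -> int:
    """Cycles to run all instructions on an idealized `width`-wide machine.

    deps[i] lists the earlier instructions that instruction i depends on.
    An instruction may start only after all of its dependencies have
    executed in an earlier cycle (unit latency assumed).
    """
    executed_in: dict[int, int] = {}  # instruction index -> cycle it ran in
    cycle = 0
    while len(executed_in) < len(deps):
        cycle += 1
        ready = [
            i for i in range(len(deps))
            if i not in executed_in
            and all(executed_in.get(d, cycle) < cycle for d in deps[i])
        ]
        for i in ready[:width]:
            executed_in[i] = cycle
    return cycle


# Twelve instructions forming four independent chains of three (i depends on i - 4).
four_chains = [[] if i < 4 else [i - 4] for i in range(12)]
# Twelve instructions in one long chain (i depends on i - 1).
one_chain = [[] if i == 0 else [i - 1] for i in range(12)]

for width in (3, 4, 6):
    print(width, cycles_needed(four_chains, width), cycles_needed(one_chain, width))
```

With only four independent chains available, going from 4-wide to 6-wide gains nothing in this toy model, and the single chain takes twelve cycles at any width.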


Intel currently have 4 issue cores.
AMD only have 3 issue cores.
Intel used to have 2 - 2.5 issue cores (effectively), but Core 2 Duo changed all that.
Hmm but Netburst was also theoretically 4-issue.
If you look at the functional units of C2D and K8, you'll notice that in terms of execution resources, C2D is not really wider than K8, i.e. they can roughly execute the same number of instructions per clock, even though C2D can issue more.
With the big exception in favor of C2D of the SSE engine, which has a doubled throughput compared to K8; another obvious advantage is the Load/Store unit, which in case of K8 is mostly in-order (i.e. a stalling Load blocks the whole pipeline) while C2D can perform out-of-order loads and stores.
And of course, C2D has a much higher cache bandwidth.
Now K8L will also double the throughput of SSE and the cache bandwidth, plus it will introduce OOO loads.
So basically the two architectures seem to be on par in terms of execution resources; sure C2D can issue more, but since it can't process the issued data faster, then it just means that more data will sit in the scheduler's queues.
Then a higher issue rate is useless?
No, this comes in handy in 2 circumstances:
* cache miss: while waiting for data from memory, the pipeline has to stall for tens or hundreds of clocks, and the scheduler's queues empty as old instructions complete. As soon as the data is there, a higher issue rate means that the pipeline queues will refill more quickly, hence the CPU will start to crunch data sooner
* branch misprediction: so we gambled on the branch outcome and we turned out to be wrong.. ouch! What we have to do is cancel the wrongly issued instructions and start to fetch the correct ones.. even here, a faster issue rate means saving some cycles
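A back-of-the-envelope sketch of the refill effect in both cases above. The 32-entry queue and 200-cycle miss penalty are hypothetical round numbers picked for illustration, not figures for C2D or K8L:

```python
def refill_cycles(queue_entries: int, issue_width: int) -> int:
    """Cycles to refill an empty scheduler queue at `issue_width` per clock (rounded up)."""
    return -(-queue_entries // issue_width)  # ceiling division


QUEUE_ENTRIES = 32   # hypothetical queue depth
MISS_PENALTY = 200   # hypothetical stall for a miss to main memory, in cycles

for width in (3, 4):
    refill = refill_cycles(QUEUE_ENTRIES, width)
    print(f"{width}-issue: {refill} cycles to refill, {MISS_PENALTY + refill} total")
```

Under these assumptions the wider issue rate saves only a few cycles per event, which is small next to the stall itself.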
Now, branch misprediction is much less of an issue for moderately sized pipelines like C2D and K8L (with the 30-stage Netburst it would be a bigger problem), except in branch-intensive applications. The other part of the equation, though, is the branch prediction success rate, and here it is hard to speculate which is better between C2D and K8L, even though it has been announced that K8L will have better prediction than K8.
In case of a cache miss, C2D has the advantage of the issue rate, but K8L will compensate with the low latency Integrated Memory Controller. So again it's a tough call.
There is more to the performance equation than this, but i'm still collecting information and waiting for more details about K8L to emerge, then i'll start a thread on the topic trying to make a better speculation about K8L's performance.
Right now, my guess is that K8L and C2D will be very similar in performance clock for clock.
However, AMD will not regain the desktop performance crown in 2007, because of the clock speed handicap.
We will not see a K8L clocked at 3.46GHz as you have speculated (while Intel will, at 45nm), K8L is expected to debut at 2.9GHz max, and AMD's 65nm initially will not clock higher than their current 90nm, as it is indicated by their current roadmaps.
I expect K8L to be extremely competitive in its quad core form as an Opteron CPU, thanks to the higher bandwidth and the "native" quadcore design, which benefits from the L3 cache and dedicated buses for intercore communication.
AMD could try to fight for the performance crown in 2008, if they succeed with their 45nm process and the core will be (further) updated, hopefully with changes to the pipeline to increase the clock frequency.