Sources:
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2748
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939
Here are some of the differences between Intel's Core architecture and AMD's next-generation architecture (NGA), known as K8L or K10, with AMD's K8 architecture included for reference.
1. Processor manufacturing technology
Core Arch: 65nm, 45nm in 2007 H2, 8 metal layers
K8: 130nm SOI, 90nm SOI, 65nm SOI, 9 metal layers
NGA: 65nm SOI, 45nm in mid-2008, 11 metal layers
2. Cache system
Core Arch:
L1 cache: 32KB, 8-way, latency: 3 cycles
L2 cache: 2-4MB shared, 16-way, 256-bit (64GB/s at 2GHz), latency: 11-20 cycles
L3 cache: absent
K8:
L1 cache: 64KB data+64KB instruction, 2-way, latency: 3 cycles
L2 cache: 512KB, 16-way, 128-bit (32GB/s at 2GHz), latency: 11-20 cycles (90nm version)
L3 cache: absent
NGA:
L1 cache: 64KB data+64KB instruction, 2-way, latency: 3 cycles
L2 cache: 512KB, 16-way, 128-bit (32GB/s at 2GHz), latency: unknown
L3 cache: 2MB shared, 32-way, unknown width and latency
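The bandwidth figures quoted for the L2 caches follow from the bus width and the clock. A minimal sketch of that arithmetic, assuming one full-width transfer per clock cycle (the function name is hypothetical, chosen for illustration):

```python
def peak_bandwidth_gb_s(bus_width_bits: int, clock_ghz: float) -> float:
    """Peak bandwidth in GB/s, assuming one full-width transfer per cycle."""
    bytes_per_cycle = bus_width_bits / 8  # convert bus width from bits to bytes
    return bytes_per_cycle * clock_ghz    # bytes/cycle * 10^9 cycles/s = GB/s


# Bus widths and the 2GHz clock are the values listed in the cache section.
core_l2 = peak_bandwidth_gb_s(256, 2.0)  # Core Arch L2, 256-bit
k8_l2 = peak_bandwidth_gb_s(128, 2.0)    # K8 / NGA L2, 128-bit

print(f"Core Arch L2: {core_l2:.0f} GB/s")  # prints "Core Arch L2: 64 GB/s"
print(f"K8 / NGA L2:  {k8_l2:.0f} GB/s")    # prints "K8 / NGA L2:  32 GB/s"
```

These are theoretical peaks; sustained bandwidth also depends on the latencies listed above.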
3. x86 decoding ability
Core Arch:
x86 decoders: 3 simple + 1 complex (the complex decoder can decode 2 simple instructions in one pass)
Out-of-order execution buffer: 96 instructions
K8:
x86 decoders: 3 complex
Out-of-order execution buffer: 72 instructions
NGA:
x86 decoders: 3 complex
Out-of-order execution buffer: 72 instructions with improvements
4. ALU, FPU and SSE units
Core Arch:
ALU units: 3
Maximum double-precision (64-bit) FP operations per cycle: 4
SSE units: 3 units, 128-bit
K8:
ALU units: 3
Maximum double-precision FP operations per cycle: 3
SSE units: 2 units, 64-bit
NGA:
ALU units: 3
Maximum double-precision FP operations per cycle: 3
SSE units: 2 units, 128-bit
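The per-cycle FP figures above translate into a theoretical per-core peak once a clock speed is chosen. A rough sketch, assuming a 2GHz clock purely for illustration (the same clock used for the cache bandwidth figures, not a shipping part) and a hypothetical helper name:

```python
def peak_dp_gflops(fp_ops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak 64-bit FP throughput per core, in GFLOPS."""
    return fp_ops_per_cycle * clock_ghz


# Per-cycle maxima as listed in section 4; 2.0 GHz is an assumed clock.
ASSUMED_CLOCK_GHZ = 2.0
per_cycle = {"Core Arch": 4, "K8": 3, "NGA": 3}

for name, ops in per_cycle.items():
    print(f"{name}: {peak_dp_gflops(ops, ASSUMED_CLOCK_GHZ):.0f} GFLOPS peak per core")
```

Real workloads rarely reach these peaks, since decode width, cache latency and memory bandwidth all limit how often the FP units can be kept busy.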
5. Pre-fetch and other tune-ups
Core Arch:
Out-of-order loads: Present
Stack manager: Present
Pre-fetchers: 2 data, 1 instruction (to core), 2 pre-fetchers (to L2 cache)
Instruction fetch width: 24 bytes per cycle
K8:
Out-of-order loads: Absent
Stack manager: Absent
Pre-fetchers: 1 data, 1 instruction (to L2 cache)
Instruction fetch width: 16 bytes per cycle
NGA:
Out-of-order loads: Present
Stack manager: Present
Pre-fetchers: 1 data, 1 instruction (to L1 cache), 1 DRAM pre-fetcher (to dedicated buffer)
Instruction fetch width: 32 bytes per cycle
6. Memory controller
Core Arch: absent (the memory controller sits in the chipset's northbridge)
K8: 1x128-bit memory controller (1 operation per cycle)
NGA: 2x64-bit memory controller with NGMA (max 2 operations per cycle)
7. Power management
Core Arch: EIST (min. x6 multiplier), switches off transistors when not in use
K8: Cool'n'Quiet (min. x5 multiplier)
NGA: improved C'n'Q, two separate power planes for crossbar and cores, separate clocks for each core