Why is AMD (Athlon) per clock cycle faster than Intel (P4)?

aleric

Distinguished
Jun 21, 2004
12
0
18,510
I would really like to understand how it is
possible that my Pentium-4 1.7 GHz is only 1.12 times
faster than my Athlon 900. Both have more or less
the same amount of memory - both run the same OS
(debian testing - both up to date).

I already posted this in an unrelated thread in the
motherboard section but I guess it really belongs here:

--------------------------------------------------------------------------------
Benchmark:
- Compile time ('time make') of libcwd-0.99.45 after a './configure --enable-maintainer-mode --disable-pch'
compiler: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
(OS: debian 'testing' (Lenny) at Apr 26, 2007).

System 1:
- model name : AMD Athlon(tm) Processor
cpu MHz : 908.119
cache size : 256 KB
bogomips : 1818.08
MemTotal: 906592 kB
Diskspeed:
Timing cached reads: 272 MB in 2.00 seconds = 135.85 MB/sec
Timing buffered disk reads: 148 MB in 3.02 seconds = 49.04 MB/sec

System 2:
- model name : Intel(R) Pentium(R) 4 CPU 1.70GHz
cpu MHz : 1708.705
cache size : 256 KB
bogomips : 3420.70
MemTotal: 1036664 kB
Diskspeed:
Timing cached reads: 628 MB in 2.00 seconds = 313.39 MB/sec
Timing buffered disk reads: 182 MB in 3.00 seconds = 60.61 MB/sec

'vmstat 1' shows that during compilation 100% cpu is being used,
and both, id(le) and wa(it for IO), are constantly 0. Hence, we are
not measuring diskspeed here - but cpu speed.

Results:

The Athlon 900 compiles libcwd in 2 minutes and 5 seconds.
The Pentium-4 1.7 GHz does the same job in 1 minute 53 seconds.

Conclusion: the Pentium is only 1.12 times faster, despite its nearly
double clock frequency.
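
For reference, the arithmetic behind this conclusion can be sketched in a few lines of Python. The times and clock speeds are the ones quoted in the benchmark; the variable names are only illustrative, and the "per-clock" figure assumes both machines did exactly the same work.

```python
# Wall-clock times of 'time make' from the results above, in seconds.
athlon_seconds = 2 * 60 + 5   # 2 minutes 5 seconds
p4_seconds = 1 * 60 + 53      # 1 minute 53 seconds

# Clock speeds as reported in the cpuinfo output above, in MHz.
athlon_mhz = 908.119
p4_mhz = 1708.705

# How much sooner the P4 finishes the same build.
speedup = athlon_seconds / p4_seconds

# How much faster the P4 is clocked.
clock_ratio = p4_mhz / athlon_mhz

# Work done per clock tick by the Athlon relative to the P4
# (assumption: identical work on both machines).
per_clock_advantage = clock_ratio / speedup

print(f"P4 speedup over Athlon: {speedup:.2f}x")
print(f"P4 clock ratio:         {clock_ratio:.2f}x")
print(f"Athlon work per clock:  {per_clock_advantage:.2f}x the P4")
```

With these inputs the speedup comes out at about 1.11 (close to the 1.12 quoted), and the Athlon does roughly 1.7 times as much work per clock tick as the P4 on this build.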

What is causing this?

Edit: tried to change the topic (was: Why is AMD faster than Intel?)
 

dragonsprayer

Splendid
Jan 3, 2007
3,809
0
22,780
dude who cares?

why is a pinto faster than a chevette? who cares!

u need to upgrade!


the p4 takes 33 steps to make 2 calculations per cycle while the amd takes 24 steps to make 3! ok!

now upgrade!
 

jeff_2087

Distinguished
Feb 18, 2007
823
0
18,980
Because clockspeed isn't an indicator of relative performance between different architectures. When I leave work today I'd rather be driving 40 mph instead of 50 kph. Note that 50 is a bigger number than 40.
 

ElMoIsEviL

Distinguished
First things first... the title of your thread is misleading.

AMD is not faster than Intel. And Intel is not faster than AMD.

But to answer your question: K7 is faster per clock (higher IPC) than Netburst. K7 = AMD Athlon, Netburst = Intel P4.

The main difference between these two architectures is that the Netburst architecture (P4) has a very long pipeline. The effect of such a long pipeline is that the processor can attain a much higher working frequency; the negative side effect is that it cannot do as much work per clock cycle as a competing processor with a shorter pipeline.

In the end, Netburst defeated K7, albeit doing so with a much higher working frequency. Their last battle was the Athlon XP 3200+ vs. the Pentium 4C 3.2GHz HT. The latter won.
 

1Tanker

Splendid
Apr 28, 2006
4,645
1
22,780
you hadnt posted that when i was replying yet 8O damn ,beat to the draw :wink: your explanation is clearer.
Better dig out that old copy of "Mavis Beacon...." Vern...slow typing will bury you in these bloodthirsty Forumz. :tongue: :D
 

1Tanker

Splendid
Apr 28, 2006
4,645
1
22,780
I thought the Hundt and pehck method was king here. :p works for me. :oops: , but sometimes it gets my head spinning. :D
 

aleric

Distinguished
Jun 21, 2004
12
0
18,510
Thank you for the excellent answer :D

Of course I agree with most of the others that these CPUs aren't
very interesting anymore -- but the reason I asked IS because I
want to buy a new PC and have to decide between AMD and Intel.

It's a fact that both give cpu frequencies (apart from the number of
cores) somewhere between 2 and 3 GHz. If Intel still has these long
pipelines, then should I conclude that a single core at 2.8GHz from Intel
is a lot slower (for compilation of C++ programs thus) than a
single core at 2.8GHz from AMD? Probably not or Intel would be out
of business ;), but then I'd really like to know who changed their
strategy: If both are now equally fast with the same clock frequency,
then is that because Intel shortened their pipelines? Or has AMD enlarged
them?
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
8O I can forgive you for the 12-step pipeline of a K7 Athlon, but an Intel fan like you that does not know the famous Prescott/Cedar Mill pipeline has 31 steps?!
 

will14

Distinguished
Aug 3, 2006
606
0
19,010
Each processor family is different.
For example:
Athlon/Athlon XP/Athlon 64/AM2
Each is progressively faster, and I imagine more efficient, although originally speed/performance was mostly expressed in mega/gigahertz (MHz/GHz).

Now look at Intel's Pentiums:
Pentium 1, 2, 3, 4, then a lot of slightly different 4s.
Then Pentium D and now Core 2; next from AMD will be Barcelona/Agena/Phenom (name TBD).
With the Pentium 4s, they would re-release them under different chipsets and the clock speed would fall. Say their max was at 3.2.
They would release a new model at 2.6 which would beat the 3.2.
How did this work? Efficiency and, as mentioned earlier, instructions per cycle (IPC). Imagine wheels and gears. You turn a wheel 500 times: if you turn a really small gear you get less accomplished, but if you turn a large gear you get more done per turn. The same can be applied to processors and Hz (MHz/GHz).

Say the Pentium 4 did 10 instructions per clock and the Core 2 does 20, for example (not actual figures).

Pentium 4: 10 x 3200 = 32,000 instructions

Core 2: 20 x 2000 = 40,000 instructions

See, the Core 2 is faster with a lower clock.
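
The toy comparison above can be written as a short Python sketch. The IPC values are the made-up ones from this post, not real figures for either chip, and the function name is just for illustration. It models throughput as nothing more than IPC times clock speed.

```python
def throughput(ipc, clock_mhz):
    """Millions of instructions per second in a simple IPC x clock model."""
    return ipc * clock_mhz

# Invented example values from the post above, not real measurements.
pentium4 = throughput(ipc=10, clock_mhz=3200)
core2 = throughput(ipc=20, clock_mhz=2000)

print("Pentium 4:", pentium4)
print("Core 2:   ", core2)
print("Core 2 / Pentium 4:", core2 / pentium4)
```

Real chips do not have a single fixed IPC; it varies with the workload, which is why the same two CPUs can rank differently in different benchmarks.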

Now for IPC, the current ranking looks something like:

Pentium 4 < AM2 < Core 2 < Barcelona (K10) ??? Penryn

We don't know about K10 or Penryn yet though.
Safe to assume Barcelona will beat Core 2, but we haven't a clue how it will do versus Penryn.

Things such as the pipeline length (path/steps) can determine IPC, but I wouldn't worry about them now. In the end benchmarks, price and personal preference should be your guide.

Hope this helps.

~Will
 

futurelic

Distinguished
Feb 12, 2006
28
0
18,530
AMD/ATI systems cost less than Intel-based ones, especially if you buy the components and assemble them yourself.
The performance differences between processors could be made up by your choice of graphics card, memory, and hard-drive speed.

Didn't AMD help push Intel to build a better product?
 
The P4 1.7 that the OP has is a Willamette as there were no 1.7 GHz P4s that weren't Willies. There were 1.6A and 1.8A Northwoods, but no 1.7s. The Willy has 20 pipeline stages :D
 

m25

Distinguished
May 23, 2006
2,363
0
19,780
Yes, and the 20-stage-pipeline P4s have a better IPC than the 31-stage ones. Altogether, the P4 wouldn't have been that much of a fiasco had they stuck to the 20-stage pipeline; they wanted hyperpipelining and it went right into their a** :D
 

fidgewinkle

Distinguished
Feb 27, 2007
162
0
18,680
Comparing P4 versus K7 is not relevant to modern processors. Go look at THG benchmarks of the processors currently out there to get an idea as to how AMD compares to Intel.
 

aleric

Distinguished
Jun 21, 2004
12
0
18,510
This is the reason the Athlon performs better than expected relative to the Pentium (P4) in your benchmarks. The Athlon simply completes or executes more instructions in each tick of the clock compared to the P4.

So, you are sure it is because the Athlon does more instructions
in parallel (higher IPC)? Before, I thought it was caused by cache
misses: because the pipeline is longer, a cache miss is more relevant
because more 'work' has to be thrown away. So, I thought, the pentium4
would be better at applications with a "burst" like nature, like video
processing, and worse when it has to make a lot of fast decisions
that can't be known before hand (lots of branches).
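
The "work has to be thrown away" effect described above is usually attributed to branch mispredictions rather than cache misses: a mispredicted branch flushes the pipeline, and a deeper pipeline costs more cycles to refill. A rough Python sketch of the common textbook approximation follows. The pipeline depths (12 and 20) are the figures quoted by other posters in this thread. The base CPI, branch fraction, and misprediction rate are invented for illustration, and treating the flush penalty as equal to the pipeline depth is a simplifying assumption.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles

# Hypothetical workload: 20% of instructions are branches, 5% of them mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05
BASE_CPI = 1.0  # invented; the same for both chips to isolate the pipeline effect

# Assumption: flush penalty is about the pipeline depth (12 and 20 as quoted in this thread).
short_pipeline_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 12)
long_pipeline_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)

print(f"12-stage pipeline: {short_pipeline_cpi:.2f} cycles per instruction")
print(f"20-stage pipeline: {long_pipeline_cpi:.2f} cycles per instruction")
```

In this model the penalty grows with how branchy and unpredictable the code is, which fits the idea that a compiler run suffers more on a long pipeline than a streaming workload would.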

If that is the case, that the performance is application dependent,
then my main problem is (not was) that none of the used benchmarks
on THG are about compilation. It's about 3D (graphics), which is pure
bursting data to the video card imho (and a lot of floating point
calculations of course). The same for mp3 encoding: a lot of calculations,
but not about (integer type) branches. Then there are several windows
applications that I have no clue of what they do (I never used windows),
let alone that I can guess what type of load they are for the processor.

If THG had benchmarks "compiling blahblah on linux (64 bit); faster is
better", then I could actually use the benchmarks to make my decision.

Now, I think: well - those numbers are great. But if the performance
of the Intel chips falls back by a factor of two when I use them to compile
something, then I don't want to use these results!

This is the main reason that I asked my question: I hope(d) to understand,
on the most detailed, technical level what is causing the difference in
IPC while compiling -- and then I hope to be able to use that knowledge
to understand how to interpret the current benchmarks.

Alternatives are:
1) Someone tells me that 'this or that' benchmark gives
the same ratios (between each cpu) as compilation does (ie, it is the
same type of application, and leads to the same IPC).
2) Someone tells me that the IPC is not (or HARDLY) a function of
the benchmark/application (this can't be true however, because then
each benchmark should show the same winner(?)) and the IPC is
constant.
3) Someone tells me that whatever the difference was that caused
the difference that I am observing between the Athlon 900 and the
P4 does no longer exist, because both architectures now use the same
approach.

Finally, contrary to what some people tell me in this thread, it *is*
important to me to compare the Athlon 900 with the Intel core 2 Extreme
QX6700 (for example), because I currently own the Athlon 900 and I
am thinking about buying a new PC: I want to know how much faster
my new PC will finish the programs that I run. Now I know that it takes
2 minutes and 5 seconds to run a full 'make' on libcwd-0.99.45 after
configuration with --enable-maintainer-mode --disable-pch. I won't
buy a new PC (not worth the money) if the new PC won't do exactly
that in less than 25 seconds. I will definitely not wait any longer
and will order it this week if I know it will compile it in 12.5 seconds or less.
At the moment I have NO clue how fast it will be :(
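
Put as numbers, the two thresholds above correspond to a 5x and a 10x speedup over the Athlon 900. A small Python sketch of that rule follows. The function names and the wording of the verdicts are only illustrative, and treating exactly 25 seconds as "still worth considering" is an assumption.

```python
BASELINE_SECONDS = 2 * 60 + 5  # Athlon 900 build time: 2 minutes 5 seconds

def required_speedup(target_seconds):
    """Speedup over the Athlon 900 needed to hit a given build time."""
    return BASELINE_SECONDS / target_seconds

def verdict(new_build_seconds):
    """Buying rule described above, applied to a measured build time."""
    if new_build_seconds <= 12.5:
        return "order this week"
    if new_build_seconds <= 25:
        return "think about it"
    return "not worth the money yet"

print(required_speedup(25))    # speedup needed to be worth considering
print(required_speedup(12.5))  # speedup needed to buy without thinking
print(verdict(18))
```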
 

qcmadness

Distinguished
Aug 12, 2006
1,051
0
19,280
For simple integer instructions, Pentium 4 can do 2 per cycle while Athlon can do 3 per cycle. For complex integer / floating point x87 instructions, Pentium 4 can do 1 per cycle while Athlon can still do 3 per cycle.
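
To see what these per-cycle figures would mean for the two machines in the first post, here is a small Python sketch that multiplies them by the reported clock speeds. The per-cycle numbers are the ones claimed above and are peak rates, not sustained ones; the names are only illustrative.

```python
# Clock speeds from the cpuinfo output in the first post, in MHz.
P4_MHZ = 1708.705
ATHLON_MHZ = 908.119

# Peak instructions per cycle as claimed in the post above (not measured).
PEAK_PER_CYCLE = {
    "simple integer": {"P4": 2, "Athlon": 3},
    "complex integer / x87": {"P4": 1, "Athlon": 3},
}

def peak_rate(per_cycle, mhz):
    """Peak millions of instructions per second: per-cycle rate times clock."""
    return per_cycle * mhz

ratios = {}
for kind, per_cycle in PEAK_PER_CYCLE.items():
    p4 = peak_rate(per_cycle["P4"], P4_MHZ)
    athlon = peak_rate(per_cycle["Athlon"], ATHLON_MHZ)
    ratios[kind] = p4 / athlon
    print(f"{kind}: P4 {p4:.0f} vs Athlon {athlon:.0f} (P4/Athlon = {ratios[kind]:.2f})")
```

On these peak figures the P4 1.7 would be about 1.25 times faster than the Athlon 900 on simple integer work and clearly slower on the complex case. A real compile mixes both and never runs at peak, so this is only a rough sanity check against the measured build times.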
 

will14

Distinguished
Aug 3, 2006
606
0
19,010
Don't yell at me if I'm wrong, but if your Athlon 900 does it in 2 mins 5 seconds,
the Q6700 will probably do it in well under 25 seconds.
 

lordaardvark2

Distinguished
Nov 15, 2005
975
0
18,980
that's a good point, aleric. i have wondered about compiling benchmarks, too, although i don't have much need for them as i don't program much. so now you know the diff between your Athlon and your P4, and that is interesting knowledge to possess, but the thing is, comparing those processors to current processors is kind of apples-to-oranges.

there have been quite a few changes in processor tech since the days of those processors, including multiple cores and beefed-up FSBs. due to this fact, you can't really assume that a current AMD equivalent to the Athlon will beat a current intel equivalent to the P4. although the P4 may do 1 less IPC, that fact isn't very helpful when looking to buy a new computer.

if you are looking to buy right now, i would recommend (without knowing your budget) a dual-core proc from either AMD or intel. your price range will depend upon your cash at hand; AMD owns the low end and intel offers some really tempting mids and highs.

i don't have much basis for this statement, but i'm positive that a current proc will own those comp times.
 

aleric

Distinguished
Jun 21, 2004
12
0
18,510
This is a funny comparison.... as of now, the C2D simply outclasses the K8 ...

I mean that if C2D would take more than 25 seconds, then I'll wait
longer before buying anything. If it will do it in 12.5 seconds then
I don't have to think longer. If it will do it in 18 seconds, then I'll
still have to think about spending the $3000 that is my budget for
(every) new PC, or wait another year to get more for the same money.

The battle for K8 was already lost before I posted this (as I said in my
first post) because it is still 90 nm and therefore uses too much power.
Hmm, or maybe I had said that in another thread... this thread was
a 'spin off' of that thread :oops:

I am very happy to know now, thanks to you, that the 65nm Intel chips
are indeed the fastest chips-- then my decision to buy an intel cpu this
time is fully justified. It still remains to be seen however if the new
machine will be 10 times as fast... but I think I'll go for the QX6700
and just see.

As an (open source) developer, I should have had a 64 bit OS years
ago already :/ It's really time to finally get one. Having four cores will
be fun to play with (developing multi-threaded applications).

Thank you for all the time you took to answer me in detail!
Aleric
 

picard

Distinguished
Apr 9, 2004
214
0
18,690
Does the AMD Athlon CPU have difficulty with high resolution graphics for Photoshop or games?

Can the AMD chip do calculations for Excel spreadsheets or Oracle computation?

How well does AMD handle Office 2003/2007 tasks?
 

angry_ducky

Distinguished
Mar 3, 2006
3,056
0
20,790
w00t! Mavis Beacon! They forced that crap upon us in like 5th grade. I thought it was crap at the time, and continued pounding away with two fingers. Then, about two years later, I decided to give two-handed typing another shot. The rest is history.