SpecInt/SpecFP - Intel vs AMD

---- re-reading this post later on----

This post doesn't make much sense.. I'm high on caffeine.. I don't see my own point.. but since I typed it, I'll post it.

-----------------------------

Man this is a long thread.. one of the more civilized AMD vs Intel ones though.. However, sometimes I fail to grasp the point (as in most AMD vs Intel threads).. It's like reading a newsgroup of football fans.. only most of you guys are much more devoted to your favourite silicon provider than most die-hard football fans I know ;-)

Anyway, I am completely indifferent to SpecInt and SpecFP scores.. as I am mostly indifferent to just about any synthetic benchmark. I don't run benchmarks all day, I run applications! Imagine VIA came up with a Cyrix XII CPU that smokes the competition in SPEC benchmarks but lags in real-world apps.. Are we gonna buy a Cyrix XII then?

Right now, Athlon and P4 are *very* close in most, if not all, real-world applications. Some apps prefer one CPU, some the other. Use this knowledge to choose the CPU that does your work fastest for a reasonable price. If you need that much CPU power at all, that is.. frankly, most of us don't.

However, to me there are some other points to take into consideration when deciding on a CPU platform apart from speed and cost. Like upgrade paths. If you buy a bright and shiny P4 right now, there is little or no way to upgrade your system later on without throwing away your motherboard.
That is one reason I would not consider a P4 until Northwood comes out. Some people (especially companies) don't see this as a problem, as they will only be upgrading entire PCs.. I like tossing around components every so often to always get a good price/performance ratio. To me, currently, that ideal price/performance ratio is a Duron 7-8-900. I don't need anything faster right now (but then that's me, I play games, I don't render in 3D Studio MAX all day).
The nice thing about the AMD platform is that I can double my performance using the same components (mb, RAM, video) by just swapping the CPU for a 1.5+ Palomino when it comes out. Quite likely, I'll be able to stick with this setup up to a 2 GHz T-bird. Now that is some nice upgrade path. Almost as good as getting my P200 to a K6 500 🙂

Intel cannot give me this today. They could in the past (BX going from 350 to more or less 1 GHz). Now, P3 platforms hit the ceiling at 1 GHz. Buying a P3-8xx is therefore not an option for me.. (even regardless of the cost compared to an Athlon)
The P4 has a relatively small range, from 1.3 (still quite expensive with 800 MHz RDRAM) to 1.5, maybe 1.7 in the near future.

When Northwood comes out, and they sell some "low-end" Northwood CPU for a reasonable price, I might reconsider and actually start advising my friends to go for a "cheap" Northwood solution and upgrade later as needed.

All other arguments pro and contra Intel / AMD are a question of preferences. How important is thermal protection to you? How often are you gonna remove the heatsink yourself, and how careful are you? Are you an overclocker or not? Is SMP something your apps will benefit from?

We all have different opinions about those, and base our purchasing on them. That's my bottom line. Over the years I've bought both AMD and Intel systems.. from an 8 MHz 286 to a Duron @933. I have yet to regret one purchase.. (hmm.. okay.. maybe I was disappointed when I went from a 5x86-133 (486-style chip) to a Pentium 133... the difference was minimal, the cost relatively high).
Since I'm already boring the crap out of you.. let me list my CPU history:

286 8 MHz (Intel)
486 33 MHz (Intel)
486 66 MHz (AMD)
5x86 133 (AMD)
Pentium 133 (Intel)
P2-266 (Intel)
K6-500 (AMD) (second PC)
P3-500 (Intel)
Duron 600@933 (AMD)
Celeron 500 laptop (Intel)

Now, am I an AMD advocate or not? You tell me.
 
"I can double my performance using the same components (mb, ram, video) by just swapping the cpu for a 1.5+ palomino when it comes out"

Not likely. You'll more than likely need a new motherboard, especially if you plan on using DDR RAM. Simply upping your CPU to twice the old speed will not give you twice the performance.

"The P4 has a relative small range from 1.3 (still quite expensive with 800 Mhz RDRAM) to 1.5, maybe 1.7 in the near future."

1.7GHz in 5 days. 2.0 GHz in about 2 months.



-- The center of your digital world --
 
You appear to have misunderstood what I was saying. I was NOT saying that the system bus cannot keep up with memory or memory with the system bus. I said that the *CPU* can process the FP instructions faster than they can be fetched from memory.

I know that memory and the system bus are fast on the P4, just not fast enough for FP-intensive apps. When it comes down to it, FP-intensive apps typically improve in performance with more bandwidth.

The reason I mentioned this at all was that there was an implication that the P4 FP unit is somehow better than that in the Athlon. The relative merits of each can be discussed elsewhere, but the basic point still stands: the performance increase has a lot to do with bandwidth.

This is not a problem or a big issue; I was just providing a different viewpoint on a statement which I consider wrong.

L
 
Thank you for reminding me of the new-fangled way things are done. When I developed many years ago we all still used static linking :)

To create a P4 specific version of an app you need a compiler and the only one which is P4 optimised is the Intel one. Most apps are created and optimised using the MS compilers which don't currently have P4 optimisations and (based on support for past processors) probably won't have for several years.

Also, while you can do the multiple-CPU-support thing, most companies don't bother (though yours may) and just target the mainstream by compiling once with, say, 386 optimisations. We could get into a debate about which is the way things are done by "most" companies, but I have done some research on this and the vast majority (about 6 to 1) don't bother with optimising for specific CPUs. Many "high end" apps (graphics/video/audio stuff) may well benefit from the optimisation enough to do it.
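For what it's worth, here is a rough sketch of what that per-CPU dispatch looks like when a vendor does bother. This is a present-day illustration, not how any particular product does it: the function names are made up, and it assumes a GCC/Clang-style compiler that provides __builtin_cpu_supports (MSVC users would query CPUID via __cpuid from <intrin.h> instead).

```cpp
#include <cstdio>

// Two hypothetical builds of the same routine: a baseline x86 path and an
// SSE2-tuned path. In a real product these would live in separately
// compiled translation units built with different optimisation flags.
static void transform_baseline(float* data, int n) {
    for (int i = 0; i < n; ++i) data[i] *= 2.0f;
}
static void transform_sse2(float* data, int n) {
    // Same semantics; imagine this one was compiled with SSE2 scheduling.
    for (int i = 0; i < n; ++i) data[i] *= 2.0f;
}

int main() {
    float buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    // Pick the code path once at startup, based on what the CPU reports.
    if (__builtin_cpu_supports("sse2"))
        transform_sse2(buf, 8);
    else
        transform_baseline(buf, 8);
    std::printf("%f\n", buf[0]);
    return 0;
}
```

The point is that the dispatch itself is cheap; the cost is maintaining and testing two or more builds of every hot routine, which is exactly why most vendors skip it.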

As far as DirectX etc. is concerned, that is great news if we have support in the drivers. Do you know which companies support the P4 in their drivers and what sort of performance gains they get? Let's face it, support is all well and good, but only if you actually get some sort of benefit out of it.

L
 
<<the new MS compilers will use P4 arranged instructions and support them... they are coming along with Windows XP which is fully supported P4 code..>>

That's nice to hear.


<<you are simply incorrect when you say Palomino will have a better structure and branch prediction unit..

Palomino is solely a reduced die-size and voltage version of the Athlon, so as not to run so hot and burn up as they do.. the Palomino is merely reduced in size and voltage so as to allow higher MHz, which the current Athlon cannot do.. no major hardware changes are planned for it..>>

AMD have said that there WILL be a better BPU and hardware prefetcher. Why do you believe otherwise?

<<the P4 includes a hardware prefetcher, that can move instructions in and out of registers without disturbing cache, and way before it actually needs them!!>>

Actually this is complete nonsense. The prefetcher CANNOT fetch into a register; such a system would be unworkable. I can draw pretty diagrams if it would help you understand.
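Hardware prefetch can't really be demonstrated in source code, but the SSE software-prefetch intrinsic makes the same distinction visible: a prefetch hints a cache line toward the caches and names no destination register, while only an ordinary load actually brings a value into a register. A rough sketch, assuming an SSE-capable x86 compiler; the prefetch distance of 16 elements is arbitrary.

```cpp
#include <xmmintrin.h>  // _mm_prefetch / _MM_HINT_T0 (SSE)
#include <cstddef>

// Sum an array, hinting lines into cache a little ahead of use.
double sum_with_prefetch(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + 16 < n) {
            // Prefetch: pulls the cache line containing a[i+16] toward L1.
            // It cannot target a register; only the load below does that.
            _mm_prefetch(reinterpret_cast<const char*>(&a[i + 16]), _MM_HINT_T0);
        }
        s += a[i];  // the real load: memory (via cache) -> register
    }
    return s;
}
```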

<<the Athlon would not and does not dominate at the same clock speed; if you look at Tom's tests involving an overclocked 1.6 Athlon, some of them show the P4 faster than the Athlon at a slower clock speed, even though his tests are very flawed and he should know better, and not use D3D 7 which has no P4 code in it, and he used many benchmarks that are a generation old like SANDRA and 3DMark, both of which contain no P4 code, and he used games like Unreal and others whose engines are a year old and have no P4 code, or have timer limits in them unlike Quake 3 which scales limitlessly..>>

I don't know how you can say something which is KNOWN to be completely false. The current Athlons already dominate benchmarks when compared to P4 systems running 200MHz+ faster. Just look at the various comparisons out there. Yes, you can create a set of benches in which the P4 will win everything, but you can show anything if you choose the right conditions.

<<so since I have run tests on P4 versions of these programs I can tell you for a fact that P4 code makes a huge difference.. just as all you Athlon nuts screamed when benchmarks showed the P3 wailing on the Athlon before 3DNow! was in wide use..>>

Why do you label me an "Athlon nut"? I own a PIII system and have no intention of buying an AMD system. I am being reasonable and asking questions and wanting to discuss things not labelling people.

<<fact is most credible benchmarks that are cross-platform and apps that are compiled for P4 are much faster due to the superior architecture of the P4, its 400 MHz bus, and dual-channel Rambus and its 3.2 GB/s bandwidth..>>

Yes, a lot can be achieved with recompilation, and this point has been made before. People don't upgrade all of their software just because they buy a new CPU. BTW, can you give me more information and stats on what you mean by "most credible benchmarks"?

<<no matter what you say in your opinion, you cannot change computer architecture facts like the ones above, which are factually superior in bandwidth and speed..
it's like saying a V8 is slower than a V4 given equal circumstances..
the fact you do not like or have a P4 does not make it true that it is not a better CPU than the Athlon or P3..
fact is I see and test dozens of machines a week, and I have compared both and I am telling you the P4 is the fastest thing I have ever seen PERIOD>>

I am well aware that the P4 is a good architecture for some apps. Can you point out where I make a blanket statement that it is worse than the Athlon or PIII? I am pointing out issues which you take as an attack on a CPU to which you have a deep emotional attachment. I cannot seriously carry on a discussion at such a childish level. Goodbye.

L
 
"When I developed many years ago we all still used static linking"

This is still static linking. You just target a certain CPU when compiling a .cpp to a .obj. It doesn't matter which linker is used for the link step. There aren't many optimizations in that step.

"Do you know which companies support the P4 in their drivers and what sort of performance gains they get?"

The DirectX HAL and HEL in version 7.0b and later are optimized for the P4 and use SSE2. These are included on all machines regardless of your drivers. I don't know which manufacturers use SSE2 in their drivers yet.
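As a concrete illustration of what "uses SSE2" means at the instruction level, here is a tiny sketch of packed double-precision math with the SSE2 intrinsics. This is a generic example, not DirectX code, and assumes an SSE2-capable x86 compiler.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdio>

int main() {
    // Two doubles per register: the P4-era win SSE2 offers over x87,
    // which pushes values through the FP stack one at a time.
    __m128d a = _mm_set_pd(3.0, 1.0);   // {1.0, 3.0}
    __m128d b = _mm_set_pd(4.0, 2.0);   // {2.0, 4.0}
    __m128d c = _mm_add_pd(a, b);       // {3.0, 7.0} in one instruction

    double out[2];
    _mm_storeu_pd(out, c);
    std::printf("%f %f\n", out[0], out[1]);
    return 0;
}
```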

-Raystonn

-- The center of your digital world --
 
"Absolutely true, for example P4!"

For example, _any_ CPU. The CPU is one component. You need your CPU _and_ memory to be twice as fast, and for both to be able to fully use this extra speed, for processing to actually end up twice as fast. For total system performance you also need a fast hard drive and video card, as well as some other components.
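A toy model makes the point. If part of the wall-clock time is pure CPU work and the rest is waiting on memory, disk, or the video card, doubling only the CPU clock shrinks only the first term. The 60/40 split below is invented purely for illustration.

```cpp
#include <cstdio>

int main() {
    // Invented split: 60% of runtime is CPU-bound work, 40% is spent
    // waiting on memory and other components.
    const double cpu_part = 0.60, other_part = 0.40;

    // Double the CPU clock: only the CPU-bound portion halves.
    const double new_time = cpu_part / 2.0 + other_part;
    std::printf("old time: 1.00, new time: %.2f, speedup: %.2fx\n",
                new_time, 1.0 / new_time);
    // Prints a speedup of ~1.43x, nowhere near 2x.
    return 0;
}
```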

-Raystonn

-- The center of your digital world --
 
"Palimino is sole a reduced die size and voltage version of
the Athlon so as not to run so hot and burn up as they do..
there Palimino is merely reduced in size and voltage to as to allow high MHZ which the current athlon cannot do..
no major hardware changes are planned for it.."

I think it is you that needs to read up a bit more.
To call yourself unbiased has got to be the joke of the decade. You sell strictly Intel systems. You have a monetary incentive to have people believe what you spew is correct. Asking you anything about an AMD system would be akin to asking a Chevy dealer about a Ford product. Furthermore, this thread was actually being conducted in a civil manner until you came forth with the etiquette of a bull moose in heat. You are quick to point out others' errors (which I believe many have made) yet fail to acknowledge any of your own, such as the one I opened this thread with.


A little bit of knowledge is a dangerous thing!
Edited by Ncogneto on 04/18/01 10:13 PM.
 
Yet another quote of yours which is incorrect:

"Athlon,s SMP has still one lousy channel which both CPU's must bottleneck on...."


Symmetric Multiprocessing (SMP) Systems based on AMD's Athlons will be a quantum leap ahead for servers based on the x86 architecture. As the Athlon architecture is heavily based on the Alpha, its switch-like bus design borrows strongly from its Alpha lineage as well. Unlike the GTL+ bus used by the P6 where bandwidth is shared among the processors, the Athlon uses a point-to-point design to enable full bandwidth for each processor. For a dual-processing Athlon system, this point-to-point bus can channel 4.2 GB/s.

The AMD 760 MP chipset is a DDR SDRAM solution that can support two 266MHz FSB Athlons. The chipset has advanced buffering to enable maximum transaction concurrency. The 760 MP also uses a sophisticated cache coherency protocol that AMD named "MOESI." We will discuss this protocol in a later article.
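The 4.2 GB/s figure follows directly from the bus arithmetic, assuming the usual 64-bit EV6-style data path: 266 million transfers per second, 8 bytes per transfer, one independent link per processor. A quick check:

```cpp
#include <cstdio>

int main() {
    // EV6-style front-side bus: 133 MHz clock, double-pumped, 64 bits wide.
    const double transfers_per_sec = 133e6 * 2;   // 266 MT/s
    const double bytes_per_transfer = 8.0;        // 64-bit bus
    const double per_cpu_gbs = transfers_per_sec * bytes_per_transfer / 1e9;

    std::printf("per-CPU link: ~%.1f GB/s\n", per_cpu_gbs);   // ~2.1 GB/s
    // Because each Athlon gets its own point-to-point link, a dual system
    // has two of these, i.e. the ~4.2 GB/s aggregate quoted above.
    return 0;
}
```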



A little bit of knowledge is a dangerous thing!
Edited by Ncogneto on 04/19/01 01:19 AM.
 
from THG--

"This test has made one thing very clear. Pentium 4 boards with Intel 850 are very reliable and ready to hit the mass market - once prices become more attractive. Unlike early products with VIA chipsets, the young Intel 850 chipset is precocious."

<A HREF="http://www.tomshardware.com/mainboard/01q1/010321/i850-17.html" target="_new">http://www.tomshardware.com/mainboard/01q1/010321/i850-17.html</A>



i had a drink the other day... opinions were like kittens i was givin' away
 
post from j at aceshardware technical--

"My Athlon optimized STREAM benchmark was just tested on the
760MP system. I don't post the results, because I did not ask
for the permission yet.
Running one instance of the benchmark shows results that are
partly a little bit lower, partly a little bit higher than on
a AMD 760 chipset.
With two instances the added results are not much better than
the results with one CPU except one: The STREAM v2 subroutine DSUM, which
only reads data from memory, nearly doubles the performance to
~1600 MB/s. Really interesting! The 760MP chipset seems to be able
to handle two independ "memory reading streams" better than one
stream (I have to test if this is also possible with the 760 chipset).
But this reading capability seems to be disturbed by memory writes, otherwise
the DADD and DTRIAD results would be much better.
In general, the 760MP bandwidth is too low to my opinion. Also disappointing,
that the tested board includes only on channel of DDR RAM. This means,
that the point-to-point topology of a dual Athlon system can not show its
real power.
I think the weakness of 760MP and 760 chipsets is the lack of 'bank
interleave' or possibly the missing ability to switch between reading and
writing quickly.
One can really hope that new chipset or BIOS revisions will
include this features.

j
p.s. I do not say that the 760MP board is bad. I only say that memory

bandwidth is weak in the moment."

<A HREF="http://www.aceshardware.com/cgi-bin/ace/tech.pl?read=13594" target="_new">http://www.aceshardware.com/cgi-bin/ace/tech.pl?read=13594</A>

i had a drink the other day... opinions were like kittens i was givin' away
 
Excuse me, but I don't think I ever recommended a VIA chipset, did I? :) I will admit, however, that in my opinion AMD's Achilles' heel is the chipsets that are available for it. Although their own chipset performs well, I don't think you are going to see it around in the quantity it deserves. I am hoping that the chipset nVidia is developing will change things for the better.

A little bit of knowledge is a dangerous thing!
 
I am not quite sure what to make of this. Interesting, but how on earth did he get his hands on a 760 MP mobo? At best it's a beta.

A little bit of knowledge is a dangerous thing!
 
nvidia's chipset looks pretty good so far... not sure what your post is referring to though. i was responding to the "unstableness" of the p4. tom seems to think that it is quite stable even in its infancy, albeit too expensive and not worth it.

about the 760MP... true, certainly a beta.

i had a drink the other day... opinions were like kittens i was givin' away
 
Oh, I see, you misinterpreted the author's use of the term "stableness". It was used in reference to platform stability in the sense that the current P4 platform will be short-lived and non-upgradeable. I don't think he meant to imply that it operates unstably.

btw, since you managed to locate that link, do you have any on SMP P4 machines?

A little bit of knowledge is a dangerous thing!
 
ah... i see. no haven't seen any smp p4 links yet. i'll share them as i find them. even if they are disappointing but not bad 😉

i had a drink the other day... opinions were like kittens i was givin' away
 
Hey, I'm sorry to disturb you, but I see you work with clusters and are a developer, so I have this question: what kind of programs actually need such power? (I'm a developer too, looking for some serious work.)
Try elaborating as much as possible.

Thank you.

-----------------------
-R.K.
 
NO, I am quite correct; you just misread what I said..
I said nothing about the Alpha bus in the Athlon, which incidentally they have recently moved farther away from.. and anyone who builds SMP boxes knows that Alpha boxes do not scale well past 4 CPUs; even though they can have more, the diminishing returns are abysmal..

The Xeon holds all world speed records for TPS in 4-16 CPU setups..

WHAT I WAS SAYING was: even though the Athlon uses some Alpha technology, it still has a [-peep-] chipset that is single channel to memory, and not concurrent, whereas the P4 and Rambus are, and a single-CPU 760 system has more bandwidth to memory than Alpha servers and certainly than Athlon SMP, which is why I can show you scores in memtach and STREAM that have a single P4 kicking the [-peep-] out of a dual Alpha 667 in memory and chipset throughput..

INTEL SMP is the standard; they developed modern APIC SMP when Unix had crap parallel support, so do not even go there. INTEL has had SMP, and I have been designing and building SMP boxes of up to 8 CPUs for 7 years, so I know a little about it!

FOSTER, with the 770 chipset, will be a DUAL and QUAD NORTHWOOD P4 with 2 dual-channel memory paths to each CPU, giving each 3.2 GB/s, and FOSTER will use a revolutionary MULTITHREADED CPU design, to thread apps in hardware and memory!

It will have Itanium technology to some degree too..

The A in AMD certainly will not stand for ADVANCED anymore

CAMERON

CYBERIMAGE
<A HREF="http://www.4CyberImage.com " target="_new">http://www.4CyberImage.com </A>
Ultra High Performance Computers-
 
>Hey, I'm sorry to disturb you, but I see you work with clusters
>and are a developer, so I have this question: what kind of
>programs actually need such power? (I'm a developer too,
>looking for some serious work.)
>Try elaborating as much as possible.

Computational clusters tend to be good for problems that can be parallelized in a relatively coarse way because the communications speed is pretty poor relative to SMP solutions. What I mean by that is that the chunks of the program that you split out to execute in parallel are relatively independent. They aren't tightly coupled, requiring lots of inter-process communications.

Some of the better-known problems that clusters are being used for:

<A HREF="http://www.seti.org/science/setiathome.html" target="_new">SETI</A>: Ok, this isn't <i>really</i> a cluster. But it's an example of a very coarse grain problem. The work units are completely independent.

Computational Biology has gotten into clusters pretty heavily. Some examples are protein folding (http://www.tjhsst.edu/~shardest/poster/) and genome research (http://www.zdnet.com/intweek/stories/news/0,4164,2667453,00.html).

Also weather forecasting (http://www.gcn.com/vol1_no1/daily-updates/1816-1.html), oil exploration (http://www.computerworld.com/cwi/story/0,1199,NAV47_STO55133,00.html) and image rendering.

The government has some very large clusters at some of the US national labs. I believe they are primarily used for simulating nuclear weapons.

Another somewhat pedestrian application of clusters is Monte Carlo analysis. Say you have an application that takes several hours or more to run, and you need to get statistical information on the results with a large variety of inputs. There is software available that can help automate the process of building the input decks, starting and monitoring the application out on the cluster, and then collecting the results.
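Here is a minimal sketch of why Monte Carlo work farms out so well: each run only needs its own seed and inputs, and the results are merged at the end. Threads stand in for cluster nodes in this toy example; on a real cluster each worker would be a separate job with its own input deck.

```cpp
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

// One "work unit": estimate pi by dart-throwing with a private RNG seed.
// On a cluster this would be one job; nothing is shared while it runs.
double run_one(unsigned seed, int samples) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    int hits = 0;
    for (int i = 0; i < samples; ++i) {
        double x = u(rng), y = u(rng);
        if (x * x + y * y <= 1.0) ++hits;
    }
    return 4.0 * hits / samples;
}

int main() {
    const int workers = 8, samples = 1000000;
    std::vector<double> results(workers);
    std::vector<std::thread> pool;

    for (int w = 0; w < workers; ++w)             // scatter: independent runs
        pool.emplace_back([&, w] { results[w] = run_one(1234 + w, samples); });
    for (auto& t : pool) t.join();                // gather: collect the results

    double mean = 0.0;
    for (double r : results) mean += r / workers;
    std::printf("pi estimate over %d runs: %f\n", workers, mean);
    return 0;
}
```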


BTW: These are certainly not the definitive links for some of these subjects. Merely what came up in Google.


In theory, there is no difference between theory and practice.
In practice, there is.
Edited by ergeorge on 04/19/01 04:54 PM.
 
Getting back to the original post here about the SPEC benchmarks. Even if the numbers are correct, they are absolutely irrelevant. How much proof do you need? Read any reviews comparing the Athlon 1.2 or 1.3 against a P4 and watch the P4 get crushed in the real world. The P4 loses in pretty much everything except Webmark (yay!) and Q3 (we knew that). It's almost pathetic. Links?

http://www4.tomshardware.com/cpu/01q1/010322/index.html
http://www.anandtech.com/showdoc.html?i=1441

Even Sharkey gets some of the action!

http://www.sharkeyextreme.com/hardware/reviews/cpu/thunderbird_1-33ghz/

So tell me again how the P4's amazing bandwidth capacity is helping?

"I have fried an Athlon myself, but I'll never admit to it."
 
hehe.. NP-complete (or NP-Hard?) problems.. gotta love it :)

>The outside one is basically a Job Shop scheduling problem with up to 32 "machines".

Egad man, I had enough trouble with activity scheduling on 2 "machines". Correct me if I'm wrong, but isn't activity scheduling in P? I mean, after all, it's not like it's going from 2SAT to nSAT, right?

>Now, once preliminary assignments are made to those machines, they have to figure out what roughly amounts to a traveling salesman problem with up to 200 "cities".

I'd love to hear more of this.. are you reducing your problem to an instance of vertex cover? Seems to me like that would be the most likely reduction from TSP.

>Now, while you're figuring out that TSP, you have to consider whether your initial Job Shop assignment was the best that it could be.

Well, if I'm correct in my hypothesis that activity scheduling is in P and can be solved by a greedy algorithm yielding an optimal solution, then you really don't need to "tweak" as you go, right?
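For reference, classic single-machine activity selection is indeed in P: sort by finish time and greedily take every activity that starts after the last one chosen. A small sketch with made-up intervals:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Activity { int start, finish; };

// Earliest-finish-time greedy: optimal for single-machine activity selection.
std::vector<Activity> select(std::vector<Activity> acts) {
    std::sort(acts.begin(), acts.end(),
              [](const Activity& a, const Activity& b) { return a.finish < b.finish; });
    std::vector<Activity> chosen;
    int last_finish = -1;
    for (const Activity& a : acts)
        if (a.start >= last_finish) {   // compatible with everything chosen so far
            chosen.push_back(a);
            last_finish = a.finish;
        }
    return chosen;
}

int main() {
    // Invented intervals, just to exercise the routine.
    std::vector<Activity> acts = {{1, 4}, {3, 5}, {0, 6}, {5, 7}, {3, 9}, {5, 9}, {6, 10}, {8, 11}};
    for (const Activity& a : select(acts))
        std::printf("[%d, %d) ", a.start, a.finish);
    std::printf("\n");
    return 0;
}
```

Of course, that guarantee evaporates once you add multiple machines, priorities, and time windows like the problem described above.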

Cheers!
 
You're right, of course. The proper terminology would be NP-Hard (http://hissa.nist.gov/dads/HTML/nphard.html). My formal training in this field is a little lacking.

Job Shop Scheduling is NP-Hard. Here's one link: http://epubs.siam.org/sam-bin/dbq/article/32610. We had originally been approaching the problem as a Job Shop problem, but I think I have a better abstraction:

Imagine you have up to 32 salesmen. They must visit up to 9000 cities a day, most of them at least 2 or 3 times a day. The timing of the visits has various constraints on it, such as you can't make a second visit within 2 hours of the last visit, and maybe visits have to occur on alternating halves of the hour.

Any of the salesmen is allowed to make any visit, but the times that a visit is possible by any given salesman are highly constrained.

Now 20000+ jobs scheduled over 24 hours on 32 machines is completely absurd. So I split the day into manageable chunks, currently 15 minutes. I can determine which cities it is possible to visit in that period. About half of these cities are only available to 1 salesman, about a third can be visited by either of two salesmen, and the rest can be visited by any of 3 or more salesmen.

Some simple heuristics are used to make the initial assignments of the cities that can be visited by multiple salesmen. Then each salesman goes off to find its best possible route through its assignment.

This part is more than a simple TSP though, because all of the cities are moving. So a good edge at the beginning of the scheduling window, say c->f->b, may be completely lousy at the end of the window. Additionally, some of the cities may only be available during part(s) of the window, so c->f->b may be impossible later on. So, unlike a classic TSP, absolute order is as important as relative order. The ability to maintain relative order while varying absolute ordering is one thing that genetic algorithms have had a hard time dealing with in solving classic TSP, so weakening that requirement starts to make them more attractive. Also, each city has a priority, and each has a different length of time for which it must be visited. This stuff completely screws up any of the TSP solution methods I've seen.
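To make the relative-vs-absolute-order point concrete, here is a rough sketch of the two permutation operators a GA on tour-like problems typically leans on: an order crossover that copies a slice of one parent and fills the rest in the other parent's relative order, and a swap mutation. This is a generic textbook version, not the author's actual code.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Order crossover (OX, simplified): keep a slice of parent p1 in place,
// then fill the remaining positions with p2's cities in p2's relative order.
// Cities are labelled 0..n-1.
std::vector<int> order_crossover(const std::vector<int>& p1,
                                 const std::vector<int>& p2,
                                 std::size_t cut1, std::size_t cut2) {
    const std::size_t n = p1.size();
    std::vector<int> child(n, -1);
    std::vector<bool> used(n, false);
    for (std::size_t i = cut1; i < cut2; ++i) {      // inherit p1's slice verbatim
        child[i] = p1[i];
        used[p1[i]] = true;
    }
    std::size_t pos = 0;
    for (int city : p2) {                            // fill the gaps in p2's order
        if (used[city]) continue;
        while (child[pos] != -1) ++pos;
        child[pos] = city;
    }
    return child;
}

// Swap mutation: exchange two cities, perturbing absolute positions while
// leaving most of the relative ordering intact.
void swap_mutation(std::vector<int>& tour, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, tour.size() - 1);
    std::swap(tour[pick(rng)], tour[pick(rng)]);
}

int main() {
    std::vector<int> p1 = {0, 1, 2, 3, 4, 5, 6, 7};
    std::vector<int> p2 = {7, 6, 5, 4, 3, 2, 1, 0};
    std::vector<int> child = order_crossover(p1, p2, 2, 5);
    std::mt19937 rng(42);
    swap_mutation(child, rng);
    for (int c : child) std::printf("%d ", c);
    std::printf("\n");
    return 0;
}
```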

Now, this salesman has been working on his schedule for a while and finds that there are some very high priority targets that he just can't get without screwing up his entire schedule ... say he's been working New England, and you tell him to visit San Diego in the middle of the day. So the salesman assignments need to be re-examined during the run. Maybe this guy can trade his San Diego visit for a Bangor one. But now he needs to work this new assignment into his schedule.

I'm working each of the individual TSP problems with a genetic algorithm (GA). I've had some luck solving a similar, simpler problem this way in the past. GAs tend to be very, very tolerant of discontinuities and changes in the solution space, and this problem is full of them! Each salesman (GA) has its own CPU in the current implementation. This limits the usable cluster size to the number of salesmen, plus 1 CPU for the master process and one for a process that arbitrates the migration (which really doesn't need its own CPU). We may eventually break out the evaluation of the GA population to multiple CPUs as well. The bit with transferring cities between salesmen is basically handled like migration in an island-model GA.

It seems to be working pretty well, but there really aren't any results to compare it to; all we can go on is that the salesmen are always busy and the routes they come up with look sane. There is a more objective way to evaluate the final schedules, but it's a big, slow, complex program in its own right and we haven't gotten to it yet.

We don't currently have the processing power to really attack the whole problem. That 15-minute scheduling window is also something that is really bugging me. It's an artificial constraint that we've imposed due to computing resources, and I'm sure it's hurting the solution performance.

I'm sure everybody's eyes are glazed over by now... I hope nobody has fallen asleep drooling into their keyboard 🙂

Actually, it's very good to step back and try to explain a problem like this in abstract terms to an audience that knows nothing about the problem. I've gotten some fresh perspectives and new ideas. Thanks!


In theory, there is no difference between theory and practice.
In practice, there is.