Why do P4s kill Athlon 64 in media encoding?

You are the one that brought up 20%. I always stated 10%, and thought it was understood that the increase was in respect to the portion of the total where it applied. [funny] a 10% faster chip that performs 20% faster[/funny]
For example: the P4s use a 128-bit FSB. Of this, at 133 MHz, 100 bits are used bi-directionally by memory. At 133 MHz that is 13,300,000,000 bits per second, each way. The other 3,724,000,000 bits are used for other north bridge functions (roughly, on average, just a guesstimate at best).
At 200 MHz, the total is 26,600,000,000 bits/sec, with the memory having access to all but the 3,724,000,000. While the FSB has increased by only 50%, the available memory bandwidth has increased by 72%.
The latency, on the other hand, is only 50% faster.
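To lay that arithmetic out explicitly, here is a minimal sketch of the same back-of-envelope model (the 100-bit/28-bit split is just the guesstimate above, not a spec, and keeping 128 bits at both clocks gives a somewhat lower percentage than the 72% quoted, though the direction is the same):

[code]
/* Back-of-envelope model from the post above: treat part of the FSB traffic
 * as a fixed northbridge overhead and see how the remainder (the memory
 * share) scales when the FSB clock goes from 133 MHz to 200 MHz.
 * The 28-bit overhead figure is the poster's guesstimate, not a measurement. */
#include <stdio.h>

int main(void)
{
    const double fsb_bits     = 128.0;
    const double overhead_bps = 28.0 * 133e6;          /* ~3.724 Gbit/s, assumed fixed */

    double mem_133 = fsb_bits * 133e6 - overhead_bps;  /* ~13.3 Gbit/s left for memory */
    double mem_200 = fsb_bits * 200e6 - overhead_bps;  /* ~21.9 Gbit/s left for memory */

    printf("FSB clock increase:    %.0f%%\n", (200.0 / 133.0 - 1.0) * 100.0);
    printf("Memory share increase: %.0f%%\n", (mem_200 / mem_133 - 1.0) * 100.0);
    return 0;
}
[/code]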
Does that start to make sense?
 
>Does that start to make sense?

I'm afraid not :)

>While the fsb has increased by only 50%, the available memory
> bandwidth has increased by 72%.

No it hasn't. You are assuming some arbitrary and "useless" amount of FSB traffic that would remain constant even if memory and application performance increases.. that's just nonsense. Oh, and your numbers are quite wrong too (as well as irrelevant), and you can't just use 100 bits out of 133.. and.. oh, well, whatever..

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
Yes, using 100 bits is arbitrary and
(roughly, on average, just a guesstimate at best).
And yes, the north bridge traffic will scale with performance (I didn't include DDR in there either), but in the end you will still end up with more available memory bandwidth, which means that the odd tag will arrive a little sooner, negating a little latency sometimes, or returning the memory call a cycle sooner other times.
Oh well, I don't need to waste any more of your time.
 
Good discussion, post.

It's not as easy as that. In fact, in x87 floating point, the AMD chips perform considerably better. Still, it's true that NetBurst is generally better at video encoding, mostly because of its SSE2 performance.
Can you show me a bench that shows the x87 floating point of the AMD chips being better? From the Whetstone benches I referenced earlier I was concluding the opposite, that the P4 dominated in floating point.

Everyone (especially THG) uses DivX to measure encoding speed, while Xvid is the more popular codec (it's free, and generally better). Coincidence?
Take a look at the second page of the app benches I referenced. It includes Xvid and Intel does very well:
<A HREF="http://www.tomshardware.com/cpu/20041221/cpu_charts-19.html" target="_new">http://www.tomshardware.com/cpu/20041221/cpu_charts-19.html</A>

I appreciate your point that different apps and even encoding different scenes can produce different results. I looked at the link you referenced. I would hate to believe that all these benches are rigged by Intel or AMD. Somebody has to do some objective testing.

Maybe I'm old school, but I tend to still put some faith in the synthetic benches for that very reason. Yes, no one runs a synthetic app. But in this case I see the video performance benches clearly weighted in Intel's favor and I am trying to understand why. When I look at the synthetic benches I think they may give the answer: better floating point (Whetstone) and multimedia (SSE2, SSE3?) performance. I think this discussion has convinced me it's not memory bandwidth like I first thought.

All this is telling me that Intel and AMD are a lot closer in performance than I thought and makes a purchasing decision even tougher.
 
>Can you show me a bench that shows the x87 floating point of
>the AMD chips being better?

Euh.. no, but it's generally assumed/known. The P4 has a relatively weak x87 unit, but is good at SSE2.
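As a concrete aside (assuming a gcc-style toolchain, which is just my example), the exact same scalar C code can end up running on either unit depending on how it is compiled, which is part of why results swing so much between builds:

[code]
/* The same double-precision loop can be compiled to x87 code
 * (gcc -mfpmath=387) or to scalar SSE2 code (gcc -msse2 -mfpmath=sse).
 * The P4 tends to fare much better with the SSE2 build. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
[/code]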

> I would hate to believe that all these benches are rigged
>by Intel or AMD

Saying that all of them are "rigged" is probably an overstatement, but I do not doubt for a second that both companies are fully aware of the "value" of winning benchmarks, and will use their influence to ensure the best possible result. Some tactics include software optimization assistance and compiler optimizations (especially Intel), or heavily influencing benchmark development (think BAPCo/Intel), but read this and make up your own mind: <A HREF="http://www.theinquirer.net/?article=22332" target="_new">http://www.theinquirer.net/?article=22332</A>
I take it with a grain of salt, but OTOH, these companies would be *stupid* not to give some incentives for better reviews. It's dirt cheap publicity, and AFAIK not even illegal. And it can be done subtly enough so that you won't notice it: just focus the article benchmarks a bit more on aspects a certain CPU does better in, make sure the videocard is the bottleneck if it sucks for gaming, select exactly as many threads as the cpu can handle in hardware (not one more, not one less, when the other cpu can handle fewer), toss in an overclocked result for one cpu but not the other "just for reference", make small errors now and then in the charts, obscure some serious drawbacks (for instance by hiding the CPU power draw in the overall power draw, if possible including 19" monitors..), ignore some facts when they are inconvenient (P4 running above the maximum allowed temperature per spec sheet), and spin the others just enough to remain at least superficially objective.. then draw absurd conclusions that simply don't match the data.. sounds familiar? It should, if you read this site.

>Somebody has to do some objective testing.

Yeah, but it's hard to be objective even if you want to be. It's pretty much a given you will find apps/workloads that will benefit processor A over B, so which ones do you use? For gaming, there is *some* value in testing with high resolutions and high quality settings as THG does (showing those games are GPU limited, more than CPU limited), but OTOH would it not be fair to use ultra high end SLI cards then? And does low res testing not show you how your cpu might perform on next year's games or with a next gen videocard? Tough calls.


>Maybe I'm old school but I tend to still put some faith in
>the synthetic benches for that very reason. Yes, no one runs
> a synthetic app. But like in this case I see the video
>performance benches clearly weighted in Intel's favor and I
>am trying to understand why. When I look at the synthetic
>benches I think it may give the answer. Better floating
>point (Whetstone) and multimedia (SSE2, SSE3?) performance.
>I think this discussion has convinced me its not memory
>bandwidth like I first though

Clearly not, as the A64 and P4 have similar bandwidth, especially when you compare with the 875 for the P4. The A64 may well have more effective bandwidth, since it doesn't have to share the FSB with the memory controller. No, it's most likely SSE2 performance and clockspeed. As I said, and although I could be wrong, I think the P4 and A64 have pretty much the same theoretical SSE2 peak performance per clock.
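For what it's worth, the theoretical peak numbers do come out about the same on paper. A rough sketch, using the nominal 64-bit bus widths and transfer rates (nothing here is measured):

[code]
/* Peak bandwidth on paper: a P4 on an 800 MT/s FSB vs. an A64 with
 * dual-channel DDR400. Both buses are 64 bits (8 bytes) wide. */
#include <stdio.h>

int main(void)
{
    const double bytes_per_transfer = 8.0;

    double p4_fsb800  = bytes_per_transfer * 800e6;        /* quad-pumped 200 MHz FSB */
    double a64_ddr400 = 2.0 * bytes_per_transfer * 400e6;  /* two DDR400 channels */

    printf("P4 FSB800 peak:       %.1f GB/s\n", p4_fsb800  / 1e9);
    printf("A64 dual DDR400 peak: %.1f GB/s\n", a64_ddr400 / 1e9);
    return 0;
}
[/code]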


= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
P4Man,
I read the link on the Inquirer. Pretty candid stuff. He's probably right too.

No, it's most likely SSE2 performance and clockspeed. As I said, and although I could be wrong, I think the P4 and A64 have pretty much the same theoretical SSE2 peak performance per clock.
I think you hit the nail on the head! Everyone talks about SSE2 performance being the same, as if the AMD and Intel CPUs should produce the same performance numbers. Yet it makes more sense the way you put it: they have the same SSE2 performance per clock cycle. SSE2 is a streaming flow of instructions, and AMD implements the same instructions as Intel for compatibility. So if you have the same flow of instructions and the same way of executing them, then the CPU with the higher clock speed wins. AMD and Intel have different architectures in other areas, which results in differing amounts of work done per clock cycle, but for compatibility reasons I doubt there is any difference in architecture and execution with these streaming instructions. It's all about GHz in this area. Correct?
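To make the "streaming flow of instructions" concrete, here is a minimal example of the kind of SSE2 inner loop being discussed (purely an illustration, not taken from any actual codec):

[code]
/* Adds two arrays of 16-bit samples eight at a time using SSE2.
 * The same packed instructions execute on both the P4 and the A64, so with
 * comparable per-clock throughput the higher clock finishes sooner.
 * Assumes a, b and dst are 16-byte aligned and n is a multiple of 8. */
#include <emmintrin.h>

void add_blocks_sse2(short *dst, const short *a, const short *b, int n)
{
    int i;
    for (i = 0; i < n; i += 8) {
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        _mm_store_si128((__m128i *)(dst + i), _mm_add_epi16(va, vb));
    }
}
[/code]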
 
SSE2 is not the only optimization that Intel has taught certain software makers to use.
When it comes to encoding, dual-CPU systems have been in use for a while. It was not too difficult for Intel to help rewrite the software so that it could take advantage of hyperthreading.
Unfortunately, not all encoding programs are created by companies who have Intel on their side. Much of today's encoding is done via shareware or freeware based programs.
Things like DVDshrink have not had that Intel assistance. They run slower on Intel systems.
Even with floptimized programs, the audio or video being encoded has to line up the way the software wants. Any time this fails, the Intel system is mired in its poor IPC.
Most benchmarks we see use a file that is well suited to the Intel optimizations. When real world situations occur, the P4s don't do quite so well.
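One way to read the "line up" point (this is just my interpretation of it): SSE2 fast paths usually want 16-byte-aligned data, and when the buffers don't line up the code falls back to slower unaligned or scalar work, which is exactly where the P4's lower IPC hurts. A minimal sketch:

[code]
/* Sketch of an aligned fast path vs. a scalar fallback. When the data
 * "lines up" (16-byte aligned) the SSE2 loop does the work; otherwise the
 * plain scalar loop does, and per-clock efficiency matters much more. */
#include <emmintrin.h>
#include <stdint.h>

void scale_samples(double *dst, const double *src, int n, double k)
{
    __m128d vk = _mm_set1_pd(k);
    int i = 0;

    if (((uintptr_t)src % 16 == 0) && ((uintptr_t)dst % 16 == 0)) {
        /* fast path: aligned 16-byte loads and stores, two doubles per step */
        for (; i + 2 <= n; i += 2)
            _mm_store_pd(dst + i, _mm_mul_pd(_mm_load_pd(src + i), vk));
    }
    /* fallback and tail: scalar loop */
    for (; i < n; i++)
        dst[i] = src[i] * k;
}
[/code]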
 
instructions per clock (IPC)

I was under the impression that video encoding speed is codec dependent.

For example, in XviD the Athlon 64 kills the Pentium 4,
but in DivX and WMV9 the Pentium 4 pulls it off against the Athlon 64.
 
>It's all about GHz in this area. Correct?

Not necessarily.. it's all in the SSE2 performance :d

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
I have a feeling the SSE, SSE2 and SSE3 instructions have a lot to do with it - yes they assist (or are designed) for multimedia and video processing, and yes AMD does use them, BUT WHO ACTUALLY DESIGNED SSE 1, 2 and 3 - INTEL. I'm guessing they actually design the instructions for the P4 core (or NetBurst), and by doing so give it a boost, whereas AMD "follows" the idea and adds them for support but not the boost - like internal shortcuts because of the long/deep pipelines? I'm just speculating, and this is Intel we're talking about. Also, if we look at AMD, their 3DNow! has boosted their CPUs since the K6-2.

Any input here?
 
Take a look at this link:
<A HREF="http://www.tomshardware.com/cpu/20041221/cpu_charts-19.html" target="_new">http://www.tomshardware.com/cpu/20041221/cpu_charts-19.html</A>

where the benchmark was Xvid. Results look pretty much the same as for DivX or WMV, with Intel (blue) leading many of the AMD64s (green). Unless the benchmarks themselves or the data files used are somehow set up to give Intel an advantage (which someone in this thread mentioned), there seems to be a very consistent story here:
Intel is better at video processing. Why? That's what we are discussing here, but I am beginning to think it's a combination of SSE2/3 and raw GHz.
 
Please do compare FSB 1066 with DDR2-533 using mediocre timings against FSB 800 with low latency DDR-400, so both memory subsystems would have comparable absolute latency (in ns). I doubt the performance difference will be anything other than negligible.
If you will tell me how to set up the comparison, I have both of the systems you described in front of me for the next couple of days. I'll be happy to give it a shot.
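As a reference point for that comparison, here is a rough sketch of the CAS-latency-in-nanoseconds arithmetic behind "comparable absolute latency" (the CL values are my assumptions for "low latency" DDR-400 versus "mediocre" DDR2-533):

[code]
/* Convert CAS latency from bus clock cycles to nanoseconds so the two
 * memory types can be compared in absolute terms. CL values are assumed. */
#include <stdio.h>

static double cas_ns(double bus_mhz, double cl_cycles)
{
    /* cycles divided by MHz gives microseconds; multiply by 1000 for ns */
    return cl_cycles / bus_mhz * 1000.0;
}

int main(void)
{
    printf("DDR-400  CL2 : %.1f ns\n", cas_ns(200.0,  2.0));   /* 10.0 ns */
    printf("DDR2-533 CL4 : %.1f ns\n", cas_ns(266.67, 4.0));   /* ~15.0 ns */
    printf("DDR2-533 CL3 : %.1f ns\n", cas_ns(266.67, 3.0));   /* ~11.2 ns */
    return 0;
}
[/code]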

ASUS P5WD2 Premium
Intel 3.73 EE @ 5.6GHz
XMS2 DDR2 @ 1180MHz

<A HREF="http://valid.x86-secret.com/records.php?PHPSESSID=792e8f49d5d9b8a4d1ad6f40ca029756" target="_new">#2 CPUZ</A>
SuperPI 25secs
 
From Tom's? How about <A HREF="http://www.tomshardware.com/cpu/20050509/cual_core_athlon-15.html" target="_new">this</A>
Don't believe everything you see with encoding benchmarks. Most of them are based on Intel optimizations, and run "on the rails" so to speak. As long as all the "work" is done through optimization, Intel does well. In real world applications, all the packets don't generally line up so neatly.
Aside from that, there is the "heat" question, the relative cost question, the throttling, and the hardware problems.