Why do P4s kill the Athlon 64 in media encoding?

lbecque

Dec 17, 2003
Does anyone know why P4s do so much better than Athlon 64s in the media encoding tests?
Is it SSE3 and Hyper-Threading?

Take a look at the MPEG2 tests in:
http://www.sharkyextreme.com/hardware/cpu/article.php/3261_3514901__6

The P4s are generally 40% faster than the Athlons.
Even the dual-core A64 X2 4800+ is bested by the dual-core P4 EE 840, which runs at only 3.2 GHz.

What really stood out for me in that test was the dual-core Pentium D 820, which at $260 outruns both Intel and AMD processors costing $800-1000. Looks like a good value for those like me doing video editing.
 
Clock speed. Plain and simple: raw clock speed.

Video encoding is a streaming process where the NetBurst architecture of the P4 really shines. Most other uses are branchy, and the CPU has to stop and start and keep changing direction (this is an analogy to the technical reality), while encoding lets it open up and run full out in a straight line. SSE2, SSE3, and other SIMD extensions help, but the primary reason is raw clock speed.

Video editing, on the other hand, is branchy and causes the CPU to be much less efficient. The A64 (and the P-M, PIII, Athlon XP, etc.) is a much more nimble CPU in this context, but just doesn't have the top speed that the P4 has when going in a straight line.

I guess a car analogy fits pretty well - P4 = muscle car: lots of top speed, but has trouble in the corners. A64 = rice rocket: decent top speed (though generally beaten by the muscle car at the end of a looooong straightaway), but corners like it's on rails.
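
To put rough numbers on the straight-line-versus-corners idea, here is a minimal Python sketch of a toy stall model: cycles per instruction are a base cost plus the expected cost of mispredicted branches. The two chips and two workloads are hypothetical, and every figure (clock, base CPI, miss penalty, branch mix) is an illustrative assumption rather than a measurement of any real P4 or A64.

```python
# Toy model only: all figures below are assumed for illustration, not measured.

def instructions_per_second(clock_ghz, base_cpi, branch_fraction,
                            miss_rate, miss_penalty_cycles):
    """Estimate throughput when each mispredicted branch stalls the pipeline."""
    cpi = base_cpi + branch_fraction * miss_rate * miss_penalty_cycles
    return clock_ghz * 1e9 / cpi

# Hypothetical chips: long pipeline with a high clock vs. short pipeline with a lower clock.
LONG_PIPE = dict(clock_ghz=3.6, base_cpi=1.0, miss_penalty_cycles=30)
SHORT_PIPE = dict(clock_ghz=2.4, base_cpi=0.8, miss_penalty_cycles=12)

# Hypothetical workloads: share of instructions that are branches, and how often they mispredict.
STREAMING = dict(branch_fraction=0.05, miss_rate=0.01)
BRANCHY = dict(branch_fraction=0.20, miss_rate=0.10)

def compare(workload):
    """Return long-pipeline throughput divided by short-pipeline throughput."""
    long_ips = instructions_per_second(**LONG_PIPE, **workload)
    short_ips = instructions_per_second(**SHORT_PIPE, **workload)
    return long_ips / short_ips

if __name__ == "__main__":
    print("streaming ratio (long/short):", round(compare(STREAMING), 2))
    print("branchy ratio (long/short):  ", round(compare(BRANCHY), 2))
```

With these assumed inputs the long-pipeline chip comes out ahead on the streaming workload and slightly behind on the branchy one. Different inputs can flip either result, so treat it as a way to reason about the trade-off, not as a prediction.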

For video EDITING, the A64 provides a little better performance (at a given price point), but when you are ENCODING, the P4 can shine. If I was to give you my opinion (I can't resist :lol: ) if you want a single rig to do both, get the P4. But if you have a separate rig for encoding so you can encode and edit at the same time (on different PCs), then get a P4 for encoding, and an A64 for editing.

Mike.


Outside of a dog, a book is man's best friend. Inside the dog it's too dark to read.
-- Groucho Marx
 
The two are very close now, if you look at the more recent tests.
The reason Intel has an advantage here is that its clock speeds are much higher than AMD's.
But fishmahn put it better...🙂
Cheers,
Charles
Edited by viditor on 06/24/05 03:05 PM.
 
What makes you think video editing is more branch-sensitive? Not saying you are wrong or right, just wondering. Gut feeling would put video editing in the same type of apps as encoding (it is actually encoding, so...?)

I do agree, though, that media encoding is one of those tasks where clock speed matters and most of the disadvantages of a long pipeline are mitigated. However, as someone else pointed out, the differences really aren't that big, and depending on what settings, codec, and app you use, one or the other CPU is faster. I guess it has as much to do with what they optimized the codec for as anything else.

= The views stated herein are my personal views, and not necessarily the views of my wife. =
 
I'm not 100% certain I'm right, but here's my reasoning.

Editing involves short stretches of streaming (play to 'here', perform this effect, etc.) but most of what you do in editing isn't encoding, it's viewing, backing up, scrolling forward - skipping every Nth frame and such, find the right spot, now grab this other piece from somewhere else (either in the same file or somewhere else), perform these effects, etc.

Except for short stretches where the info is streamed serially and the CPU can stretch its legs, most of editing is a lot like photo editing, or using an Excel/Word/PowerPoint type app - pull down this menu (or activate this floating toolbar, etc.), activate this feature/function in the program, apply it to a short stretch of streamed data, now stop, back up, stream a few secs, stop, scroll forward, pick a different function, insert this short clip, etc. Also, while editing you aren't reading the whole file - you're usually encoding/decoding in low-res, not full resolution, so the raw amount of data is much smaller.

Well, that's where I get that idea. Also, and I don't have links off the top of my head (or even remember where - but likely Anand/similar sites), but I remember seeing benchmarks/reviews where a comparable A64 (and AXP in its day) was slightly faster than the P4 in many/most editing simulations. Not as much faster as in gaming or even business apps, but still noticeably faster. Hmm, could have been within the error margin of the tool, but I don't know.

If I have time tonight (or if I don't forget tomorrow as tonight is my wife & I's 'date night' to help keep us close) I'll see if I can track a couple benches down.

Good point on different codecs. I should have brought that up. I think (without any real proof - just my 'feel' for systems) that if you were to take the same app/codec, with 2 different compiles and optimizations, one tuned for P4, one for A64, the P4 will still outperform, though not by the 40% that some benches report. I think it'd be more like 10% (WAG). IMO, those benchies that show the P4 totally dusting the A64 are run with binaries tuned for P4 and, maybe not consciously, detuned for A64, making it a quite unfair, though understandable, comparison (tune for the majority product).

Mike.

 
>Editing involves short stretches of streaming (play to
>'here', perform this effect, etc.) but most of what you do in
> editing isn't encoding, it's viewing, backing up, scrolling
>forward - skipping every Nth frame and such, find the right
>spot, now grab this other piece from somewhere else (either
>in the same file or somewhere else), perform these effects,
>etc

Hmm, maybe we should define what we are discussing here. If editing in your vocabulary is the act of cutting, applying effects, etc., then I could agree, but I must immediately add that this doesn't seem like a very CPU-intensive activity to me. The bottlenecks would more likely be the hard disk, I/O, and so on.

If you mean previewing effects, and mostly after the editing, rendering the movie, AFAIK, that is just like encoding/recoding.

>Good point on different codecs. I should have brought that
>up. I think (without any real proof - just my 'feel' for
>systems) that if you were to take the same app/codec, with 2
> different compiles and optimizations, one tuned for P4, one
> for A64, the P4 will still outperform, though not by the
>40% that some benches report. I think it'd be more like 10%
>(WAG)

My WAG is that most codecs have different codepaths for different ISAs, since these things are really small and hugely CPU-intensive. Anything else would be truly dumb. That said, given the performance gains you can get from proper optimization (especially hand-coding critical paths in asm), the potential gain/loss should be expressed in orders of magnitude rather than percentages.

>(tune for the majority product)

A single binary can contain codepaths for different CPUs. Of course, it's quite feasible that more effort has been spent on certain codepaths than others. Furthermore, it's even quite conceivable that Intel themselves hand-code certain parts of popular apps (Photoshop filters) or codecs to ensure they perform well on popular benchmarks (and as long as you end up using that exact same app, it's even quite 'fair', as you will get that very speedup as a consumer).
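
As a rough illustration of one binary carrying several codepaths, here is a minimal Python sketch of runtime dispatch. Everything in it is hypothetical: the feature names are passed in as plain strings (a real codec would query the CPU, for example via CPUID), and the "tuned" kernels are ordinary Python stand-ins for hand-written SIMD routines. The kernel is a sum of absolute differences, a common inner loop in video encoders.

```python
# Hypothetical sketch: feature strings and kernels are stand-ins, not a real codec's API.

def sad_generic(a, b):
    """Portable fallback: sum of absolute differences with a plain loop."""
    total = 0
    for x, y in zip(a, b):
        total += abs(x - y)
    return total

def sad_sse2(a, b):
    """Stand-in for an SSE2-tuned routine; must return the same result."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sad_sse3(a, b):
    """Stand-in for an SSE3-tuned routine; must return the same result."""
    return sum(map(lambda pair: abs(pair[0] - pair[1]), zip(a, b)))

# Ordered from most to least specialised; the first supported entry wins.
CODEPATHS = [
    ("sse3", sad_sse3),
    ("sse2", sad_sse2),
]

def select_codepath(cpu_features):
    """Pick the best kernel for the given set of feature names."""
    for feature, kernel in CODEPATHS:
        if feature in cpu_features:
            return kernel
    return sad_generic

if __name__ == "__main__":
    kernel = select_codepath({"mmx", "sse", "sse2"})
    print(kernel.__name__, kernel([1, 5, 9], [4, 5, 6]))
```

The selection happens once at startup, so the per-call cost is nil. How much effort goes into each entry in the table is exactly where one vendor's chips can end up better served than the other's.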

 
Editing comes in many forms...
For cuts-only editing (cut and paste), the two are quite close, but Intel has the edge because of its clock speed.
For transitions and effects (wipes, dissolves, etc...), this involves rendering. While Intel used to have an edge there as well, this has changed. Now it's AMD with the slight edge, and with multi-threaded renders (this will depend on which software you're using) the X2 has a big edge.

Cheers,
Charles
 
The P4 makes better use of RAM than the A64 does. In fact, the P4's performance is centered around the high-bandwidth memory bus. These are simple things...

Now, my experience with video encoding shows that, more than anything else, it's RAM-dependent.

Only a place as big as the internet could be home to a hero as big as Crashman!
Only a place as big as the internet could be home to an ego as large as Crashman's!
 
I see your point re: bottlenecks, and a lot of time will also be spent waiting on user input, so the better performer for that bit is the CPU that's more responsive, or 'nimble'. Assuming the user uses the same HDD in either situation, then we're back to the same result - not much difference, but the nod going to the more nimble cpu.
>If you mean previewing effects, and mostly after the editing, rendering the movie, AFAIK, that is just like encoding/recoding.

Here's where I agree and disagree. I see that as 2 parts. First is the previewing, in which you're playing the video. Sure, it's possibly (probably?) not in DivX or MPEG2 or whichever format yet, it's still in some 'raw' format, and it may be split up all over your HDD instead of in 1 file for a nice serial stream (which is precisely what the P4 likes), but that's not very CPU-intensive either - how much CPU do you use when you're viewing a DVD? So, in that bit, HD and I/O speed are your primary bottlenecks. In that case, given the same HD is used, the more 'nimble' CPU will get the nod, possibly by a minuscule amount.

Second is rendering, which is definitely encoding. Actually, that's precisely what I assumed was encoding, along with any conversion from/to formats. That's something you do when you're done editing, and in some situations is best served by a separate rig (or cluster, as in a rendering farm), or done at night when you're not sitting there twiddling your thumbs. Admittedly that's more big production work, but even at home, you start the render, and go away for an hour or so (exception is the short bits, where AMD/Intel is a moot point - are you going to notice a render took 10 or 11 min? - well, you might, but it's not real likely to matter).

I tend to the cynical on programmers. While you can have multiple codepaths for different CPUs, given the near-monopoly majority of Intel CPUs in the overall market (more than 80% by anyone's benchmark - close to 90 overall), why spend more than perfunctory time on 'merely' 10% of your potential market? I agree that benchmark tuning happens as well, and it's a nice bonus if you're doing exactly what the benchmark does.

Mike.

 
Hmm, now that's exactly backwards from what I would have thought. I would have thought that rendering would be the definitive P4 area, along with encoding, and cut & paste performance would be more AMD-oriented (though close, as you'd said).

Clockspeed can overcome efficiency... A motto I once heard, "If force don't work, use more force" fits that concept well.

-- Off to cogitate on that a bit - gotta get my brain around that thought and see if I can find corroboration/proof.

Mike.

 
That's interesting that you say the P4 makes better use of RAM than the A64, especially since AMD is promoting its memory controller as 'integrated into the chip', thereby eliminating the FSB and instead using HyperTransport at 1 GHz or 2 GHz speeds. That sounds like it would be faster than Intel's 800 MHz FSB, but maybe it's marketing hype and not fact.
Anyone know what the benchmarks say about A64 vs. P4 in just pure memory bandwidth?
 
AMD's on-die memory controller has lower latency, but the P4 makes use of more bandwidth.

You can see that fairly easily if you compare both processors in single- and dual-channel mode: the P4 gets a huge performance increase from dual-channel, while the A64 only gets a small one.

Low latency is great for programs that cache a lot of tiny files to RAM, such as games. Given that everything else on the A64 is great for games, this added boost makes the P4 look terrible by comparison.

 
So that is to say, the P4 makes use of more memory BANDWIDTH, even though the A64 does a great job at reducing memory controller latency.

 
I think you were right all along, despite what viditor said:
>For transitions and effects (wipes, dissolves, etc...), this involves rendering. While Intel used to have an edge there as well, this has changed. Now it's AMD with the slight edge, and with multi-threaded renders (this will depend on which software you're using) the X2 has a big edge.

That is simply not borne out in the benchmark that I referred to in starting this discussion. See the MPEG2 tests under:
http://www.sharkyextreme.com/hardware/cpu/article.php/3261_3514901__6

That benchmark was just done 2 days ago and the P4's have a substantial advantage over the A64s.

viditor also mentioned 'multi-threaded renders', and I have to disagree and say that Intel still retains an advantage here too. Nearly all of the single-core P4s feature Hyper-Threading, which AMD can't do. At the very top of the line, the AMD X2 can run two threads because it is dual-core, but the Intel EE 840 can run four simultaneous threads because it has dual cores AND Hyper-Threading. In all the encoding benchmarks referred to above, the X2 and the EE 840 are nearly identical. The single-core processors are a different story, with Intel having a definite advantage (around 40%, which is substantial).
 
So all this talk about AMD having the memory controller 'integrated into chip' and HyperTransport running at 1 or 2 GHz, which makes it sound like it has tons of memory bandwidth, is just marketing hype?
Intel still has more memory bandwidth than AMD, correct?
 
No. The AMD chip has just as much memory bandwidth. Even THG shows that: http://www.tomshardware.com/cpu/20050221/prescott-10.html#synthetic
Most of the time Intel's bandwidth is gobbled up. When programs are well optimized for Intel's SSE2 and other throughput enhancements, the speed and bandwidth can be better utilized.
This shows up better "on the rails" than in actual use. Video encoding seems to be the exception. Even there, it is only true, in comparison, with well-optimized programs.
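
The "just as much bandwidth" point can be checked with back-of-the-envelope arithmetic on theoretical peaks. The sketch below assumes a P4 on an 800 MT/s, 64-bit front-side bus and an A64 with dual-channel DDR400 (64 bits per channel) on its on-die controller. Those configurations are assumptions for illustration, and measured bandwidth will be lower than either peak. HyperTransport is left out because on the A64 it links the CPU to the chipset and I/O, not to the DRAM.

```python
# Theoretical peaks only; the platform configurations are assumed for illustration.

def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Peak bandwidth in GB/s, taking 1 GB as 1e9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

# Assumed P4 platform: 800 MT/s front-side bus, 64 bits wide.
p4_fsb = peak_bandwidth_gb_s(800, 64)

# Assumed A64 platform: DDR400 is 400 MT/s per channel, 64 bits per channel.
single_ddr400 = peak_bandwidth_gb_s(400, 64, channels=1)
dual_ddr400 = peak_bandwidth_gb_s(400, 64, channels=2)

if __name__ == "__main__":
    print("P4 800 MT/s FSB:     ", p4_fsb, "GB/s")
    print("single-channel DDR400:", single_ddr400, "GB/s")
    print("dual-channel DDR400:  ", dual_ddr400, "GB/s")
```

Under these assumptions the FSB and dual-channel DDR400 both peak at 6.4 GB/s, so on paper neither side has more memory bandwidth. The question that remains is how much of it each chip actually uses.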
 
Both have the same memory bandwidth, it's just that the P4 uses more of it in this application, from what I can tell. And "from what I can tell" comes from experience, using various processors at various bus speeds through various memory systems, for video encoding.

 
>Both have the same memory bandwidth, it's just that the P4
>uses more of it in this application

Nonsense. Memory bandwidth requirements are simply defined by the software and, to some extent, by how efficient the CPU is at executing it. If the CPU is dog-slow at certain routines, more memory bandwidth will simply not be needed. If you somehow managed to make that code run 10x or 100x faster, either processor would be bandwidth-starved.

It's an urban legend that the P4 thrives on bandwidth and the A64 on low latency. In reality both thrive on low latency, and on more bandwidth only for those apps where they are fast enough for the available bandwidth to be a constraint.
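
A minimal Python sketch of that argument: achieved throughput is capped by whichever is lower, the rate the CPU can compute at or the rate memory can feed it. The frame rates, the 20 MB of memory traffic per frame, and the 6.4 GB/s bandwidth figure are made-up inputs for illustration only.

```python
# Illustrative model: every number passed in below is an assumption, not a benchmark.

def memory_limited_fps(bandwidth_gb_s, bytes_per_frame):
    """Frames per second the memory system could feed at full bandwidth."""
    return bandwidth_gb_s * 1e9 / bytes_per_frame

def achieved_fps(compute_fps, bandwidth_gb_s, bytes_per_frame):
    """Throughput is capped by the slower of compute and memory."""
    return min(compute_fps, memory_limited_fps(bandwidth_gb_s, bytes_per_frame))

def bound_by(compute_fps, bandwidth_gb_s, bytes_per_frame):
    """Report which resource is the constraint."""
    if compute_fps <= memory_limited_fps(bandwidth_gb_s, bytes_per_frame):
        return "compute"
    return "memory"

if __name__ == "__main__":
    traffic = 20e6  # assumed bytes of memory traffic per encoded frame
    for compute_fps in (40, 400):  # the same code, then a hypothetical 10x faster version
        print(compute_fps, "fps of compute ->",
              achieved_fps(compute_fps, 6.4, traffic), "fps,",
              bound_by(compute_fps, 6.4, traffic), "bound")
```

With these inputs the slow routine never comes close to the memory limit, while the 10x faster one runs into it. That is the sense in which bandwidth only matters once the code is fast enough to need it.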

 
>No. The AMD chip has just as much memory bandwidth. Even THG shows that

The URL you referenced (http://www.tomshardware.com/cpu/20050221/prescott-10.html#synthetic) does show a huge advantage for Intel in the PCMark CPU bench. Why is this, if the A64 is supposed to be a faster CPU? Could this, rather than memory bandwidth, be the reason the video encoding benches are won by Intel?
 
The reason PCMark shows an advantage is that it's optimized for Intel's HT. If you compare scores where the X2 is pitted against the Pentium D, where both CPUs can run multiple threads, the scores are a lot closer. I don't remember who wins.
 
That would likely take you back to branch prediction, which the P4 does poorly, and video editing could make less use of it, taking full advantage of the P4's higher clock speed, which would in turn put data through the RAM at a higher rate.

Still, you see the P4 getting a bigger boost from dual-channel mode than the A64. And still you see the A64 beating the P4 in most applications, even with dual-channel enabled.

 
Umm, the P4 has better branch prediction than the A64 - it needs it, because the penalty for a missed branch is almost twice what it is on the A64.

Mike.

 
I'm looking at the end result, the P4 acting like it has a lot of latency, being caused by whatever reason related to its deeper pipeline.
