Can AMD salvage QFX with an in-house chipset?

BM, I suppose you may define chipset to include the on-die memory controller... in that case, yes, that's exactly what AMD needs to work on to minimize the penalties of NUMA. However, your previous writings have insinuated that it was Nvidia's fault for not producing a good enough chipset (motherboard), and that AMD/ATI could do better, when in actuality, the ball rests in AMD's court to release CPUs with efficient memory controllers (and cores, mind you), and not with MS or board makers to release better software workarounds.

So then SSE should never have taken off since it's the only place Intel ever really had a chance against AMD prior to Core 2.

I fail to see the connection between SSE taking off and the tossing of blame between a chipmaker and the board makers (in actuality, we don't see such finger-pointing, because the engineers all know the weakness rests with AMD's IMCs). Mind enlightening me there?

Maybe you were referring to what you didn't quote from my last post - that I predicted NUMA/multiple CPU sockets won't spread on the desktop. SSE was introduced as an extension within the chip logic - much like MMX, 3DNow, and x64, but not like IA-64/Itanium. You could ignore the small proportion of transistors used to support this extension - programs would still run faster than on a previous-generation chip - and you certainly didn't need additional hardware to run these new instructions. All you needed to do was convince software developers to write supporting code.
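For anyone curious what "supporting code" means here, it's often just a guarded code path with a scalar fallback. A minimal sketch in C, not anyone's actual code: it assumes a compiler that defines __SSE__ when SSE code generation is enabled, and the add4 helper is purely hypothetical.

/* Minimal sketch: SSE as an optional code path with a scalar fallback.
   Assumes the compiler defines __SSE__ when SSE code generation is on. */
#include <stdio.h>
#ifdef __SSE__
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Hypothetical helper: add two 4-float arrays. */
static void add4(const float *a, const float *b, float *out)
{
#ifdef __SSE__
    /* One packed instruction handles all four floats at once. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    /* Scalar fallback: older CPUs still run the program, just slower. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, r[4];
    add4(a, b, r);
    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
    return 0;
}

That's exactly why the extension could spread quietly: the same source builds either way, and nothing breaks on chips without it.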

Dual-socket consumer systems, by contrast, fly in the face of computing progress. Enthusiasts may purchase them (they have in the past, along with registered RAM), but they would never filter down to the budget consumer because of the high costs of both acquisition and ownership - the chipmakers would sooner fuse the cores onto one socket to make the price/heat dissipation more palatable. Since most computers won't have dual sockets, it follows that most won't be using NUMA. How are we to convince consumer-level software writers to account for NUMA issues, now?
 
It's real simple. Jack is saying that engineers shouldn't have to optimize for ccNUMA, but it's OK that they optimize for SSE.

Without SSE, Intel would still have had few wins over AMD. It could be considered as much of a crutch as optimizing "first touch" routines to maintain coherency in RAM requests.
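For anyone wondering what "first touch" tuning actually involves: on an OS with a first-touch page policy (Linux, for one), you fault pages in from the thread that will later use them, so they land on that thread's local node. A rough sketch assuming OpenMP; the array size and static schedule are only illustrative:

/* Rough first-touch sketch; assumes an OS that places a page on the
   node of the CPU that first writes it, and OpenMP for threading. */
#include <stdlib.h>

#define N 16000000  /* illustrative: large enough to span many pages */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    long i;
    if (!a) return 1;
    /* First touch: each thread writes the slice it will later work on,
       so those pages get allocated on that thread's local memory node. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < N; i++)
        a[i] = 0.0;
    /* Same static distribution: the compute loop now hits local RAM. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < N; i++)
        a[i] = a[i] * 2.0 + 1.0;
    free(a);
    return 0;
}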
 
Ninja, you've never seen that? That's the greatest internet meme in the history of mankind! Seriously, Newgrounds.com, man. You need to waste a lot more of your time.

Optimize for NUMA NUMA iei, nu ma nu ma iei, nu ma nu ma nu ma iei...
 
You may be wasting too much.
 
It's real simple. Jack is saying that engineers shouldn't have to optimize for ccNUMA, but it's OK that they optimize for SSE.

Without SSE, Intel would still have had few wins over AMD. It could be considered as much of a crutch as optimizing "first touch" routines to maintain coherency in RAM requests.

If you would read my statements, they justify popular software developers supporting SSE but not ccNUMA.

On the desktop, ccNUMA is a weakness, not a feature. That's because existing applications slow down relative to a uniform access model - they now need reprogramming. NUMA also represents a small installed base - people aren't flocking to multi-socket multi-channel desktops, and they aren't even flocking to AMD since Intel holds competitive core count and performance without NUMA.

On the other hand, SSE is a feature. Existing applications don't slow down. And over time the installed base becomes tremendous. That's because it's very easy to slip in a small patch of transistors to add functionality to your forthcoming processor lines... look how many people now have x64- or EM64T-enabled CPUs but never touch 64-bit. To get software developers to actually support SSE, though, requires that it be useful. And you can find the basis for that in the popularity of media applications.
 
But if the memory is placed properly NUMA can increase perf by allowing apps to run on CPU2 and CPU3 while all OS threads run on CPU0 and CPU1.

Since MS is supporting AMD64 as the de facto standard for Windows now, I think that ccNUMA has a large enough installed base to warrant optimization of apps for it.
 
But if the memory is placed properly NUMA can increase perf by allowing apps to run on CPU2 and CPU3 while all OS threads run on CPU0 and CPU1.

My first instinct here is to reply that a good OS exhibits low overhead both in CPU and RAM bandwidth, and thus there is little difference whether OS-specific threads are shunted to a separate core or processing node. That's why a dual-core gains only the slightest advantage over a single-core in single-threaded performance.

But the root question is whether the NUMA advantage - scaling RAM bandwidth, which is very useful for HPC and many server apps - makes much sense for the desktop. And so far I think that, no, common desktop applications depend primarily on memory latency and not raw bandwidth. With better caching technologies, bandwidth limitations are being further mitigated at the same time that bandwidth is growing with each new generation of DRAM.

(Case in point: I can hardly tell any difference when encoding videos with WME or compressing files on a 3.6GHz C2D using single-channel DDR2 RAM instead of dual-channel. The latency in both cases is equal, but the bandwidth in the former is halved.)
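That latency-versus-bandwidth split is easy to demonstrate for yourself. A hypothetical C microbenchmark, none of it from this thread: a dependent pointer chase is bound by latency because each load must wait on the previous one, while a linear scan streams at full bandwidth.

/* Hypothetical microbenchmark: pointer chasing (latency-bound) vs.
   a linear scan (bandwidth-bound) over the same out-of-cache array. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M entries, far larger than any cache */

int main(void)
{
    size_t *next = malloc(N * sizeof *next), i, j, t;
    if (!next) return 1;
    for (i = 0; i < N; i++)
        next[i] = i;
    /* Sattolo shuffle: yields a single cycle through all N entries. */
    srand(1);
    for (i = N - 1; i > 0; i--) {
        j = ((size_t)rand() << 16 ^ (size_t)rand()) % i;
        t = next[i]; next[i] = next[j]; next[j] = t;
    }
    clock_t t0 = clock();
    for (i = 0, j = 0; i < N; i++)
        j = next[j];                 /* each load waits on the last one */
    clock_t t1 = clock();
    volatile size_t sum = 0;
    for (i = 0; i < N; i++)
        sum += next[i];              /* streams through memory in order */
    clock_t t2 = clock();
    printf("chase %.2fs, scan %.2fs (sink %zu)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, j + sum);
    free(next);
    return 0;
}

On typical desktop hardware the chase runs far slower than the scan, which is the point: apps dominated by dependent accesses care about latency, and halving bandwidth barely shows.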

Since MS is supporting AMD64 as the de facto standard for Windows now, I think that ccNUMA has a large enough installed base to warrant optimization of apps for it.

The problem is that the existing NUMA base is entirely on workstations and servers. The current installed base for NUMA on the desktop market is zero, until 4x4 actually becomes available. In contrast, the current desktop base for x64 includes every Athlon64 and Core 2 Duo as well as most of the Prescott P4s, along with the budget and server variations of these chips.

On top of this, MS is a huge company, and whenever they release Windows, it's expected to take advantage of a ton of hardware out there since their target audience includes pretty much every x86 computer user. The small-time software developer generally writes to a more specific audience, and you have to ask whether it's feasible for the bulk of desktop applications to undergo optimization for inter-node traffic on expensive NUMA test systems.
 
The point is that the optimization occurs in the kernel, not specific to a class of machine. If NUMA support is there and isn't needed, it's not going to make anything slower, but if you have two sockets it will increase perf IF the algorithm takes into account the size of the available RAM in the second bank.
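For what it's worth, Windows does expose that per-node memory information for the kernel or an app to act on. A minimal Win32 sketch (these calls exist from XP SP2/Server 2003 onward; error handling trimmed for brevity):

/* Minimal sketch: ask Windows how much RAM each NUMA node has free. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highest = 0;
    if (!GetNumaHighestNodeNumber(&highest))
        return 1;
    for (ULONG node = 0; node <= highest; node++) {
        ULONGLONG bytes = 0;
        if (GetNumaAvailableMemoryNode((UCHAR)node, &bytes))
            printf("node %lu: %I64u MB available\n", node, bytes >> 20);
    }
    return 0;
}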

But at any rate, people who buy QFX will not be buying it as a gaming machine, so worst comes to worst I can set affinity for SYSTEM processes, which should keep the OS away from the second bank. That allows up to 2GB (wish it were 4GB) for a game process.
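Setting that kind of affinity comes down to a couple of Win32 calls. A hedged sketch; the masks are only illustrative, with bits 0-1 taken as the first socket's cores and bits 2-3 as the second's:

/* Sketch: pin a process to the first socket's two cores (mask 0x3),
   leaving cores 2-3 (mask 0xC) free for, say, a game process. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD_PTR mask = 0x3;  /* bits 0 and 1 = logical CPUs 0 and 1 */
    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n",
                GetLastError());
        return 1;
    }
    puts("process pinned to CPUs 0 and 1");
    return 0;
}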

But then again, 10% off of 135 fps is not the worst thing in the world. The FX62 already had EXCELLENT numbers, and I look for playable frame rates, not the highest frame rates.

I can bet that WHEN I get it, I will get more than 70% faster, inclusive of the DX10 GPU. Now this is in relation to MY CURRENT SYSTEM. That's all I care about.
 
Baron!!! You sound like a broken fu)king record. 23 pages, and you've said pretty-much the same thing ~ 20 times. Do you type that out in Notepad, and then copy and paste it into your posts.... on every page of your worthless threads?
 
Baron, this is why people say you use your own brand of Logic.
BaronMatrix nLogic



Factors for purchase and debating on forumz:

Does AMD make it? Factor.

Does it have (noob impression) "more MHz"? Factor.

Does it have at least one (does not need to be more) benchmark that it does better in? Factor.

What about all the other benchmarks where it failed to do better? Not a factor.

Does it need to be cost competitive? Not a factor.

Do unusually large energy requirements matter? Not a factor.

Does availability matter? Not a factor.

Does it matter if it uses an additional 100+ watt CPU? Not a factor.

Does it matter if the motherboard has the extra power requirements of a second northbridge? Not a factor.


Or, to simplify the above:

Does it favor AMD? Factor.

Does it favor Intel? Not a factor.



If Quad FX spanked the Intel system you'd be touting its performance, not its "megatasking platformance". If Quad FX were truly better you'd be pointing out all of its strengths instead of making them up and disregarding the weaknesses.
 
I can bet that WHEN I get it, I will get more than 70% faster, inclusive of the DX10 GPU. Now this is in relation to MY CURRENT SYSTEM. That's all I care about.

So according to your Logic my mother, who has a Pentium II, should buy the Pentium III machine off that kid who is selling it for $400? Well, it will be X% faster than her current machine, which makes it worth it, right?
 
Baron!!! You sound like a broken fu)king record. 23 pages, and you've said pretty-much the same thing ~ 20 times. Do you type that out in Notepad, and then copy and paste it into your posts.... on every page of your worthless threads?


Well, tell the Intel fans to stop saying that I have to compare to C2Q and not to my current system.
 
So according to your Logic my mother, who has a Pentium II, should buy the Pentium III machine off that kid who is selling it for $400? Well, it will be X% faster than her current machine, which makes it worth it, right?

No, because Vista X64 was not designed to support PIII.
 
Well, tell the Intel fans to stop saying that I have to compare to C2Q and not to my current system.

There's no point in comparing your current system to your future system that has no other options or competition. If you were to compare your current system to, say, two variants of your future system, aka C2x or 4x4 or whatever, then you can ask people's opinions. Otherwise, whatever you say or ask here about 4x4 is useless.
 
If Quad FX spanked the Intel system you'd be touting its performance, not its "megatasking platformance". If Quad FX were truly better you'd be pointing out all of its strengths instead of making them up and disregarding the weaknesses.


I only tout the addition of one socket and the upgrade over my current system. You all write everything else in.

If I want a slower system than ONE SYSTEM OVERALL (C2Q), I can buy it. Stop psycho-analyzing my PC purchase. You will get nowhere.

BTW, I have never said the word platformance.

OK, you all win, I'm going to spend the extra $1500 for an Opteron 2218.

NOT!!!
😳
 
BTW, I have never said the word platformance.

Now there I agree with you. That was a shot at a site that you quote a lot, the Inquirer.

But the fact still stands that you are ignoring performance and sticking with Brand Preference as a factor for debating on these forums.

And I don't know if you think I'm an Intel fanboy or what, but look at my sig; I'm obviously a "performance/price" fan.
 
I'm just a guy who wants two sockets and Vista X64. Again, I'll let you know when I get it and you can ridicule me. But wait, the only thing faster is C2Q.

I just priced C2Q at Dell and Alienware.