AMD CPU speculation... and expert conjecture

Status
Not open for further replies.



Not even close.

GPUs have instruction and data caches, along with instruction prefetch units, scheduling units and memory controllers. They're full-blown processors, only they don't process x86; instead they process vector (SIMD) instructions. The GPU inside an APU or an HD 3000/4000 is FAR beyond the IGPs of old. To the layman they all "make the pixels change color", but internally there is a vast difference between them.

Also be careful with the term "supports X/Y/Z graphics language": it doesn't mean those instructions were actually processed inside the GPU at all. OpenGL is traditionally implemented as a software render engine, with the hardware doing as much as it indicates it can do. An 8MB PCI video card from the mid 90's can technically "support OpenGL", though it would be the CPU doing everything. DX, on the other hand, has a split implementation: primarily it's implemented in hardware, but there is a Windows DX software renderer in place to do basic rendering as long as the graphics adapter supports creating surfaces. You can see this on the old Intel GMAs and VIA Media Processors (their name for an IGP): DXDiag runs just fine and really old DX games might even play, but anything made after the 2000s will give you software-rendered performance, if it runs at all (some software will refuse to run if it doesn't detect hardware support for specific features). And that's just rasterization, which had been going on since LONG before NVidia coined the term "Graphics Processing Unit" when they added TnL and geometry support to their lineup.

Honestly, all these terms are really just ways of describing what is now known as HSA: an environment where multiple CPUs of different architectures co-exist on the same platform and work together.
 


And you're wrong, as I've said multiple times now. It's the lowest-priority thread that always gets booted. Now, if one of the cores is running its idle thread, that naturally gets booted first. After that, it's threads that cannot run (I/O blocks, memory reads, SW locks, etc.). Otherwise, it's the thread with the lowest priority that gets booted, regardless of how much work it is currently doing.

The only exceptions (I know of) to this are:
1: If the program/thread affinity mask is set to NOT use specific cores
2: If the thread's "ideal" processor is free for use (in which case, the thread on that core is far more likely to get booted)

And there's a REALLY easy way to prove this, palladin: just write a really basic C program that makes four threads: three that do a lot of work at a very low priority, and one that does a lot of work at a really high priority. See which thread runs for a longer period of time. (Hint: it's going to be the high-priority thread.)

What part of having multiple CPU cores do you not understand here? Four cores: one is at 100%, the other three are somewhere from 0 to 15%. Under no circumstance will the NT scheduler put something else onto that 100%-utilized core when there are three other cores with open resources available. The priority of the thread isn't nearly as important because you have four targets; you're not getting four priority-one tasks happening simultaneously that all consume large amounts of resources, unless you forgot to update your AV.

Yes it will. Remember how those numbers in Task Manager are derived: the amount of time (over "x" seconds) the idle thread runs, as a percentage, is the inverse of the time a CPU core is doing work. E.g.: if the idle thread is running, the core is doing no work. That is as much as the OS knows about how much work a CPU is or is not doing. As long as the idle thread is NOT running, the CPU is doing plenty of work, as far as the OS is concerned. A thread spinning in an empty while(1) loop will push a CPU core instantly toward 100%, even though most everyone would agree the CPU isn't doing much of anything. Why? Because the idle thread isn't running.

Therefore, because the OS has no clue how much "work" a thread is or is not doing, it's impossible to boot a thread based on how much "work" it does. Threads get assigned based on priority at that instant in time.
 


Well, as far as I know, the HT link maxes out at ~25GB/s using 32-bit links, right? And it usually operates in 2- or 8-bit bursts, right?

You can saturate it easily if you ask me. DDR3 alone can put that effective bandwidth to a full halt and bottleneck the rest of the PCIe neighbourhood. It's much better than the FSB, but I still find it inferior to QPI; not in terms of bandwidth, but in terms of data arrangement, since both achieve the same theoretical bandwidth.

Oh and I do remember some sockets having "blank pins". They were used as debug and/or "just in case" pins, right? XD

Cheers!
 
i called it underwhelming mainly because of its association with bd/pd cpus. right now, socket fm2 offers more for new buyers.
amd has been mum about am3+'s future except that they intend to carry on with it. that's too vague for me. with fm2, we know that kaveri and richland will support it. otoh, on the intel side, we know that lga1155 is a dead end, haswell will fit lga1150 and broadwell will be soldered to the mobo.
i didn't know that ddr3 could saturate hypertransport bw. afaik, fx cpus can't extract that much bw unless ddr3-2000+ ram is used.
imo, unless you specifically want an am3+ platform, either get an apu rig or intel. apus properly fit the entry level segment while intel's pricier cpus fit mid to high end (despite restrictions). you can argue about moar cores or overclockability. moar cores won't be effective unless all software scales and windows uses those cores effectively. overclockability hits the point of diminishing returns fast, as the budget is affected by thermal constraints. by the time mainstream software (e.g. games) starts to use at least 6 cores, amd will very likely have launched a new platform. as long as software mainly uses 4 cores max, intel will offer better perf. while apus offer better value.
 

kettu
If I recall correctly, what AMD said when FM2 was launched was that there would be at least one more CPU for that socket after Trinity. They didn't specifically state Kaveri would be compatible with it. People at the time might have assumed that because Richland was not on the public roadmaps back then (again, relying on my own memory, as faulty as it is).
 

truegenius
Well, as far as I know, the HT link maxes out at ~25GB/s using 32bit links, right? And it usually operates in 2 or 8bit bursts, right?
25GB/s using 32bit links!
but in my pc's bios, i found that it runs at a 16-bit link (by default), so that means only 10.4 GB/s unidirectional bandwidth on ht 3.0
:??: is it still fast enough?
 



Ahh, that explains much. You're under the impression the memory bus runs through HT, which is incorrect. The HT bus runs from the CPU to the NB; the memory bus is entirely separate. You could have DDR3-5000 made from unobtainium and the HT bus speed wouldn't make a difference. The HT bus is only important when you're communicating with peripherals through PCIe and the associated buses. Currently one lane of PCIe data is 500MB/s, so even a dual-GPU x16/x16 setup would be 16GB/s, and we're nowhere near actually using that right now. The socket provides for a 32-bit interface, but not all motherboards actually use all 32 in one direction; many use 16/16, with economy boards using 16 down and 8 up.

Picture from wiki to illustrate how it's connected together.

https://en.wikipedia.org/wiki/File:AMD_Bulldozer_chipset.PNG

Current HT bus speeds aren't remotely an issue.

The reason AMD's been on this interface so long is that, back when they first implemented it, they designed it for future expansion. It's been through two full platform upgrades and still does everything it needs to and then some, which is a testament to how well they planned for expansion. Even something like PCIe 3.0 wouldn't require another socket, as it's electrically compatible with PCIe 2.0. The only reason for a major change would be a new memory interface that requires a different socket layout, and thus we're expecting the next socket when DDR4 is released for general use.
 



What chipset & motherboard do you have? HT on AM3+ provides for 32 bits of interface, but very few chipsets actually use all 32 at once. The actual implementation is 16 bits in one direction and 16 bits in the other, with some economy systems doing 16 bits from the CPU to the northbridge and 8 bits from the northbridge to the CPU. The HT bus is used for access to system peripherals, not memory, and currently is not being saturated at all.
 
Some information for those curious about how HT actually works.

http://www.hypertransport.org/default.cfm?page=FAQs

The spec calls for up to 32 bits, but that's only in a single direction; it's up to the designer to designate which lines go in which direction. Most implementations are 16 bits wide one way and 16 bits wide the other, but the specification is flexible enough to allow custom implementations depending on system needs. Max current speed is then 10.4GB/s in one direction. We're not close to maxing that ~yet~, and by the time we get close they will have upped the speed even more. HT is not an "AMD" thing; it's an industry standards group, similar to PCIe, USB, etc.
 

truegenius

Distinguished
BANNED
What chipset & motherboard do you have?
880g chipset
gigabyte ga-880gm-d2h rev 3.1
http://www.gigabyte.in/products/product-page.aspx?pid=3888#ov

HT on AM3+ provides for 32-bits of interface but very few chipsets actually use all 32 at once.
it only shows 8-bit and 16-bit (note: the link between cpu and chipset)
[screenshot: ht.jpg]


The actual implementation is 16-bits in one direction 16-bits in another
then it is working correctly :D
[screenshot: Untitled-1.jpg]


and from the memory benchmark,
it seems like the northbridge is surely a culprit here, as a 200mhz increase in northbridge clock shows a visible difference (took the highest of 5 runs each at 2.6ghz and 2.4ghz nb; nb and ht are at the same speed, as different speeds cause instability)
it is also responsible for l3 cache speed
so amd needs a higher-clocked nb, or to shift it onto the cpu die
 

Cazalan
HT is one of the gems that AMD got out of the DEC Alpha CPU designers they scooped up when DEC was going under, allowing glueless point-to-point CPU-to-CPU communication well ahead of Intel's QPI, which also happened to be developed by ex-DEC designers.
 

Cazalan



I look forward to seeing what these SoC x86 chips can do. It's going to take a few more generations but having CPU/GPU/NB/SB all in 1 chip can have tremendous advantages.
 


From a cost/integration standpoint, sure. But keep in mind that all takes die space and makes thermals a challenge; you can't assume the transistor will keep shrinking forever anymore. I'm also worried about upgradability: in theory, more stuff on the CPU is plenty of justification for more expensive CPUs, making upgrading more expensive.
 

anxiousinfusion


Only a single heatsink would be needed to cool it, too, which means the system could take up less space overall.
 