Difference between logical cores and physical cores?

Status
Not open for further replies.

Deus Gladiorum

Distinguished
I know that AMD uses logical cores, and Intel uses physical cores, but I'm not sure what the technical difference is. I know what the resulting difference in performance is, and that physical cores trump logical cores everywhere (except supposedly in multi-threaded tasks, though I don't really see that advantage reflected in benchmarks much).

I often hear that logical cores are more comparable to threads. As much as I'd like to think otherwise, I'm not quite sure what a thread even is, or what it does or allows a CPU to do. Is it a physical entity on the CPU? I've also heard that logical cores and threads are comparable because they both "borrow resources" from each other. I'm not sure what that means either. Exactly what does "resources" refer to?
 
Solution
No such thing as physical / logical "cores"; processing and scheduling doesn't work that way. AMD FX-8xxx CPUs have eight independent processing units, aka "cores". They each have their own control unit, integer processor (ALU) and memory storage (L1 cache / registers / Load Store Unit). What they share is the front-end instruction decoder that converts the x86 macro-ops into smaller micro-operations that resemble RISC-style instructions, and the back-end L2 cache. Each core has 2 integer units and each module has access to one SIMD co-processor (aka FPU). It's important to note that SIMD FPUs are co-processors and not part of the x86 processor, though we've been making them together for a long time now. They have their own...
AMD Bulldozer and Steamroller chips have two logical cores to each physical core.

No they don't. I explained this above, and it has nothing to do with cashiers or any other dumb analogy. Each core inside an AMD module has a separate control unit, memory control unit and processing resources. They share external resources such as the instruction decoders and L2 cache. It is factually correct that the FX-8350 has eight processing cores; it is factually incorrect to say it has four.

This really isn't a hard concept to grasp. There is no such thing as "physical" or "logical" cores; they do not exist. People just don't have nearly enough understanding of how microprocessors work.
 
To the OP,

To really understand these things you need to learn some ASM. Writing something as simple as "hello world" with keyboard input will teach you how the CPU operates and how it actually executes code. Too many people work only with high-level languages and never learn to understand processing from the CPU's point of view. Once you start working with registers, instruction pointers, the CPU stack, reading and writing to and from memory addresses, and handling interrupts, you get a better grasp of what a code stream is and how it gets executed.
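If you want a taste of that without setting up an assembler, Python's `dis` module shows the same idea: one line of source code turns into a stream of load / operate / return instructions, which is a rough stand-in for what happens with real x86 ASM (the exact opcodes here are Python bytecode, not x86):

```python
import dis

def add(a, b):
    # One line at the source level...
    return a + b

# ...is really several lower-level instructions: load the operands
# into place, perform the operation, return the result.
dis.dis(add)
```

Run it and you'll see instructions like LOAD_FAST for each operand before the add, which is exactly the load-then-operate pattern a CPU follows.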
 
To be honest, even though there are some terms I'm not entirely familiar with (RISC, SIMD, and FPU), I think I was most comfortable with and best understood palladin9479's explanation, or at least it's the one I'm most looking for. No offense, guys, but analogies leave too many things out and it's just easier hearing the full explanation, even if it means there are some terms I don't know. Maybe we could help clear some things up? I'm only somewhat familiar with floating point, and the way Wikipedia makes it sound (because Wikipedia is so good at explaining technical terms for the layman, right?), it seems like an FPU has the same job as an ALU. Don't both provide the same function, i.e. to perform basic mathematical operations?
 


CPUs process instructions based on a set operand size. A 32-bit CPU processes operands up to 32 bits in size; a 64-bit CPU handles operands up to 64 bits. Integer operations account for the vast majority of code because they cover not only basic math but also logical compares. Now if a coder needs to work with a non-whole number, something with a decimal place, they have a few choices based on how many decimal places there are. If the number of decimal places is static, they could simply process it as two integers, one being the whole number and the other representing the decimals. But if they are not sure exactly how big the decimal part is, and thus how much precision they need, or if they need precision beyond the 32/64 bits of the integer CPU, they must instead do a floating point operation. It's called floating point because the size of the decimal place isn't required to be static.

Traditionally these operations were handled by a floating point co-processor: the 8087, 80287 and 80387. Starting with the 80486, the FP co-processor was integrated into the same die as the CPU and became known as the FPU. The x87 FPU can handle decimals with up to 80 bits of precision but is rather limited in scope; it's really only for doing scientific calculations. Later Intel / AMD developed instruction extensions for doing floating point work on audio / video calculations, known as MMX / 3DNow!. Soon after, they morphed the FPU into what is known as a Single Instruction Multiple Data (SIMD) vector co-processor. Vector instructions differ from integer instructions in that they can execute the same operation on multiple values. If you wanted to add 10 to two or more values at once, vector would be much faster than scalar (integer). The vector instruction sets are SSE/AVX/FMA, which, while representing a very small amount of code, can be 5-10x+ faster than integer for some operations.
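That integer-versus-floating-point choice can be sketched in a few lines of Python (illustrative only; the CPU is doing integer ALU work in the first half and FPU work in the second):

```python
# Static number of decimal places: handle money as an integer count of
# cents, so the "decimal" is really two integers glued together -- exact.
price_cents = 1999              # $19.99
total_cents = price_cents * 3
print(total_cents)              # 5997, i.e. exactly $59.97

# Floating point: the decimal point floats, but binary fractions like 0.1
# cannot be stored exactly, so results carry tiny rounding errors.
print(0.1 + 0.2)                # 0.30000000000000004, not 0.3
```

The integer path is exact every time; the floating point path trades exactness for range and a non-static decimal place, which is exactly the distinction described above.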

RISC is not so much a standard as a set of principles for CPU design. The x86 design contains many instructions that can require multiple CPU cycles to execute, and it also has complex memory addressing modes. These things eat up die space and make it difficult to implement large, robust architectures. RISC was the idea that every instruction should execute in one clock cycle, and that no instruction should be created whose function could be handled by a small number of existing instructions. So while a function in x86 might take 10 instructions, each taking several clock cycles, the same function in RISC could be 15-20 instructions, but each taking one clock cycle. The modern x86 CPUs from Intel / AMD are hybrid RISC/CISC designs. Externally they are CISC, in so much as they read in x86 instructions; internally they translate / split those complex instructions into many smaller RISC-like instructions that are then run on the various processor resources (ALU/AGU/MMU). Since a single core can have multiples of these resources, it's then possible to execute more than one instruction per clock cycle with the help of speculative prediction.

I know it's a bit dense but hopefully this answers your questions. Like I said, learning some basic ASM will answer all of these for you and it's a really educational experience.
 
That's a lot to take in, and though that last portion is a bit dense, I can understand some of it. It doesn't truly answer my question, as I still don't really know the inherent differences between logical and physical cores, but really, this all just confirms that in order to truly understand the differences I'm going to have to study quite a bit. Analogies oversimplify, and in a topic this complex there's a great deal that can be left out.

I understand the difference between 32-bit and 64-bit, that a 64-bit CPU can process a number up to 2^64 - 1, while a 32-bit one can only go up to 2^32 - 1. I think I understand floating point a bit better now. So essentially they're decimals that don't terminate? So the ALU is used to process numbers that are static, while the FPU does the same job applied to floating points, i.e. numbers that may not terminate?
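For instance, a quick Python check of those limits (just my own illustration of the numbers involved):

```python
# Largest unsigned integer each operand size can hold.
print(2**32 - 1)   # 4294967295
print(2**64 - 1)   # 18446744073709551615

# A 64-bit float (double) spends some of its bits on the exponent, so
# above 2**53 it can no longer tell neighbouring whole numbers apart.
print(float(2**53) == float(2**53 + 1))   # True
```

That last line is the flip side of floating point: it buys a "floating" decimal place and huge range, but gives up the exactness that integers have across their whole range.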

I suppose I'll just have to continue studying in computer science in order to do so.
 


That's essentially correct about the FPU part. The 8087 has 80-bit precision, which is much greater than the 16/32 bits provided by the early processors. Modern CPUs can do 64-bit operations, which is typically "good enough" for precision but not speed. SIMD instructions can operate at 32, 64, 128 and 256-bit widths while also working on multiple values at once, and thus are the preferred method to use.
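You can see the 32-bit vs 64-bit precision difference directly by forcing a value through each storage size with Python's `struct` module (a sketch of IEEE-754 storage, not actual SIMD code):

```python
import struct

pi = 3.141592653589793

# Round-trip through a 32-bit float: digits beyond roughly 7 significant
# figures are thrown away when the value is packed.
pi32 = struct.unpack('f', struct.pack('f', pi))[0]

# Round-trip through a 64-bit float: the value survives exactly.
pi64 = struct.unpack('d', struct.pack('d', pi))[0]

print(pi32 == pi)   # False -- precision was lost
print(pi64 == pi)   # True
```

A 128 or 256-bit SIMD register then packs four or eight of those 32-bit values (or two or four 64-bit ones) and operates on all of them in one instruction.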

What we've been trying to say is that there is no such thing as a "logical" core; it's a fictitious creation by people who didn't understand how CPUs process data and wanted to explain why their device manager listed eight CPUs for the i7 when their CPU said it was four cores. The explanation is very simple, much simpler than all those analogies. Physical cores are what your CPU has; the CPU receives instructions from the OS through external registers (AX/BX/SI/IP/etc.) and the CPU stack. In order to facilitate faster CPU throughput, Intel chose to create two separate external register stacks for each of its HT cores. The result is that the OS sees all eight register stacks, believes they are all CPUs, and schedules instructions on them.

So if you absolutely have to use the physical / virtual concept: physical is what you have, virtual is what your OS sees / thinks you have.
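You can see the "what the OS thinks you have" number on your own machine (a minimal Python sketch; `os.cpu_count()` reports what the scheduler sees, i.e. logical processors, not physical cores):

```python
import os

# On a 4-core i7 with Hyper-Threading this prints 8, because the OS
# schedules onto every register stack the CPU exposes, not onto the
# physical cores themselves.
print(os.cpu_count())
```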
 


So, then how many actual cores are on the AMD FX-8350 processor? I currently have an FX-6100 and my OS states it has 3 cores / 6 logical processors. I understand "logical processors" is just a label, but would this still be the case with the 8350 (just like the i7): 4 cores / 2 stacks per core?
 


I actually got a one-day ban for my last answer, so I will be careful about how I elaborate on the Mod's "technically accurate but not very simple" answer. The fact is that there are only 3 complete cores on your FX-6100, but certain items within each core are duplicated, so you can feed it two streams of information. You could look at it as 6 cores that share certain parts (like FP units) or 3 cores that have two of certain parts (like integer units).

On the other hand, in the Intel realm there are also two 'assembly lines' (register stacks), but one robotic hand that works on both, switching from one to the other when appropriate. (There is only one FP unit and one integer unit per core, despite it having two threads. The FP and integer units actually do the math, whereas the register stack arranges the data to be processed, along with the type of processing needed, then sends it on its way to whatever units carry out those actions.) This means that whenever a program has to fetch data from RAM or otherwise wait, your core can continue to process data on the other assembly line while it waits. That same waiting period is why it works well when AMD has only one of certain processing units in its CPU: it can work on the other thread during the time it would usually spend waiting.
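The overlap-while-waiting idea can be sketched with two Python threads standing in for the two 'assembly lines' (a loose analogy in code, not real hyper-threading; `time.sleep` stands in for a stall waiting on RAM):

```python
import threading
import time

def fetch(name):
    # Simulate waiting on memory / IO: the "core" is idle for this stream
    # during the sleep, so work from the other stream can proceed.
    time.sleep(0.2)

start = time.perf_counter()
t1 = threading.Thread(target=fetch, args=("a",))
t2 = threading.Thread(target=fetch, args=("b",))
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.perf_counter() - start

# The two 0.2 s waits overlap, so the total is ~0.2 s rather than 0.4 s.
print(f"{elapsed:.2f}s")
```

Serially the two waits would cost 0.4 s; overlapped, one wait hides inside the other, which is the whole payoff of feeding a second instruction stream to a waiting core.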

In the Intel example there is very definitely one core being fed two threads of instructions.

This is also why games like BF4 run so well on AMD chips: the consoles they were optimized for had two integer units, which Intel doesn't have, allowing an AMD chip like the 8350 to match a 4770K for a fraction of the price and play BF4 at competitive speeds, because BF4 was optimized to use the integer units heavily. AMD has weaker actual FP units than Intel, however, so it will fall behind that same Intel tech in all non-AMD-optimized games.


Note to Paladin:
Please don't ban me again for using an analogy. Some people don't understand the nature of a register stack, and if you think about it, SIMD, bit precision, and those other additions to your information were very knowledge-dense for what was a fairly simple concept when only loosely explained.
 

Are you talking about clusters? Haswell has 4 ALUs per cluster, whereas Piledriver has 2 ALUs per cluster but has 2 clusters per module.

Also, that is a wrong way to describe it, as it is not necessarily true.
 
So basically, if you take palladin's statement that FPUs should not normally be expected to be part of each CPU core, then all cores on AMD FX chips are "real physical" cores, apart from the shared decode and L2 cache.

The Intel's CPUs with hyperthreading are the ones that best fit under the concept of having twice as many logical cores as the actual execution cores.

Regardless of all this mumbo jumbo, you should remember two things. First, the overall theoretical peak performance of the CPU depends on the performance of individual cores as well as the number of cores. Second, you need to do parallel programming to exploit multiple cores, which is inherently harder for programmers. A typical application on a desktop system depends a lot on per-core performance. Before 2005-2007, software developers could always expect that CPUs would get faster and faster, maybe twice as fast every two years or so, without changing a line of code. But after that, the hardware vendors basically said that most further improvements in performance would come from additional cores, and then washed their hands of it. Software developers have been grasping at this concept, and struggling to deliver code that executes in parallel on many cores at once, ever since.

It's like following the instructions for cleaning a dish. Before you put the dish into the drawer you have to dry it; before you dry it, you have to wash it; and before you wash it you have to put it into the dishwasher. This is your thread. You can't do all these steps at once, so one dish can't exploit many cores, but many cores certainly help if you have many dishes to clean at once, because cleaning each dish is one thread. However, you don't always have many dishes. Sometimes all you have is one large skillet.
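The dish analogy, sketched with Python's thread pool (illustrative; names like `clean_dish` are made up, and `time.sleep` stands in for the work of each ordered step):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def clean_dish(d):
    # The steps for ONE dish are strictly ordered: wash, then dry, then
    # store. No amount of extra workers speeds up a single dish.
    time.sleep(0.05)  # wash
    time.sleep(0.05)  # dry
    return f"{d} stored"

dishes = ["plate1", "plate2", "plate3", "plate4"]

# Many dishes: each dish is an independent thread of work, so a pool of
# workers ("cores") cleans them concurrently.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(clean_dish, dishes))
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.2f}s")  # ~0.1 s instead of ~0.4 s done one at a time
```

With one large skillet there's only one item in `dishes`, and three of the four workers sit idle, which is exactly the single-threaded-application problem.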

 
Well, what exactly a core is, is pretty blurry. I have talked to tons of people who know a lot about architecture, but most agreed on this:
A core is a processor. Your CPU is a processor made up of multiple processors.

Just because it has its own scheduler doesn't mean it is a core; that would instead sound like a cluster. There are alternatives to AMD's clumsy scheduling system; for example, Intel uses a unified scheduler for the entire core. GPUs use one scheduler for many cores (to keep all cores working).
 
What it is and how it works aren't the same. The question is only what it is.

Intel has a great and simple demo that visually explains "what" it is
http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology-video.html

You don't really need to know how it works, but I can keep it short. Basically, each thread feeds data to your chip for processing bit by bit. It feeds info only when needed, so sometimes there will be a pause in the feed to your processor. When that happens, the processor processes an empty chunk where data could have been placed. Hyper-Threading fills those empty spots with data from another thread, so the core will at least do something while waiting on the other thread. It's a bit oversimplified, but that's the basic concept. 🙂
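A toy Python sketch of that "fill the empty slots" idea (pure simulation, not how the hardware is actually implemented; `None` stands for a pause in one thread's feed):

```python
# One core, two instruction streams. When stream A stalls (None = waiting
# on data), the core executes from stream B instead of sitting idle.
stream_a = ["a1", None, "a2", None, "a3"]
stream_b = ["b1", "b2", "b3", "b4", "b5"]

executed = []
b = iter(stream_b)
for op in stream_a:
    if op is not None:
        executed.append(op)               # stream A had work ready
    else:
        executed.append(next(b, "idle"))  # fill the stall from stream B

print(executed)  # ['a1', 'b1', 'a2', 'b2', 'a3']
```

Without the filling step, the two `None` slots would be wasted cycles; with it, the core stays busy the whole time, which is the entire point of Hyper-Threading.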
 

You were banned for offensive behavior, and you know it. Be glad it was only one day; others give a lot longer for that kind of behavior. Keep it up and it'll be much longer.
 