NugieDX :
Assuming ALL other parameters are the same (micro-architecture, FSB speed, etc etc),
does a dual-core 1.0 GHz have the same performance as a single-core 2.0 GHz?
Actually, that number is just an example. You can change that to, say, a quad-core
1.8 GHz vs a dual-core 3.6 GHz. You know what I'm saying, right?
If they're not equal, could and would you tell me why?
Thanks.
Engineer here; I can answer this question with some authority.
The go-to academic rule for expected gains from increasing concurrency is
Amdahl's Law. All programs have some components that simply cannot be explicitly parallelized by any method (micro-architectures perform implicit parallelization through superscalar and reordered execution). The fraction of the program made up of this strictly serial code establishes an upper bound on the expected speedup: if a fraction s of the runtime is strictly serial, N cores give a speedup of at most 1/(s + (1-s)/N), which can never exceed 1/s no matter how many cores you add.
So, from a purely computer-science approach, with all else equal a 2.0GHz single-core microprocessor is at least as fast as a 1.0GHz dual-core microprocessor: the strictly serial portions run twice as fast on the single core, and the parallel portions run no slower. Only if the code had no strictly serial components at all, and overhead were ignored, would the two be equal.
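To put rough numbers on that, here is a small sketch of Amdahl's Law applied to the two configurations in the question. The parallel fractions in it are made-up illustrative values, not measurements of any real program:

```c
/* amdahl.c - toy comparison of a 1.0 GHz dual-core vs a 2.0 GHz single-core
 * under Amdahl's Law. The parallel fractions below are illustrative guesses,
 * not measurements. Build: cc -O2 amdahl.c -o amdahl */
#include <stdio.h>

/* Time to finish a task of `work` units, of which fraction `p` parallelizes
 * perfectly across `cores` cores running at `ghz` GHz. */
static double runtime(double work, double p, int cores, double ghz)
{
    double serial   = (1.0 - p) * work;   /* cannot be spread across cores */
    double parallel = p * work / cores;   /* ideal split, no overhead      */
    return (serial + parallel) / ghz;     /* faster clock shrinks both     */
}

int main(void)
{
    const double work = 1.0;              /* arbitrary unit of work */
    double fractions[] = { 0.0, 0.5, 0.9, 1.0 };

    for (int i = 0; i < 4; i++) {
        double p = fractions[i];
        printf("parallel fraction %.0f%%: dual 1.0GHz = %.3f, single 2.0GHz = %.3f\n",
               p * 100.0,
               runtime(work, p, 2, 1.0),
               runtime(work, p, 1, 2.0));
    }
    return 0;
}
```

At every parallel fraction the 2.0GHz single-core finishes at least as fast, and the two only tie when the code is 100% parallel, which is exactly the purely computer-science conclusion above.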
From a computer engineering approach, all bets are off. The trouble with Amdahl's Law is that it is an observation made in the early days of computing, before multiprogramming and resource management became standard parts of most micro-architectures. Since we're now analysing the hardware rather than the software, we have to change our perspective. Rather than quantifying performance as the time taken to complete a particular task, we quantify it as the aggregate work completed across all running tasks in a fixed amount of time. The former is self-explanatory; the latter, normalized per clock cycle, is usually expressed as instructions per clock, or IPC. IPC is a useless measure for comparing architectures with dissimilar instruction sets (such as ARM vs x86) but works fine for comparing architectures with compatible instruction sets (such as AMD's FX series vs Intel's Core i7 series).
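For a concrete picture of what IPC is, the toy program below just divides an instruction count by a cycle count. The two counts in it are invented placeholders; on a real Linux box they would come from the hardware performance counters (e.g. `perf stat` reports both):

```c
/* ipc.c - IPC is just retired instructions divided by elapsed core cycles.
 * The two counts below are made-up placeholders; on a real system they would
 * come from hardware performance counters (e.g. `perf stat` on Linux). */
#include <stdio.h>

int main(void)
{
    unsigned long long instructions = 8200000000ULL; /* hypothetical retired instructions */
    unsigned long long cycles       = 3100000000ULL; /* hypothetical elapsed core cycles  */

    double ipc = (double)instructions / (double)cycles;
    printf("IPC = %.2f instructions per clock\n", ipc);

    /* Clock rate ties IPC back to wall-clock throughput: */
    double ghz = 2.0; /* assumed clock rate */
    printf("~%.2f billion instructions per second at %.1f GHz\n", ipc * ghz, ghz);
    return 0;
}
```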
IPC is where things start to get really interesting. A traditional ISA core executes instructions in sequence until it is powered off. That sequence is altered by flow-control instructions that are part of the program itself, or by interrupts that come from the hardware. There is, however, no requirement that the ISA core execute instructions exclusively from one context (commonly called a thread) at a time. The only requirements are that the logical equivalence of each context be preserved over time and that the contexts be isolated from each other. The technology that lets a single core pick and choose instructions from two or more contexts at once in order to maximize instruction throughput is called Simultaneous Multithreading, or SMT. Intel's proprietary two-thread implementation of this is called Hyper-Threading.
SMT allows a single core to be more efficient when running two programs at once, as it hides the effects of long-latency events such as cache misses. This seems to immediately sidestep Amdahl's Law: the performance of the system as a whole is improved by executing two programs simultaneously on a single core, serial-only sections included. The net effect is that while each program may execute somewhat slower due to resource sharing, they will both complete sooner when run concurrently than when run back-to-back.
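Here is a back-of-the-envelope model of that claim. The 70% per-thread efficiency figure is an invented illustrative number, not something measured on any particular core:

```c
/* smt_model.c - back-of-the-envelope model of the claim above: two programs
 * that each slow down when sharing an SMT core can still finish sooner when
 * run together than back-to-back. The 0.7 efficiency factor is an invented
 * illustrative number, not a measured one. */
#include <stdio.h>

int main(void)
{
    double t_a = 10.0;  /* seconds for program A alone on the core */
    double t_b = 10.0;  /* seconds for program B alone on the core */
    double eff = 0.7;   /* assumed per-thread speed while sharing the core via SMT */

    double sequential = t_a + t_b;                 /* run one after the other   */
    double concurrent = (t_a / eff > t_b / eff)    /* both share the core; each */
                        ? t_a / eff : t_b / eff;   /* runs at reduced speed     */

    printf("sequential: %.1fs  concurrent (SMT): %.1fs\n", sequential, concurrent);
    return 0;
}
```

The sharing only comes out ahead because the two threads together extract more than 100% of the core's standalone throughput (2 x 0.7 = 1.4 here); filling cycles that one thread would otherwise waste on misses is where that extra 40% comes from.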
Now, let's look at what happens when multiple ISA cores access the same physical memory space.
Multi-core and multi-socket systems are variations of the same thing: Symmetric Multiprocessing, or SMP. SMP sounds somewhat similar to SMT, and indeed they have quite a bit in common. The difference is that while SMT allows a single ISA core to use the resources it already has more efficiently, SMP duplicates that ISA core, and hence its execution resources, multiple times. For the sake of simplicity, let's focus on a single-socket deployment, where all cores exist within the same physical package and on the same physical die (the same reasoning carries over to multi-socket systems).
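From the software side, SMP just looks like independent threads or processes that the OS is free to place on different cores. Here is a minimal sketch, assuming a POSIX system with pthreads; the thread count and workload are arbitrary illustration choices:

```c
/* smp_sum.c - minimal SMP sketch: split an array sum across two threads that
 * the OS is free to schedule on separate cores. Thread count and array size
 * are arbitrary choices for illustration. Build: cc -O2 smp_sum.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N        (1 << 22)
#define NTHREADS 2

static double data[N];

struct chunk { size_t begin, end; double sum; };

static void *partial_sum(void *arg)
{
    struct chunk *c = arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; i++)
        s += data[i];
    c->sum = s;            /* each thread writes only its own chunk: no sharing */
    return NULL;
}

int main(void)
{
    for (size_t i = 0; i < N; i++)
        data[i] = 1.0;

    pthread_t tid[NTHREADS];
    struct chunk chunks[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        chunks[t].begin = t * (N / NTHREADS);
        chunks[t].end   = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, partial_sum, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += chunks[t].sum;
    }
    printf("total = %.0f\n", total);
    return 0;
}
```

Each thread works on its own slice of the array and writes only its own result, so the cores don't fight over the same cache lines; that kind of partitioning is what lets SMP scale.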
Most modern architectures from all manufacturers have separate level-1 (L1) instruction and data caches. These caches are kept small to reduce access time, and are kept separate because they are both accessed on nearly every cycle, in different stages of the instruction pipeline. The L1 cache is exclusive to a single core but is shared between contexts running on that core (meaning that SMT shares the L1 cache). The L2 cache is larger and slower than L1 and may be shared by multiple cores. The L3 cache is larger and slower than L2 and is typically shared by all cores within the package (excepting packages that contain multiple CPU dies, such as the Core 2 Quad).
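If you want to see the hierarchy your OS thinks you have, glibc on Linux exposes the cache sizes through sysconf. This is a Linux/glibc extension rather than portable C, and some kernels simply report 0 when the value isn't known:

```c
/* cache_sizes.c - print the cache hierarchy as the OS reports it. The
 * _SC_LEVEL*_CACHE_SIZE names are a Linux/glibc extension, not portable C,
 * and some systems report 0 when the information is unavailable. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("L1 instruction cache: %ld bytes\n", sysconf(_SC_LEVEL1_ICACHE_SIZE));
    printf("L1 data cache:        %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L2 cache:             %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3 cache:             %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}
```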
More cores means more cache; L1 at a minimum, but most likely L2 as well. A higher operating frequency means the core issues cache accesses at a higher rate, and the more accesses per unit time, the more often one will fail to find the data it is looking for; this is called a cache miss. A cache miss requires that data be loaded from main memory into the cache, all the way down to L1, where it can then be used for execution. If the data is not in main memory and has been swapped out to a backing store such as a hard disk drive, the operating system will most likely suspend the running process and switch to another until the load is completed.

Modern microprocessors have many features to mask the effects of cache misses, such as out-of-order execution and prefetching, but these only stretch so far. SMT can fill in the gap by shifting resources to another thread on the core that has not encountered a miss, but again, this only goes so far. If a microprocessor simply cannot continue, it must stall. Stall cycles are cycles in which no instructions are executed. Stall cycles are inevitable, especially in superscalar micro-architectures that execute multiple independent instructions at once (in which case the stall is issued on a per-port basis), but minimizing stall cycles is key to maximizing IPC. If the CPU's operating frequency gets too far ahead of the cache and memory controllers' ability to move data to and from main memory, more and more stall cycles creep in. This is why Intel's Celeron microprocessors were very popular for setting overclocking records: they had almost no CPU cache to speak of, and really just set the world record for the fastest rate at which nothing was done.
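Cache misses are easy to provoke on purpose. The sketch below touches the same 64 MB of data twice: once sequentially (prefetch-friendly) and once with a huge stride that defeats the caches. The exact timings will vary from machine to machine, but the strided pass should be noticeably slower:

```c
/* stride.c - rough illustration of cache misses: touching the same amount of
 * data with a large stride defeats the caches and prefetchers, so the same
 * number of loads takes noticeably longer. Exact ratios depend on the machine.
 * Build: cc -O2 stride.c -o stride */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M ints, ~64 MB: far larger than a typical L3 */

static long touch(int *a, size_t stride)
{
    long sum = 0;
    for (size_t s = 0; s < stride; s++)          /* cover every element ...  */
        for (size_t i = s; i < N; i += stride)   /* ... but in strided order */
            sum += a[i];
    return sum;
}

int main(void)
{
    int *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1;

    size_t strides[] = { 1, 4096 };
    for (int k = 0; k < 2; k++) {
        clock_t t0 = clock();
        long sum = touch(a, strides[k]);
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("stride %5zu: %.3fs (sum=%ld)\n", strides[k], secs, sum);
    }
    free(a);
    return 0;
}
```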
SMP, on the other hand, introduces more cache to work with, although each core's private cache is of no direct benefit to the other cores. A given memory controller can also keep up with a slower-clocked SMP system more easily than with a faster-clocked uniprocessor system.
So, in summary, there are an awful lot of factors to consider when determining whether a 1.0GHz dual-core system is superior to a 2.0GHz single-core system. If the workload is an aggregate of many tasks, the dual-core will win in most cases, but if it is a single, highly sequential task, the 2.0GHz single-core might win.