Same QPI, different CPU speed

leopolder

Sep 15, 2014
Hi All,

I am going to buy one of these ASUS laptops:
ASUS F550LDV-XX483H
ASUS F550LDV-XX953H

I have read their specs and they show
Intel Core i5-4210U 1.7 GHz
Intel Core i7-4510U 2.00/3.10 GHz
respectively

but they both have
bus speed: 5 GT/s

I can see how they might ultimately reach the same bandwidth with different clock speeds. Is that possible?
If so, they should be equally quick for the end user. Is QPI the most relevant feature to look at with respect to overall computer speed?

Thanks
 
Solution


Neither one of those microprocessors use QPI.

QPI is used on Xeon E5 and E7 series microprocessors to communicate between microprocessors in a multi-socket system. QPI is also used on i7-900 series (and equivalent Xeons) to communicate between the CPU and the North Bridge, as the PCIe lanes had not yet been integrated into the CPU package.
QPI speed is generally only important to multi-processor motherboards where the CPUs communicate on the QPI bus.

Looking at Intel's ARK pages for the i5 and the i7, you can see the differences.

i5 -- 2 cores, 1.7 GHz base clock speed -- 3 MB cache -- Max Turbo 2.7 GHz
i7 -- 2 cores, 2.0 GHz base clock speed -- 4 MB cache -- Max Turbo 3.1 GHz

The number of cores, the base and turbo clock speeds and the cache are the things that will be noticeable from a performance standpoint.
 


I believed so too, but then I read this:

an I/O bus to connect the CPU to the external world. This bus is a new bus called QuickPath Interconnect (QPI)
http://www.hardwaresecrets.com/printpage/Everything-You-Need-to-Know-About-The-QuickPath-Interconnect-QPI/610
 


With PCIe and SATA moved onto the CPU, it really doesn't matter much anymore. That article is from 2008!!!
With new CPUs, QPI has lost its significance.
 


A very last remark: don't you think it's possible that the 5 GT/s refers to PCIe rather than to QPI, which was just my own initial guess? If so, is CPU speed somehow limited by the PCIe bandwidth?
 


Going back to the Intel ARK pages, both CPUs are PCI express Gen 2 devices. They have the EXACT same PCIe configuration. PCIe has its own specs which are independent of Intel. Since these are NOTEBOOK computers, the amount of expansion is minimal anyway. You will probably only have RAM and disk drive which could be expanded/modified.

 


Sorry to bother you. Please help me understand.
The i7 is quicker and has a larger cache. That's good for memory management. But when it is time to transfer data to peripherals, both the i5 and the i7 do it at the same speed. Why should I pick the i7?
Am I missing something?
 
What peripherals? Disk drive? SATA ports on the CPU. USB? USB ports on the CPU. Graphics? Probably built into the CPU -- not using a discrete graphics card. They are going to work the same for ALL of those devices.

The clock speed, number of cores and cache are going to determine the performance. If you don't want the higher clock speed (more instructions / second) or the higher cache (better cache hit rate) then get the i5. I don't know how I can explain it any better....
 


Neither one of those microprocessors use QPI.

QPI is used on Xeon E5 and E7 series microprocessors to communicate between microprocessors in a multi-socket system. QPI is also used on i7-900 series (and equivalent Xeons) to communicate between the CPU and the North Bridge as the PCIe lanes had not been integrated into the CPU package.

DMI was previously used by the North Bridge to communicate with the South Bridge. However, now that Intel has completely integrated the North Bridge into the CPU package, DMI is used by the CPU to communicate with the South Bridge (also called the Platform Controller Hub, or PCH). The data rate of DMI 2.0 is 5 GT/s per direction per lane. Most configurations use a 4x link, for a total of 20 gigabits per second in each direction, which is plenty.

All CPUs in the LGA-1156, LGA-1155, LGA-1150, LGA-1356 (not to be confused with LGA-1366), and LGA-2011 sockets use the DMI/DMI 2.0 bus to facilitate communication between the CPU and the chipset. LGA-1356 and LGA-2011 also have QPI links, which can be used with multiprocessor-capable Xeon CPUs.
 
Solution


One more hint, please (tell me if you prefer me to open a new thread linking to this one):
may I think of the typical 20 gigabits per second that vendors' datasheets report (in fact, I have found 5 GT/s) the way RAM capacity is normally thought of, that is, if I have 4 GB of RAM, I only use a portion of it at any given time?
In other words, is 5 GT/s the maximum throughput the system can sustain? That would explain why CPU speed is the limiting factor.
 


The maximum data rate is 5 GT/s (GT = gigatransfers) but there are usually four data lanes (some Atom microprocessors use fewer), each of which transfers one bit per transfer, for a total of up to 20 gigabits per second in each direction. I do not know if DMI has PCIe-style power management which allows it to adaptively lower the link rate to reduce power consumption, but I would assume that it does.
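
If it helps, here is that arithmetic as a small Python sketch. The 4-lane width is the typical case mentioned above; the 8b/10b overhead is my assumption based on DMI's PCIe heritage, not a figure from a DMI datasheet:

# Rough DMI 2.0 bandwidth, per direction (illustrative sketch only)
transfer_rate_gt_s = 5.0    # 5 GT/s per lane
lanes = 4                   # typical x4 DMI link
bits_per_transfer = 1       # serial link: one bit per lane per transfer

raw_gbit_s = transfer_rate_gt_s * lanes * bits_per_transfer   # 20 Gbit/s raw
payload_gbit_s = raw_gbit_s * 8 / 10                          # assumed 8b/10b line coding
print(raw_gbit_s, "Gbit/s raw,", payload_gbit_s, "Gbit/s of payload per direction")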

DMI is better compared to PCIe than to the system memory. Just like PCIe is used to connect graphics cards, sound cards, and RAID controllers, DMI is used to connect the platform chipset. Whereas the performance of SDRAM memory is directly linked to the clock rate of the DRAM IO bus, the performance of PCIe peripherals is not linked to the performance of the PCIe bus unless the bus itself becomes a bottleneck.

In any case, DMI hasn't changed much in the past couple of years because there's been no real need to do so. Data that is transferred across the DMI bus needs to originate somewhere and end up somewhere. One of those endpoints is typically the system main memory, and the other is usually a peripheral such as a storage device or some other IO device. The low-throughput, high-latency nature of most storage devices means that the DMI bus itself rarely becomes a performance constraint. Ultra-high-performance storage devices such as enterprise-grade SSDs are typically attached to the CPU via the CPU's PCIe lanes, which gives them better access to the system memory. There are plans to replace DMI 2.0 with DMI 3.0 sometime in the next couple of years, though.
 
I am coming to the conclusion that the 5 GT/s figure for both of the processors I mentioned in my question may be wrong.

In view of what has been said in this thread, and looking at a possible formula for GT/s (http://en.wikipedia.org/wiki/Front-side_bus#Transfer_rates), the figure should be different in the two cases.

In fact, the data path width and the number of data transfers per cycle should be the same, the clock frequency (cycles per second) being the only difference.

That also agrees with the idea that the quoted GT/s is an upper limit: everything runs off the base clock, GT/s is just a rate derived from that clock, and the clock doesn't always run at maximum speed, in order to save power when it's not needed.
 


FSB and QPI are two different interfaces.

The term "transfer" is used by us engineers to clearly differentiate three things: the rate at which data moves across a bus, the periodic clock signal used to synchronize the various components attached to the bus, and the amount of data transferred across the bus on each transfer. Those are three separate ideas that deserve three separate definitions. Unfortunately, marketing departments like to make our jobs a living hell.

In the case of the FSB, the bus transfers data four times per clock cycle (with each transfer 90 degrees out of phase). This is called Quad Data Rate, or quad-pumped. So, an FSB with a 200 MHz reference clock synchronizing the transfers between the North Bridge and the attached CPUs (there can be more than one, a common example being the Core 2 Quad, which is really just two Core 2 Duo CPUs glued together) transfers data 800,000,000 times per second, for a transfer rate of 800 MT/s. Most FSB implementations are 64 bits wide, for a total transfer size of 8 bytes per transfer.

8 bytes (64 bits) per transfer * 4 transfers per cycle * 200 million cycles per second = 6.4 billion bytes per second in each direction

QPI has a very similar formula

20 bits per transfer * 64 bits of payload per 80 bits of transmission * 2 transfers per cycle * 3.2 billion cycles per second = 102.4 billion payload bits per second = 12.8 billion bytes of payload data per second in each direction

The takeaway here is that the number of transfers per time interval is independent of the amount of data transferred across the link per transfer. QPI has built-in fault protection. It nominally operates at a width of 20 bits in each direction, but if part of the link fails it can fall back to 10 bits or even 5 bits. Were this to happen, the bandwidth in each direction would drop from 12.8 billion bytes per second (in the example above) to 6.4 billion bytes per second, and then to 3.2 billion bytes per second, but the transfer rate would remain at 6.4 GT/s because the transfer rate is linked to the reference clock and the reference clock doesn't change in this example.
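
For anyone who prefers code to prose, here is the same FSB/QPI arithmetic as a short Python sketch (the function names are just for illustration):

# FSB: quad-pumped, 64 bits (8 bytes) per transfer
def fsb_bytes_per_second(ref_clock_hz, transfers_per_cycle=4, bytes_per_transfer=8):
    return ref_clock_hz * transfers_per_cycle * bytes_per_transfer

# QPI: double-pumped, 64 payload bits carried per 80 transmitted bits
def qpi_payload_bytes_per_second(ref_clock_hz, link_width_bits=20):
    transfers_per_second = ref_clock_hz * 2
    payload_bits_per_transfer = link_width_bits * 64 / 80
    return transfers_per_second * payload_bits_per_transfer / 8

print(fsb_bytes_per_second(200e6))           # 6.4e9  -> 6.4 GB/s per direction
print(qpi_payload_bytes_per_second(3.2e9))   # 12.8e9 -> 12.8 GB/s per direction

# Fault fallback: the transfer rate stays at 6.4 GT/s, only the width (and bandwidth) shrinks
for width in (20, 10, 5):
    print(width, "bits wide:", qpi_payload_bytes_per_second(3.2e9, width) / 1e9, "GB/s")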

PCIe is very different as it uses both a fixed-frequency reference clock (100 MHz) and an embedded data clock which changes based on the link speed. In this case, the transfer rate changes while the reference clock stays fixed. PCIe is also capable of renegotiating the link width. PCIe devices can operate at 1x, 4x, 8x, or 16x link width. In fact, it's possible to cut down the 16x connector on a GPU and fit it into a 4x slot if desired (please don't do this).

Thanks to the embedded data clock and serial link design, PCIe 2.1 transfers data up to 50 times per reference cycle.

100 million cycles per second * 50 transfers per cycle = 5 GT/s when operating at PCIe 2.1 link speed

5 GT/s * 1 bit per transfer per lane * 8 bits of payload per 10 bits of transmission (8b/10b encoding) = 4 gigabits of payload per second per lane (500 MB/s) in each direction.

500 MB per second per lane * 16 lanes = 8 GB per second in each direction

If need be, PCIe 2.1 can reduce its link speed to PCIe 1.1 speeds and drop the transfers per cycle from 50 to 25, still with a 100 MHz reference clock.
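
And the PCIe 2.x numbers above, written the same way (Python; the 25 transfers per cycle for 1.x speeds comes from the fallback just described):

# PCIe 2.x per-lane and per-link payload throughput, per direction
ref_clock_hz = 100e6              # fixed 100 MHz reference clock
transfers_per_cycle = 50          # 25 when falling back to PCIe 1.1 speeds

transfer_rate = ref_clock_hz * transfers_per_cycle      # 5e9 transfers/s = 5 GT/s
payload_bits_per_lane = transfer_rate * 8 / 10          # 8b/10b encoding -> 4 Gbit/s
payload_bytes_per_lane = payload_bits_per_lane / 8      # 500 MB/s per lane

for lanes in (1, 4, 8, 16):
    print("x%d: %.1f GB/s per direction" % (lanes, payload_bytes_per_lane * lanes / 1e9))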

I hope that this helped a little bit and didn't confuse you too much.
 


Thank you very much!
I guess I got it (though I am going to read your last answer a couple of times more to grasp it in full). In the end, marketing guys may be right to state 5 GT/s for both processors if PCIe 2.1 is used.

I guess you mentioned PCIe and not DMI here because DMI
is essentially a PCI Express x4 lane connection wrapped in a customized protocol to keep third parties out of the chipset business.
http://arstechnica.com/civis/viewtopic.php?t=40557

Basically, you confirmed kanewolf's above
Going back to the Intel ARK pages, both CPUs are PCI express Gen 2 devices. They have the EXACT same PCIe configuration. PCIe has its own specs which are independent of Intel
 

Marketing departments have this strange fascination with large, simple numbers. They treat consumers like they are idiots (in their defence, many of them are) and this often comes back to bite the engineers and support staff in the ass.

Case in point: DDR memory. Ever heard of 800 MHz memory? Does it refer to DDR3-1600, which has an 800 MHz IO bus clock? Does it refer to DDR3-800, which has a 400 MHz IO bus clock and an 800 MT/s transfer rate? Does it refer to DDR2-800, which has the same figures but is electrically and logically different? How about RIMM 6400 (not used in PCs), which has an 800 MHz IO clock? This is why I try to be extremely thorough in my explanations; it helps to keep the signal-to-noise ratio high.
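
To make that naming mess concrete, here is a tiny sketch of the modules mentioned above (DDR transfers data twice per IO bus clock cycle, so MT/s = 2 x IO clock):

# Each of these could plausibly be marketed as "800 MHz memory"
modules = {
    "DDR3-1600": 800e6,   # 800 MHz IO bus clock -> 1600 MT/s
    "DDR3-800":  400e6,   # 400 MHz IO bus clock ->  800 MT/s
    "DDR2-800":  400e6,   # same rates, but electrically/logically different
}
# RIMM 6400 (RDRAM) also has an 800 MHz IO clock but follows different rules, so it is left out

for name, io_clock_hz in modules.items():
    print("%s: %d MHz IO clock, %d MT/s" % (name, io_clock_hz / 1e6, 2 * io_clock_hz / 1e6))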

Anyway, I used PCIe in lieu of DMI because I don't have a DMI datasheet handy to dig into the specifics of it. From what I do know, and I think that I mentioned this above, DMI is a derivative form of PCIe that has been adapted by Intel so that it is more suitable for their platform. In particular I imagine that it may have something to do with bootstrapping the platform, as PCIe may not be suitable for this purpose. I'm curious to see if I can find a datasheet on Intel's website to affirm my suspicions, but that's a task for later. As far as data transfer is concerned, PCIe 2.x and DMI should behave the same in both mechanism and throughput.

As far as CPUs go, here's the breakdown of what the connectivity looks like

Intel i7-900 series:

QPI from the CPU to the X58 North Bridge
36 PCIe 2.0 lanes on the X58 North Bridge (16/16/4)
DMI from the X58 North Bridge to the ICH10 South Bridge
6 PCIe 1.1 lanes from the ICH10 South Bridge

Intel i7-800 series and below

16 PCIe 2.1 lanes on the CPU (16/0 or 8/8)
DMI from the CPU to the 5 series PCH
6-8 PCIe 2.0 lanes from the PCH (limited to PCIe 1.1 speeds)

Intel i7-2000 series Sandybridge

16 PCIe 2.1 lanes on the CPU (16/0 or 8/8)
DMI 2.0 from the CPU to the 6 series PCH
6-8 PCIe 2.0 lanes from the PCH

Intel i7-3000 series Sandybridge-E
40 PCIe 3.0 lanes (uncertified) on the CPU (lots of configurations)
DMI 2.0 from the CPU to the X79 PCH
8 PCIe 2.0 lanes from the PCH

Intel i7-3000 series Ivybridge
16 PCIe 3.0 lanes on the CPU (16/0/0, 8/8/0, or 8/4/4)
DMI 2.0 from the CPU to the 6 or 7 series PCH
6-8 PCIe 2.0 lanes from the PCH

Intel i7-4000 series Ivybridge-E
40 PCIe 3.0 lanes on the CPU (lots of configurations)
DMI 2.0 from the CPU to the X79 PCH
8 PCIe 2.0 lanes from the PCH

Intel i7-4000 series Haswell
16 PCIe 3.0 lanes on the CPU (16/0/0, 8/8/0, or 8/4/4)
DMI 2.0 from the CPU to the 8 or 9 series PCH
6-8 PCIe 2.0 lanes from the PCH

Intel i7-5000 series Haswell-E
28 or 40 PCIe 3.0 lanes on the CPU
DMI 2.0 from the CPU to the X99 PCH
8 PCIe 2.0 lanes from the PCH
 
I am almost surely missing the basics.
Anyway, I think I found one missing link:
http://www.legitreviews.com/images/reviews/1060/P55-blockdiagram.jpg
from
http://forums.extremeoverclocking.com/showthread.php?t=343320
I know it is an old series, but I see the North Bridge has already been removed here.

Now, the missing link is about working memory and CPU:
The bus speed (back when they used a bus) determines how many transfers per second that a processor can make between the northbridge and the CPU. The processor internal clock determines how many operations per second a processor can make on that data. The problem with this statement here is that it's assuming that in order to do one operation, you need one transfer's worth of data. This isn't really true.
http://www.tomshardware.co.uk/forum/298340-28-what-speed-exactly

With reference to the linked picture and what Pinhedd said in this thread, I assume that the DMI has to deal with low-throughput devices but can bring a lot of data to the processor at a time. At that point, I guess those data are stored while the CPU processes them at a constant rate (assuming no power management). So, the DMI and the CPU have different paces (based on different clocks).
 


That quote is a bit... strange... but the last point is right on the money.

The bulk of the work that the CPU does is performed internally using ultra fast CPU registers. When the CPU needs to load data from some place (such as the system memory) into these registers (filling) or store data from the registers (spilling) to some place (again, usually memory) it will almost always be able to complete the operation quickly using the CPU cache.

If a load operation finds the desired data in the cache, no further work is necessary. If it doesn't find it in the cache, it has to load the data from the system memory through the memory controller. On FSB-based platforms the memory controller was located on the North Bridge, so the FSB had to handle traffic between the CPU and the main memory, and between the CPU and the high-speed and low-speed IO ports. Fortunately, the North Bridge contained a DMA controller which allowed the high-speed and low-speed IO ports to access the main memory directly, without the CPU's involvement. Now the memory controller and the high-speed IO ports are integrated into the CPU die itself using internal busses; no more FSB. All that's left externally is the low-speed IO ports, and even these are being integrated in some compact System-on-a-Chip designs.

The aggregate hit rate (the rate at which the CPU finds the data that it is looking for in one of the CPU caches) is well over 95% for modern Intel microprocessors. Furthermore, modern microprocessors are deeply dynamically scheduled, which allows them to continue chugging along on other tasks while they wait for outstanding memory operations to complete.
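
As a rough illustration of why that hit rate matters so much, here is a simple average-memory-access-time estimate. The latencies below are made-up round numbers for illustration, not measurements of any particular CPU:

# AMAT = hit_time + miss_rate * miss_penalty  (classic first-order model)
cache_hit_ns = 4.0              # assumed cache hit latency
dram_miss_penalty_ns = 80.0     # assumed extra cost of going out to DRAM

for hit_rate in (0.90, 0.95, 0.99):
    amat = cache_hit_ns + (1.0 - hit_rate) * dram_miss_penalty_ns
    print("hit rate %.0f%%: average access ~%.1f ns" % (hit_rate * 100, amat))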

On modern platforms, the DMI bus can only be saturated when there are a large number of memory <-> PCH transactions occurring. One would have to try and copy files to and from multiple encrypted high performance SSDs to saturate it.
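
For a rough sense of scale (assumed figures: roughly 2 GB/s of usable DMI 2.0 payload per direction from the earlier arithmetic, and roughly 550 MB/s sustained from a fast SATA SSD):

# How many fast SATA SSDs would it take to saturate a DMI 2.0 link?
dmi_payload_bytes_per_s = 2e9     # ~16 Gbit/s of payload per direction
ssd_bytes_per_s = 550e6           # assumed sustained rate of one fast SATA SSD

print("roughly %.1f SSDs running flat out" % (dmi_payload_bytes_per_s / ssd_bytes_per_s))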