Xeon Architecture or block diagram

sharanbr

Sep 27, 2014
Hello All,

Can anyone give me pointers as to where I can find a good block diagram or architecture overview for the Xeon processor? Mainly, I am a little confused about the number of root complex ports in the system (not just the processor but the full system), how these root complex ports are used, the interface from the processor to the chipset, from the chipset to the south bridge, the concept of the PCH, etc.

If someone can help me with some pointers, I would be very thankful.
 


Hi,

Intel's consumer microprocessors (i3, i5, and i7) and business microprocessors (Xeon E3, E5, E7) are usually derived from the same designs. Most of the technical information you're looking for is in Intel's published datasheets, which you can find via ark.intel.com
 


Thanks. I got one PDF that talks about Intel architecture basics.

There are a few observations & questions. I will be glad to get comments ...

The processor has one PCIe interface apart from the DMI, display, and memory interfaces.
What is the PCIe interface from the processor needed for?

The processor also has a display port. Is this the same as a graphics port?

The chipset interface is through DMI, and the chipset handles most of the peripheral interfaces.
Is this chipset the same as the PCH?

Since DMI is used to connect to the chipset, what is used in multi-CPU systems for CPU-to-CPU communication?

 


Intel's current lineup of microprocessors has the PCIe root complex integrated into the microprocessor itself. This allows the root complex to communicate with main memory at a much faster rate than it could if it had to cross another bus from the North Bridge to the CPU. Intel's PCH (formerly called the South Bridge) exposes an additional 8 PCIe 2.0 lanes, which do have to cross the DMI 2.0 bus (itself essentially a proprietary PCIe 2.0 x4 link) in order to reach the memory controller. This adds an extra off-chip hop to every transaction on those lanes.
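For a rough sense of what that shared DMI link can carry, here is a back-of-the-envelope sketch in Python. The signalling rate and encoding figures are the standard PCIe 2.0 ones; actual throughput will be somewhat lower once packet overhead is accounted for.

```python
# Approximate per-direction bandwidth of a PCIe-style link:
# PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding,
# so each lane carries about 4 Gbit/s of usable data per direction.

def usable_gbit_per_s(gt_per_s, encoding_ratio, lanes):
    """Rough usable data rate per direction, ignoring packet overhead."""
    return gt_per_s * encoding_ratio * lanes

dmi2 = usable_gbit_per_s(5.0, 8 / 10, 4)  # DMI 2.0 behaves like a PCIe 2.0 x4 link
print(f"DMI 2.0 (x4): ~{dmi2:.0f} Gbit/s (~{dmi2 / 8:.0f} GB/s) per direction")
```

That works out to roughly 2 GB/s per direction, shared by everything hanging off the PCH.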

The PCIe lanes originating from the CPU are most commonly used for graphics cards, but they can be used for anything, including coprocessors (Xeon Phi, Tesla, FireStream), RAID controllers, PCIe-attached SSDs, etc... GPUs tend to benefit the most because they are both latency sensitive and bandwidth hungry.

Intel CPUs that have onboard IGPs use an interconnect called the Flexible Display Interface (FDI) to communicate with the PCH. FDI works in parallel with the DMI connection. When present, the PCH exposes the FDI interface as a pair of DisplayPort, HDMI, DVI, or VGA connectors. FDI itself is derived from DisplayPort.

In the past the chipset comprised two chips: a North Bridge (also called a Memory Controller Hub, or MCH) and a South Bridge (also called a Platform Controller Hub, or PCH). The North Bridge handled high-bandwidth, low-latency IO and was connected to the CPU(s) via the Front Side Bus. The FSB could be time-division multiplexed to allow multiple CPUs to share the bus; this was the case with all Core 2 Quad microprocessors, which were implemented as a pair of Core 2 Duo dies glued together in the same package. All microprocessors communicated with the North Bridge, which held the platform's memory controller. The North Bridge communicated with the South Bridge over a DMI bus (which is derived from PCIe).

Over time, Intel integrated the entirety of the North Bridge into the CPU package. This includes the memory controller, high speed PCIe lanes, and IGP where present. They also integrated the North Bridge's DMI interface into the CPU package, allowing the CPU to communicate with the PCH using DMI/DMI2.

The integration of MCH components into the CPU package didn't happen overnight. The i7-900 series microprocessors had the memory controller integrated, but did not have integrated PCIe lanes or an integrated IGP; those remained on the X58 chipset. The FSB was replaced with QPI. Unlike the FSB, QPI is a point-to-point interface that allows one device to communicate with another directly without sharing a bus; requests from one device can also be routed through another to reach a more distant device. Rather than two or more CPUs contending for the same MCH over a time-multiplexed FSB, each CPU connected to the chipset and to its neighbouring sockets via its own QPI links, and a CPU reached memory attached to another socket by forwarding the request over QPI. Since each CPU had its own triple-channel memory controller, the bottleneck caused by the FSB was greatly reduced; QPI overhead was incurred only when a CPU needed to act on memory attached to a separate CPU.
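Because each socket brings its own memory controller, a multi-socket board presents itself to the operating system as a NUMA machine, with one memory node per socket. As a small illustration (Linux only, and assuming sysfs is mounted in the usual place), you can list the nodes and the CPUs attached to each:

```python
# List NUMA nodes and their CPUs by reading Linux sysfs.
# On a dual-socket Xeon board each socket typically shows up as its own node.
from pathlib import Path

for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpus = (node / "cpulist").read_text().strip()
    print(f"{node.name}: CPUs {cpus}")
```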

Starting with the Xeon E5 series microprocessors (Sandybridge-E), DMI is used to communicate with the PCH (which typically hangs off the first socket), and QPI is used only for socket-to-socket communication.
 


Thank you, pinhedd. This is really useful to me. A few questions though ...

QPI is a very high bandwidth bus. Does socket-to-socket communication really need so much bandwidth?
I have been wondering about this, but somehow I am not able to appreciate whether that much traffic really flows between CPUs.

The other question is regarding the PCIe RC. Do Xeon chips normally come with one or two RC ports?
Also, what is the use case for multiple RC ports?

One more question: when Intel says chipset, are they referring to the north bridge/south bridge/PCH/MCH, or can chipset mean any chip in an Intel-based system (e.g. power management)? The terminology can be confusing to someone new to the Intel world ...

 


CPUs do indeed need substantial amounts of inter-socket bandwidth; such is the cost of non-uniform memory access. This is largely a consequence of each CPU having its own memory controller and its own locally installed memory. The platform firmware and operating system can be tuned to map physical memory in a fashion that avoids excessive cross-socket traffic, but only so much can be done. If a logical processor needs to access memory that is installed on another socket, the only way to get it is across the QPI bus. You may notice that the QPI payload size is 64 bytes (+16 bytes of overhead), which is conveniently the same as the architecture's cache line size. There are also other shared-memory considerations such as cache snooping and time-sensitive atomic operations. Faster is simply better.
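Using the figures quoted above (a 64-byte payload plus 16 bytes of overhead, matching the 64-byte cache line), here is a quick sketch of the wire efficiency of shipping one cache line across QPI:

```python
# Wire efficiency of moving one cache line over QPI, using the figures above.
cache_line = 64   # bytes of payload, same as the architectural cache line
overhead   = 16   # bytes of header/CRC per transfer, as quoted above

on_wire = cache_line + overhead
print(f"{on_wire} bytes on the wire per {cache_line}-byte line "
      f"-> {cache_line / on_wire:.0%} efficiency")
```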

As for the root complex,

The root complex handles the transactional aspects of a PCIe device connection. It is responsible for issuing requests to the microprocessor and memory subsystems, and for negotiating the link width with the attached device. The architecture of the root complex determines how the PCIe lanes can be arranged and the maximum number of PCIe devices that can be connected.

Since DMI is a derivative form of PCIe, it uses its own dedicated root complex.
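If you want to see the result of that link-width negotiation on a live system, here is a small sketch that reads it out of Linux sysfs. Not every PCI function exposes these attributes, so treat it as illustrative rather than universal.

```python
# Print the negotiated link speed and width for PCIe devices that expose them.
from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    speed, width = dev / "current_link_speed", dev / "current_link_width"
    if speed.exists() and width.exists():
        try:
            print(f"{dev.name}: {speed.read_text().strip()}, x{width.read_text().strip()}")
        except OSError:
            continue  # some functions report no usable link information
```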

The Sandybridge microprocessors have a single PCIe root complex that is 16 lanes wide. These lanes can be configured only as 16/0 or 8/8, for a maximum of two devices. Each link may be negotiated down individually, for example to 8/0, 8/4, 4/8, or 4/4.

The Sandybridge-E and Ivybridge-E microprocessors have three PCIe root complexes: two are 16 lanes wide and one is 8 lanes wide. Each of these can be subdivided into x4 links for up to 10 connected devices, but one of the ports can only run in x8 mode at most, even if it is wired to an x16 expansion slot.

Ivybridge and Haswell microprocessors again have only a single PCIe root complex that is 16 lanes wide, but the configuration is changed to allow 16/0/0, 8/8/0, and 8/4/4. Like Sandybridge, these ports can be down-negotiated.

Haswell-E microprocessors are similar to Sandybridge-E and Ivybridge-E, with the exception of the 5820, which has only 28 PCIe lanes. Intel's documentation doesn't list the root complex arrangement for this particular device, so my suspicion is that all 40 lanes are present and intact, with the arrangement limited in microcode to 28 lanes in use at any given time.
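To pull the CPU-attached lane splits above together in one place, here is a small sketch. It only encodes the configurations listed in this thread; the real options vary by SKU and by how the motherboard wires the slots.

```python
# CPU-attached PCIe lane splits discussed above, keyed by microarchitecture.
# (Sandybridge-E / Ivybridge-E instead expose 16+16+8 lanes, divisible into x4 links.)
cpu_lane_splits = {
    "Sandybridge":         [(16, 0), (8, 8)],
    "Ivybridge / Haswell": [(16, 0, 0), (8, 8, 0), (8, 4, 4)],
}

def is_listed_split(arch, split):
    """True if the requested split is one of the configurations listed above."""
    return tuple(split) in cpu_lane_splits.get(arch, [])

print(is_listed_split("Sandybridge", (8, 8)))            # True
print(is_listed_split("Ivybridge / Haswell", (8, 4, 4))) # True
```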

EDIT: To answer your question about the chipset: the term "chipset" typically refers to the support components that are designed to work with a particular microprocessor. Historically the chipset comprised two main chips, the North Bridge and the South Bridge. However, Intel has completely integrated the North Bridge into the CPU die, so the chipset now consists of only the South Bridge, which Intel refers to as the PCH. AMD has not completely integrated the North Bridge (though they are well on their way) and still has two chips as a result.

Other components such as add-in storage controllers, environmental monitoring chips, and power management circuitry are mainstays of motherboards, but they are not considered part of the chipset because they are not designed with a particular family of microprocessors in mind.
 


Have you seen this? http://www.intel.com/content/www/us/en/intelligent-systems/cranberry-lake/xeon-5000-ibd.html
 


Dear AdmiralDonut,

Is there a similar figure for the desktop processor series (not Xeon-class processors)?
 
Dear AdmiralDonut,

Thank you very much. This is very comprehensive.

Dear AdmiralDonut, Pinhedd,

I have a few questions, mainly to clear up some basic doubts I have regarding the uses of various interfaces,

1. The PCIe Gen3 ports available on the processor - can you give me a couple of examples of the type of application these are used for?

I remember seeing a graphics processor mentioned somewhere, but isn't it the case that processors now integrate the graphics IP within the CPU itself?

2. Are both DMI and FDI used simultaneously by the PCH chipset?
This is the way it is shown in one of the figures.
 


1. On desktops, these are most commonly used for discrete graphics cards. On servers they are used for storage controllers, storage devices, coprocessors, etc... Anything that can be attached to the PCIe 3.0 lanes on the CPU can also be attached to the PCIe 2.0 lanes on the PCH. The benefit of attaching a device to the CPU is a shorter path to system memory, which is helpful for bandwidth-hungry devices such as graphics cards (a rough comparison follows below).

2. If the CPU has an IGP, the chipset is connected via both DMI and FDI at the same time. The DMI connection is a proprietary implementation of PCIe, and FDI is a proprietary implementation of DisplayPort.
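For the rough bandwidth comparison mentioned in point 1 (same caveats as before: signalling rate times encoding, ignoring packet overhead):

```python
# Approximate per-direction bandwidth, in GB/s, of the two attachment points.
def usable_gb_per_s(gt_per_s, encoding_ratio, lanes):
    """Rough usable bandwidth per direction, ignoring packet overhead."""
    return gt_per_s * encoding_ratio * lanes / 8

cpu_x16_gen3 = usable_gb_per_s(8.0, 128 / 130, 16)  # a PCIe 3.0 x16 slot on the CPU
pch_x8_gen2  = usable_gb_per_s(5.0, 8 / 10, 8)      # all 8 PCIe 2.0 lanes on the PCH combined
dmi2_uplink  = usable_gb_per_s(5.0, 8 / 10, 4)      # the DMI 2.0 uplink those lanes share

print(f"CPU PCIe 3.0 x16:      ~{cpu_x16_gen3:.1f} GB/s per direction")
print(f"PCH PCIe 2.0 x8 total: ~{pch_x8_gen2:.1f} GB/s, funnelled through a ~{dmi2_uplink:.1f} GB/s DMI uplink")
```

A card on the CPU lanes never has to contend with that shared PCH uplink, which is the shorter path mentioned above.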
 
Hello All,

Sorry to open this thread once again. A few more questions ...

Is there a difference between the IO Controller Hub and the Platform Controller Hub?
By the way, what exactly is the purpose of the IO Hub?
 


South Bridge, IO Controller Hub, and Platform Controller Hub are three different names for the same thing. PCH is a newer term that has come into use now that the North Bridge has been more or less completely integrated into the CPU package.