Questions regarding the PCIe Root Complex, PCIe lanes and the PCH

RJSmith92

Mar 30, 2014
Hello All,

I have some questions regarding the PCIe Root Complex, PCIe lanes and the PCH, and I am hoping someone here will be able to help as I can't find a clear answer online.

I understand that the PCIe Root Complex is the Host Bridge in the CPU that connects the CPU and memory to the PCIe architecture. The 'inner' bus of the root complex is bus 0. The Root Complex contains Root Ports to connect to the PCIe devices.

Firstly, the PCH (southbridge) contains Root Ports and devices that are on bus 0, so is the PCH part of the Root Complex?

If so, what about the devices internal to the PCH (USB Cont., SATA Cont. etc.)? They are not behind any Root Ports and are on Bus 0, so would they be part of the Root Complex?

My next question is regarding the PCIe lanes originating from the PCH. My Z87 chipset has 8 PCIe lanes from the PCH. Do these lanes then 'use up' some of the lanes from the CPU? For example, if I have a 28 PCIe lane CPU, will 8 of those lanes connect to the 8 lanes in the PCH?

Again, what about the devices internal to the PCH, do these use up any PCIe lanes?

I know they're not easy questions to answer, but if someone has any information it would be greatly appreciated.

Thanks.
 
Solution


Yeah that diagram shows what I had in mind.

The one listed as PCI Bus is ACPI\PNP0A03
The one listed as PCI Express Root Complex is ACPI\PNP0A08


The PCIe root complex is to the PCIe architecture as the PCI host bridge is to the PCI architecture. The root complex (PCIe) and host bridge (PCI) provide a stateful translation layer between the PCIe/PCI logic on one side, and the system specific logic on the other. This allows a PCIe/PCI device to connect to any system that has a compliant root complex / host bridge without regard to the architecture of the rest of the system. The PCI devices and PCIe endpoints need not care about the specifics of the system's memory subsystem, endianness, etc...

Intel uses an architecture that they call Integrated IO (IIO) to merge all of the previous platform components into one physical part of the CPU. Although the components are now integrated, they are still [at least logically] connected via PCI and are exposed as such to the PC BIOS / UEFI firmware.

Bus 0 joins the DMI root complex, PCIe root complexes (between one and three depending on the chip), DMA engine, and IIO Core to the processor cores. Collectively this forms the Processor IIO Devices. Similarly, Bus 1 joins the QPI links, interrupt handling, core broadcast, power control, IMC, and performance monitoring to the processor cores. Collectively this forms the Processor Uncore devices.

Since the DMI root port (not complex) and PCIe root port(s) bridge a PCIe layer to a PCI layer, they are logically just PCI-to-PCI bridges. The physical DMI2 connection which joins the PCH to the CPU is invisible to configuration software. By virtue of this, the PCH and the devices connected to it are just a logical extension of the CPU's internal PCI bus 0.

Note that CPUBUSNO 0 and CPUBUSNO 1 are just symbolic names and are processor relative. In multi-socket systems the actual bus number assigned to it may vary.

Intel has been somewhat vague about how all of the PCIe root ports and PCI devices communicate with the system memory via a root complex or host bridge.

As for the lanes, the eight lanes originating from the PCH are in the form of an 8x port which can be subdivided all the way down to eight 1x ports, which enables a large number of low-bandwidth peripherals to be connected at once. This is unlike the CPU's 16x port, which can only be subdivided three ways (16/0/0, 8/8/0, 8/4/4, with each link able to negotiate down independently). Logically, these devices all share the same internal bus.

The devices internal to the PCH do not use these lanes as PCIe is primarily used as an external point-to-point bus and has no advantage over PCI when used internally (in fact I would suspect that using PCIe internally would be quite a bit more difficult). However, motherboard manufacturers often utilize some of these lanes to connect additional peripherals such as NICs, Bluetooth, Audio codecs, and additional storage controllers. It's common for four of the lanes to be exposed in the form of a single 4x PCIe slot and three 1x PCIe slots with the other four lanes used for onboard peripherals.
 


Hi Pinhedd, thanks for the reply (I remember speaking to you about this earlier). I would have replied sooner but the forums were down...

Right, so the PCIe Root Complex can be seen as the PCI Host Bridge between system logic and the PCIe hierarchy. The 'PCI Express system architecture' book says that bus 0 is the internal bus of the PCIe Root Complex. As PCI bus 0 crosses the DMI and into the PCH, would the PCH be classed as part of the Root Complex, or just an extension of it?

I'm not aware of CPUBUSNO 0 and CPUBUSNO 1, I assume these don't have any relation to PCI bus numbers, as the PCI hierarchy will be created 'behind' the PCIe Root Complex/s?

I had assumed that the devices internal to the PCH didn't use PCIe lanes as they are not behind Root Ports, unlike the NIC which is. I had read this online in an article but it must be wrong.

As for the PCIe lanes from the PCH, do the lanes have anything to do with the lanes in the CPU? For example, do the 8 lanes in the PCH 'use up' 8 of the lanes in the CPU?

I understand the PCIe is packet based and that the Root Complex will generate a packet for the CPU. If the CPU wants to write to a device in the PCH, such as a USB or SATA Controller, what would the process be (on a very basic level)?
The PCIe Root Complex would read the memory address and bridge it onto bus 0 as a packet; from there, would the DMI Root Complex intercept the packet and send it across the DMI, and then the internal logic of the PCH would send it to the right device, something like that?

Would a PCIe packet be created for an internal device on bus 0 such as the USB Controllers? What about a device that's on a PCIe lane/s from the PCH, where would the packet be created?

As you can see I'm quite out of my depth with this but I would just like to get a clearer understanding.

Thanks again for your reply, it really is appreciated.

Kind Regards,
Robert
 
Hi Robert,

Hi Pinhedd, thanks for the reply (I remember speaking to you about this earlier). I would have replied sooner but the forums were down...

Yeah I thought that this post seemed kinda familiar. Our forums took a dump yesterday as well.

Right, so the PCIe Root Complex can be seen as the PCI Host Bridge between system logic and the PCIe hierarchy. The 'PCI Express system architecture' book says that bus 0 is the internal bus of the PCIe Root Complex. As PCI bus 0 crosses the DMI and into the PCH, would the PCH be classed as part of the Root Complex, or just an extension of it?

From the perspective of a unified memory space, PCI supports up to 256 logical buses and each bus supports up to 32 logical devices. In most implementations, devices will be located on either bus 0 or bus 1. This is a throwback to the physical layer of conventional PCI in which the bus was physically constructed as a multi-drop bus in which signals were broadcast between all devices on the bus (and the master as well). Multiple busses were pretty common to solve bandwidth and electrical loading issues.

PCIe largely retains the PCI logical architecture. Most of the changes are physical (such as switching from parallel single-ended signals to serial differential signals), which makes PCIe much friendlier to motherboard and device manufacturers.

I'm not aware of CPUBUSNO 0 and CPUBUSNO 1, I assume these don't have any relation to PCI bus numbers, as the PCI hierarchy will be created 'behind' the PCIe Root Complex/s?

CPUBUSNO 0 and CPUBUSNO 1 refer to the two internal logical PCI buses on each Intel CPU. They will be assigned PCI bus numbers so that the devices on each bus can be uniquely identified. In multi-socket systems each bus remains unique, so CPUBUSNO 0 on the CPU in socket 0 will have a different PCI bus ID than CPUBUSNO 0 on the CPU in socket 1. PCI allows for up to 256 busses, so there's lots of room for expansion. A system with four populated sockets would have no less than eight PCI busses, more if the CPUs are constructed from multiple dies glued together. If a PCH is attached to a CPU via DMI, the devices attached to the PCH will be connected to the CPUBUSNO 0 of the CPU in that socket.

I had assumed that the devices internal to the PCH didn't use PCIe lanes as they are not behind Root Ports, unlike the NIC which is. I had read this online in an article but it must be wrong.

Correct, they don't use PCIe lanes. PCIe was designed to overcome limitations in physical expansion, not solve on-chip problems. Since these devices are integrated, they need not follow any sort of PCI physical layer specification, just the logical layer so that they all play nicely together.

As for the PCIe lanes from the PCH, do the lanes have anything to do with the lanes in the CPU? For example, do the 8 lanes in the PCH 'use up' 8 of the lanes in the CPU?

Nope. They are completely independent.

I understand the PCIe is packet based and that the Root Complex will generate a packet for the CPU. If the CPU wants to write to a device in the PCH, such as a USB or SATA Controller, what would the process be (on a very basic level)?
The PCIe Root Complex would read the memory address and bridge it onto bus 0 as a packet; from there, would the DMI Root Complex intercept the packet and send it across the DMI, and then the internal logic of the PCH would send it to the right device, something like that?

One of the reasons that CPUBUSNO 0 and CPUBUSNO 1 are modelled after PCI is that the x86 platform has long-standing built-in support for PCI configuration through port I/O. This is done with the x86 IN and OUT instructions (and variations): software writes a 32-bit index to the configuration address port (0xCF8) and then reads or writes the data through the configuration data port (0xCFC). The index follows a particular bit pattern as follows:

31 = 1 (always 1 for PCI access)
30:24 = 0 (always 0 for PCI access)
23:16 = bus number (0-255)
15:11 = device number (0-31)
10:8 = function number (0-7)
7:0 = register number (0-255)

This allows any x86 core in a system to access any PCI device on any bus in the system. The data access that follows then reads from or writes to the identified register. PCIe is a superset of this and includes more registers, but these are only accessible via MMIO, which is quite difficult to set up; PMIO remains the same. The PCI subsystem will then route the data to the appropriate device and send the appropriate signals (read/write) along with it.
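To make the bit pattern above concrete, here is a minimal Python sketch that packs a bus/device/function/register tuple into the 32-bit index. The function name `pci_config_address` is made up for illustration, and the sketch only builds the number; it does not touch any I/O port. It also assumes that the two lowest register bits are cleared, since accesses through this mechanism are dword-aligned.

```python
def pci_config_address(bus: int, device: int, function: int, register: int) -> int:
    """Build the 32-bit PCI configuration index from the fields listed above.

    Hypothetical helper for illustration only; it performs no hardware access.
    """
    if not 0 <= bus <= 255:
        raise ValueError("bus must be 0-255")
    if not 0 <= device <= 31:
        raise ValueError("device must be 0-31")
    if not 0 <= function <= 7:
        raise ValueError("function must be 0-7")
    if not 0 <= register <= 255:
        raise ValueError("register must be 0-255")

    enable = 1 << 31                      # bit 31: always 1 for PCI access
    return (enable                        # bits 30:24 stay 0
            | (bus << 16)                 # bits 23:16
            | (device << 11)              # bits 15:11
            | (function << 8)             # bits 10:8
            | (register & 0xFC))          # bits 7:0, dword-aligned (assumption)


# Example: bus 1, device 2, function 3, register 0x10 (the first BAR offset)
index = pci_config_address(1, 2, 3, 0x10)
print(hex(index))
```

On real hardware this value would be written to the address port before the data port is read or written, which requires kernel or firmware privileges.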

Would a PCIe packet be created for an internal device on bus 0 such as the USB Controllers? What about a device that's on a PCIe lane/s from the PCH, where would the packet be created?

The PCIe packets are handled between the port and the endpoint. Since the ports on Intel's architecture are logical PCI to PCI bridges, it's likely that the translation occurs there at the link layer on the PCIe side of the bridge. Since the PCIe port is not connected directly to the memory subsystem, it's unlikely that the memory subsystem itself speaks PCIe in any way, so PCIe packets would only be generated at the periphery. I don't see why a PCIe packet would be created at all for an internal device that's logically connected via PCI. I'm not 100% certain of this though.

As you can see I'm quite out of my depth with this but I would just like to get a clearer understanding.

Thanks again for your reply, it really is appreciated.

Kind Regards,
Robert

Glad to be of assistance :)
 
From the perspective of a unified memory space, PCI supports up to 256 logical buses and each bus supports up to 32 logical devices. In most implementations, devices will be located on either bus 0 or bus 1. This is a throwback to the physical layer of conventional PCI in which the bus was physically constructed as a multi-drop bus in which signals were broadcast between all devices on the bus (and the master as well). Multiple busses were pretty common to solve bandwidth and electrical loading issues.

PCIe largely retains the PCI logical architecture. Most of the changes are physical (such as switching from parallel single-ended signals to serial differential signals), which makes PCIe much friendlier to motherboard and device manufacturers.

Thanks, I sort of understand this, what I was trying to do is understand the boundary of the PCIe Root Complex. As the PCH has Root Ports on PCI bus 0, together both the CPU and PCH create 1 logical Root Complex, even though the bridging mechanism is located in the CPU.

CPUBUSNO 0 and CPUBUSNO 1 refer to the two internal logical PCI buses on each Intel CPU. They will be assigned PCI bus numbers so that the devices on each bus can be uniquely identified. In multi-socket systems each bus remains unique, so CPUBUSNO 0 on the CPU in socket 0 will have a different PCI bus ID than CPUBUSNO 0 on the CPU in socket 1. PCI allows for up to 256 busses, so there's lots of room for expansion. A system with four populated sockets would have no less than eight PCI busses, more if the CPUs are constructed from multiple dies glued together. If a PCH is attached to a CPU via DMI, the devices attached to the PCH will be connected to the CPUBUSNO 0 of the CPU in that socket.

Do all Intel CPUs implement this? A look at my system shows PCI bus 1 to be located behind the Root Port in the CPU that links to the GPU.

Correct, they don't use PCIe lanes. PCIe was designed to overcome limitations in physical expansion, not solve on-chip problems. Since these devices are integrated, they need not follow any sort of PCI physical layer specification, just the logical layer so that they all play nicely together.

Nope. They are completely independent.

Thanks, understood :)

One of the reasons that CPUBUSNO 0 and CPUBUSNO 1 are modelled after PCI is because the x86 microarchitecture includes strong built-in support for PCI. This is done through the x86 IN and OUT instructions (and variations). The IN and OUT instructions include two operands, a 32-bit index and a data source/destination register. The index follows a particular bit pattern as follows:

31 = 1 (always 1 for PCI access)
30:24 = 0 (always 0 for PCI access)
23:16 = bus number (0-255)
15:11 = device number (0-31)
10:8 = function number (0-7)
7:0 = register number (0-255)

This allows any x86 core in a system to access any PCI device on any bus in the system. The second operand will then read from or write to the identified register. PCIe is a superset of this, and includes more registers but these are only accessible via MMIO which is quite difficult to setup, PMIO remains the same. The PCI subsystem will then route the data to the appropriate device and send the appropriate signals (read/write) along with it.

Thanks. What I was trying to ask was where is the PCIe packet created (which you answered later :) ), regardless of whether MMIO or I/O is used if you see what I mean? If a USB Controller accepts the MMIO range 0x10000 - 0x20000 and the CPU makes a write to 0x15000, what is the process.

I sort of understand it: the Root Complex will be programmed to accept the range 0x10000 - 0x20000 and then pass it onto PCI bus 0, where it crosses the DMI and into the PCH, where the logic in the PCH will send it to the right device. Which sort of brings me onto the next point...

The PCIe packets are handled between the port and the endpoint. Since the ports on Intel's architecture are logical PCI to PCI bridges, it's likely that the translation occurs there at the link layer on the PCIe side of the bridge. Since the PCIe port is not connected directly to the memory subsystem, it's unlikely that the memory subsystem itself speaks PCIe in any way, so PCIe packets would only be generated at the periphery. I don't see why a PCIe packet would be created at all for an internal device that's logically connected via PCI. I'm not 100% certain of this though.

I would agree that the packets are created at the Root Ports. If it was the Root Complex in the CPU that created the packets, how would it know that memory ranges in the PCH are for internal devices on bus 0 and therefore don't need to be sent as a PCIe packet, whereas requests to the NIC's ranges behind a Root Port would need to be sent as a packet? This agrees with my first point that the PCH is part of one logical Root Complex, as PCIe packets are being generated there connecting PCIe devices to the system.

Glad to be of assistance :)

Thanks as always, promise not to ask any more questions after this 😉

Kind Regards,
Robert

 
Thanks, I sort of understand this, what I was trying to do is understand the boundary of the PCIe Root Complex. As the PCH has Root Ports on PCI bus 0, together both the CPU and PCH create 1 logical Root Complex, even though the bridging mechanism is located in the CPU.

That's my understanding of it as well. The system bus connections are located deep inside of the CPU. Bridges are used to logically expand the buses to include the PCH and its attached devices.

Do all Intel CPUs implement this? A look at my system shows PCI bus 1 to be located behind the Root Port in the CPU that links to the GPU.

It seems to vary a little bit, especially since the PCI communication is logical rather than physical.

On my system, the actual PCIe root ports are all on bus 0 (CPUBUSNO 0), but the QPI links and IMC are on bus 255 (CPUBUSNO 1). PCH devices such as the USB controllers, SATA RAID controller, NIC, etc... are also all on bus 0. Each external PCIe switch fabric seems to have its own virtual bus, even though they're just a logical extension of CPUBUSNO 0. My GPUs are on buses 1 and 3, with my sound card on bus 2. This is consistent with the PCIe root ports being PCI to PCI bridges, they bridge one bus to another as the far bus must have a non-zero bus ID.

Thanks. What I was trying to ask was where is the PCIe packet created (which you answered later :) ), regardless of whether MMIO or I/O is used if you see what I mean? If a USB Controller accepts the MMIO range 0x10000 - 0x20000 and the CPU makes a write to 0x15000, what is the process.

I sort of understand it: the Root Complex will be programmed to accept the range 0x10000 - 0x20000 and then pass it onto PCI bus 0, where it crosses the DMI and into the PCH, where the logic in the PCH will send it to the right device. Which sort of brings me onto the next point...

Each PCI bus gets its own configuration space; this configuration space may be extended if PCIe devices are used. If MMIO isn't used, every register belonging to every device on the bus must be accessed by PMIO, and only 256 registers per function per device per bus are accessible using the method outlined in my previous post. MMIO may be enabled for a legacy bus (256 registers), which consumes up to a 16MiB chunk of physical address space (256 * 8 * 32 * 256). If extended configuration is used, the total amount of memory allocated to the PCI configuration space is up to 256MiB (4096 * 8 * 32 * 256), but registers above 255 can only be accessed via MMIO.
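As a quick check of the products quoted above, the following Python sketch multiplies out registers per function, functions per device, devices per bus and buses. It assumes each register occupies one byte of address space; the names are made up for illustration.

```python
BUSES = 256        # logical buses
DEVICES = 32       # devices per bus
FUNCTIONS = 8      # functions per device
MIB = 1024 * 1024


def config_space_bytes(bytes_per_function: int) -> int:
    """Total address space needed to map every function's configuration registers."""
    return bytes_per_function * FUNCTIONS * DEVICES * BUSES


legacy_mib = config_space_bytes(256) // MIB      # 256 * 8 * 32 * 256
extended_mib = config_space_bytes(4096) // MIB   # 4096 * 8 * 32 * 256

print(f"legacy: {legacy_mib} MiB, extended: {extended_mib} MiB")
```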

In addition to the MMIO configuration space, each device can carve out generic MMIO ranges. This is the cause of the 3.5GiB memory limit on 32-bit client versions of Windows. The PCI configuration spaces are mapped to the top of the lower 4GiB of the physical address space for compatibility with non-PAE kernels. Client versions of Windows use PAE but ignore all memory mapped above 4GiB for marketing reasons. Some PCI devices, GPUs in particular, love to carve out huge MMIO ranges.

Note that the firmware must enumerate the buses using PMIO and detect all of the devices before MMIO can be configured. There is a procedure for polling each device to determine its requested MMIO size. This is done by writing all ones to each device's Base Address Register and then reading back the desired size. The firmware can then write the configured range back into the BAR(s). The configuration space has room for 6 BARs, which allows up to 6 ranges. These ranges are programmed into the firmware's ACPI tables and into the memory controller. The memory controller detects access to these ranges and routes the request to the appropriate bus. Each range can correspond to more than one bus, which allows a bridge on that bus to forward the request. Due to design limitations, the range is always a power of 2 and is 16 byte aligned. The total memory mapped size per controller is also a power of 2 I believe.
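The BAR sizing step described above can be sketched in Python as a pure calculation on the value read back after writing all ones. This is only a sketch for 32-bit memory BARs, where the low four bits are type flags rather than address bits; I/O BARs and 64-bit BARs are left out, and `bar_size_from_readback` is a made-up name.

```python
def bar_size_from_readback(readback: int) -> int:
    """Return the size in bytes requested by a 32-bit memory BAR.

    `readback` is the value read from the BAR after writing 0xFFFFFFFF to it.
    Returns 0 when the BAR is not implemented (all address bits read as zero).
    """
    address_bits = readback & 0xFFFFFFF0   # drop the four low flag bits
    if address_bits == 0:
        return 0
    # The device hard-wires the low address bits to zero, so inverting the
    # writable bits and adding one yields the (power of two) size.
    return ((~address_bits) & 0xFFFFFFFF) + 1


# Example: a device that leaves the top 12 bits writable wants 1 MiB.
print(bar_size_from_readback(0xFFF00000) // 1024, "KiB")
```

Because only the upper bits are writable, the result is always a power of two, which matches the constraint mentioned above.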

Summary:

PCI Port addressed configuration space (256 registers PCI and PCIe)

PCI Memory addressed configuration space (256 registers PCI and 4096 registers PCIe)

PCI shared memory (up to 6 per function, power of 2, 16-byte aligned)

I would agree that the packets are created at the Root Ports. If it was the Root Complex in the CPU that created the packets, how would it know that memory ranges in the PCH are for internal devices on bus 0 and therefore don't need to be sent as a PCIe packet, whereas requests to the NIC's ranges behind a Root Port would need to be sent as a packet? This agrees with my first point that the PCH is part of one logical Root Complex, as PCIe packets are being generated there connecting PCIe devices to the system.

I think this is pretty well explained above. Each root port is a bridge and each bridge provides access to a new bus. A host controller's memory range can include multiple buses. The particular register can be decoded from the memory address just like it can from the port address. Once decoded, the memory controller hands the operation off to the host controller.
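To illustrate how a register can be decoded from a memory address, here is a small Python sketch. It assumes the memory-mapped configuration layout implied by the 4096 * 8 * 32 * 256 product, with 4096 bytes per function, and it works on the offset from the start of the mapped region, since the base address is platform specific. `decode_config_offset` is a made-up name.

```python
def decode_config_offset(offset: int) -> tuple:
    """Split an offset into the memory-mapped configuration region into
    (bus, device, function, register), assuming 4096 bytes per function."""
    if not 0 <= offset < 256 * 32 * 8 * 4096:
        raise ValueError("offset outside the 256 MiB configuration region")
    bus = (offset >> 20) & 0xFF        # 256 buses
    device = (offset >> 15) & 0x1F     # 32 devices per bus
    function = (offset >> 12) & 0x7    # 8 functions per device
    register = offset & 0xFFF          # 4096 bytes per function
    return bus, device, function, register


# Example: bus 1, device 2, function 3, register 0x10
example_offset = (1 << 20) | (2 << 15) | (3 << 12) | 0x10
print(decode_config_offset(example_offset))
```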

Thanks as always, promise not to ask any more questions after this 😉

Keep em coming!
 
That's my understanding of it as well. The system bus connections are located deep inside of the CPU. Bridges are used to logically expand the buses to include the PCH and its attached devices.

Yep :)

It seems to vary a little bit, especially since the PCI communication is logical rather than physical.

On my system, the actual PCIe root ports are all on bus 0 (CPUBUSNO 0), but the QPI links and IMC are on bus 255 (CPUBUSNO 1). PCH devices such as the USB controllers, SATA RAID controller, NIC, etc... are also all on bus 0. Each external PCIe switch fabric seems to have its own virtual bus, even though they're just a logical extension of CPUBUSNO 0. My GPUs are on buses 1 and 3, with my sound card on bus 2. This is consistent with the PCIe root ports being PCI to PCI bridges, they bridge one bus to another as the far bus must have a non-zero bus ID.

Pretty similar to mine, although I'm not sure what you mean by 'Each external PCIe switch fabric seems to have its own virtual bus, even though they're just a logical extension of CPUBUSNO 0. '?

Each PCI bus gets its own configuration space; this configuration space may be extended if PCIe devices are used. If MMIO isn't used, every register belonging to every device on the bus must be accessed by PMIO, and only 256 registers per function per device per bus are accessible using the method outlined in my previous post. MMIO may be enabled for a legacy bus (256 registers), which consumes up to a 16MiB chunk of physical address space (256 * 8 * 32 * 256). If extended configuration is used, the total amount of memory allocated to the PCI configuration space is up to 256MiB (4096 * 8 * 32 * 256), but registers above 255 can only be accessed via MMIO.

In addition to the MMIO configuration space, each device can carve out generic MMIO ranges. This is the cause of the 3.5GiB memory limit on 32-bit client versions of Windows. The PCI configuration spaces are mapped to the top of the lower 4GiB of the physical address space for compatibility with non-PAE kernels. Client versions of Windows use PAE but ignore all memory mapped above 4GiB for marketing reasons. Some PCI devices, GPUs in particular, love to carve out huge MMIO ranges.

Note that the firmware must enumerate the buses using PMIO and detect all of the devices before MMIO can be configured. There is a procedure for polling each device to determine its requested MMIO size. This is done by writing all ones to each device's Base Address Register and then reading back the desired size. The firmware can then write the configured range back into the BAR(s). The configuration space has room for 6 BARs, which allows up to 6 ranges. These ranges are programmed into the firmware's ACPI tables and into the memory controller. The memory controller detects access to these ranges and routes the request to the appropriate bus. Each range can correspond to more than one bus, which allows a bridge on that bus to forward the request. Due to design limitations, the range is always a power of 2 and is 16 byte aligned. The total memory mapped size per controller is also a power of 2 I believe.

Summary:

PCI Port addressed configuration space (256 registers PCI and PCIe)

PCI Memory addressed configuration space (256 registers PCI and 4096 registers PCIe)

PCI shared memory (up to 6 per function, power of 2, 16-byte aligned)

A lot of great info thanks. I've actually looked quite a bit into ACPI tables, DSDT and such to get an understanding of how the OS learns of the underlying hardware.

When I'm talking about read/writing to a device I am usually talking about MMIO rather than I/O or config space. As you said 'keep the questions coming!', here's another one.

Once the system is set up and all the devices are configured, would an MMIO access to a device connected to a Root Port in the PCH look something like this (on a very basic level)?

1. CPU issues access to address of device.
2. The PCIe Root Complex has its registers configured to forward this address onto the logical PCI bus 0.
3. No device accepts address in CPU and then reaches the DMI Root Complex.
4. DMI Root Complex accepts it as it uses subtractive decoding.
5. DMI sends it out of Root Port to PCH.
6. PCH accepts and uses internal logic to send the access to the appropriate Root Port (logical PCI-PCI bridge).
7. Root Port accepts request and sends a PCIe packet out onto the PCIe lanes.
8. PCIe devices checks address with BARs and accepts.

Would it be something like that?

I think this is pretty well explained above. Each root port is a bridge and each bridge provides access to a new bus. A host controller's memory range can include multiple buses. The particular register can be decoded from the memory address just like it can from the port address. Once decoded, the memory controller hands the operation off to the host controller.

I understand a Root Port is just a logical PCI-PCI bridge, what I meant was when the Root Complex in the CPU gets a write request for 0xED000000, it wouldn't know whether the target device is an actual PCIe device and a PCIe packet needs creating, or a device internal to the PCH (USB Cont.) and therefore no PCIe packet is needed. Therefore we agree that it is likely the PCIe Root Ports that do this.

In other words, the only time a PCIe packet is created is when a physical PCIe bus is involved.

Keep em coming!

That's all, for now... :)

Kind Regards,
Robert
 
Hi Robert, sorry for the delay in replying to this thread. I was occupied with other stuff today and it slipped my mind.

Pretty similar to mine, although I'm not sure what you mean by 'Each external PCIe switch fabric seems to have its own virtual bus, even though they're just a logical extension of CPUBUSNO 0. '?

Since the PCIe root ports on Intel's platform are PCI-to-PCI bridges, each port acts as a device on CPUBUSNO 0 (on whatever bus id that is, which should be 0 in a single-socket system) that exposes another bus to which it can forward operations.

For example, Intel's LGA-2011 CPUs have three PCIe root ports on the CPU itself. Two are 16x and one is 8x. Each of these ports can be subdivided. The 16x ports can be subdivided four ways, and the 8x port can be divided two ways. This provides up to ten 4x ports from the CPU: 1a, 1b, 1c, 1d, 2a, 2b, 2c, 2d, 3a, and 3b. The ports themselves appear on CPUBUSNO 0, and expose the devices on the buses to which they provide a bridge.

Devices attached to port 1a may be on bus 1, devices attached to port 1b may be on bus 2, and so on. Since PCIe uses a switch fabric rather than a multi-drop bus, all of the endpoints attached to the switch fabric share the same bus number. Each of my GPUs has a GPU device and an audio device. These appear as two separate devices connected to the same bus.
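The way each port exposes a new bus can be sketched as a depth-first numbering pass. This Python sketch is a simplification: it assumes the bridge tree is already known (real firmware discovers it by probing), the port names are only labels, and `Bridge`, `assign_buses` and `enumerate_root_ports` are made-up names. The primary, secondary and subordinate fields mirror the bus number registers of a PCI-to-PCI bridge.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bridge:
    name: str
    children: List["Bridge"] = field(default_factory=list)
    primary: int = 0       # bus the bridge itself sits on
    secondary: int = 0     # bus directly behind the bridge
    subordinate: int = 0   # highest bus number reachable behind the bridge


def assign_buses(bridge: Bridge, primary: int, next_bus: int) -> int:
    """Number the bus behind `bridge`, then recurse; return the highest bus used."""
    bridge.primary = primary
    bridge.secondary = next_bus
    highest = next_bus
    for child in bridge.children:
        highest = assign_buses(child, bridge.secondary, highest + 1)
    bridge.subordinate = highest
    return highest


def enumerate_root_ports(ports: List[Bridge]) -> int:
    """Root ports all sit on bus 0; each one gets the next free bus number."""
    last = 0
    for port in ports:
        last = assign_buses(port, 0, last + 1)
    return last


ports = [Bridge("1a"), Bridge("1b", [Bridge("downstream switch port")]), Bridge("2a")]
enumerate_root_ports(ports)
for port in ports:
    print(port.name, port.primary, port.secondary, port.subordinate)
```

In this sketch port 1a ends up bridging bus 0 to bus 1, while port 1b covers buses 2 and 3 because of the extra bridge behind it.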

What's interesting is the behaviour of the DMI interface. Unlike the PCIe root ports, it doesn't seem to bridge one bus to another bus. The devices that are attached to the PCH are also on CPUBUSNO 0 and share its bus number. In other words, DMI seems to be almost completely transparent.

A lot of great info thanks. I've actually looked quite a bit into ACPI tables, DSDT and such to get an understanding of how the OS learns of the underlying hardware.

When I'm talking about read/writing to a device I am usually talking about MMIO rather than I/O or config space. As you said 'keep the questions coming!', here's another one.

Once the system is set up and all the devices are configured, would an MMIO access to a device connected to a Root Port in the PCH look something like this (on a very basic level)?

1. CPU issues access to address of device.
2. The PCIe Root Complex has its registers configured to forward this address onto the logical PCI bus 0.
3. No device accepts address in CPU and then reaches the DMI Root Complex.
4. DMI Root Complex accepts it as it uses subtractive decoding.
5. DMI sends it out of Root Port to PCH.
6. PCH accepts and uses internal logic to send the access to the appropriate Root Port (logical PCI-PCI bridge).
7. Root Port accepts request and sends a PCIe packet out onto the PCIe lanes.
8. PCIe devices checks address with BARs and accepts.

Would it be something like that?

It's hard to say exactly, but I imagine that an MMIO access would look something like this

1. CPU issues memory access instruction. The address is put through virtual to physical translation (it should be marked as uncachable in the page table, the MTRR, or the PAT, so the cache should not be accessed). The instruction is marked as outstanding for the purpose of data dependency and is sent up the chain to the memory controller.

2. The MMU decodes the address range (ACPI tables?) and selects the appropriate system bus to send the MMIO operation to. Memory operations and inter-processor operations get sent to CPUBUSNO 1, while local IO operations get sent to CPUBUSNO 0.

3. Devices on the bus examine the address signal to see if it falls within one of their programmed range(s) and respond accordingly. PCI-to-PCI bridges store the upper and lower range of the downstream devices in their memory base and limit registers. For example, a bridge joining bus 1 (downstream) to bus 0 (upstream) has devices on bus 1 that require an MMIO range of 256MiB. The bridge device on bus 0 would have a contiguous MMIO range of 256MiB to expose the 256MiB carved out by the devices on bus 1. This memory assignment is recursive, so all devices downstream of a point appear as one big contiguous blob.

4. The PCIe root port then packetizes the operation and sends it downstream to the PCIe devices. Each PCIe device then checks it against its own BAR and accepts.
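The recursive window matching in steps 3 and 4 can be sketched as follows. This Python sketch only models the address comparison, not packetization, and all names, addresses and sizes in it are invented for illustration. A bridge is modelled as a node whose window covers everything downstream of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    base: int                      # start of the MMIO window
    size: int                      # window size in bytes
    children: List["Node"] = field(default_factory=list)

    def claims(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


def route(node: Node, address: int) -> Optional[List[str]]:
    """Return the chain of nodes that claim `address`, or None if this node does not.

    If a bridge's window matches but no child claims the address, the chain
    ends at the bridge itself.
    """
    if not node.claims(address):
        return None
    for child in node.children:
        path = route(child, address)
        if path is not None:
            return [node.name] + path
    return [node.name]


# Hypothetical layout: one root port exposing 256 MiB for two downstream functions.
gpu = Node("gpu", 0xE0000000, 0x08000000)
audio = Node("audio", 0xE8000000, 0x4000)
root_port = Node("root_port", 0xE0000000, 0x10000000, [gpu, audio])

print(route(root_port, 0xE8000010))
print(route(root_port, 0xD0000000))
```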

In other words, the only time a PCIe packet is created is when a physical PCIe bus is involved.

I believe that's correct. PCIe packets are link-layer and are an element of the PCIe fabric, not the PCI logic.
 
Hi Robert, sorry for the delay in replying to this thread. I was occupied with other stuff today and it slipped my mind.

No worries, I'm just glad you reply at all :)

Since the PCIe root ports on Intel's platform are PCI-to-PCI bridges, each port acts as a device on CPUBUSNO 0 (on whatever bus id that is, which should be 0 in a single-socket system) that exposes another bus to which it can forward operations.

For example, Intel's LGA-2011 CPUs have three PCIe root ports on the CPU itself. Two are 16x and one is 8x. Each of these ports can be subdivided. The 16x ports can be subdivided four ways, and the 8x port can be divided two ways. This provides up to ten 4x ports from the CPU: 1a, 1b, 1c, 1d, 2a, 2b, 2c, 2d, 3a, and 3b. The ports themselves appear on CPUBUSNO 0, and expose the devices on the buses to which they provide a bridge.
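The port arithmetic above is easy to check with a few lines of Python. This is just a sketch of the counting; the `bifurcate` helper is a hypothetical name for this example and the naming scheme simply follows the 1a...3b labels used in the post.

```python
from string import ascii_lowercase


def bifurcate(port_widths, sub_width=4):
    """Split each root port into sub_width-lane ports and name them."""
    names = []
    for index, width in enumerate(port_widths, start=1):
        for k in range(width // sub_width):
            names.append(f"{index}{ascii_lowercase[k]}")
    return names


widths = [16, 16, 8]          # two 16x ports and one 8x port
ports = bifurcate(widths)

print(sum(widths))            # 40
print(len(ports))             # 10
print(ports)
# ['1a', '1b', '1c', '1d', '2a', '2b', '2c', '2d', '3a', '3b']
```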

Devices attached to port 1a may be on bus 1, devices attached to port 1b may be on bus 2, and so on. Since PCIe uses a switch fabric rather than a multi-drop bus, all of the endpoints attached to the switch fabric share the same bus number. Each of my GPUs has a GPU device and an audio device. These appear as two separate devices connected to the same bus.
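For reference, a PCI address is conventionally written as bus:device.function, with 8 bits of bus, 5 bits of device and 3 bits of function. The sketch below packs and unpacks that triple; the helper names are made up for this example, and the specific bus/device/function values are assumptions chosen to illustrate a GPU and its audio function sitting behind the same root port.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into the usual 16-bit identifier."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus, device or function out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int):
    """Inverse of pack_bdf: return (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def format_bdf(bdf: int) -> str:
    bus, device, function = unpack_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


# Assumed example: a graphics card behind port 1a, on bus 1.
gpu = pack_bdf(1, 0, 0)
gpu_audio = pack_bdf(1, 0, 1)

print(format_bdf(gpu))                    # 01:00.0
print(format_bdf(gpu_audio))              # 01:00.1
print(format_bdf(pack_bdf(0xFF, 0, 0)))   # ff:00.0
```

Bus number 255 (FF) comes out as `ff`, which is the form such devices take in most device listings.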

What's interesting is the behaviour of the DMI interface. Unlike the PCIe root ports, it doesn't seem to bridge one bus to another bus. The devices that are attached to the PCH are also on CPUBUSNO 0 and share its bus number. In other words, DMI seems to be almost completely transparent.

Thanks for that, I understand what you mean now.

It's hard to say exactly, but I imagine that an MMIO access would look something like this

1. CPU issues memory access instruction. The address is put through virtual to physical translation (it should be marked as uncachable in the page table, the MTRR, or the PAT, so the cache should not be accessed). The instruction is marked as outstanding for the purpose of data dependency and is sent up the chain to the memory controller.

2. The MMU decodes the address range (ACPI tables?) and selects the appropriate system bus to send the MMIO operation to. Memory operations and inter-processor operations get sent to CPUBUSNO 1, while local IO operations get sent to CPUBUSNO 0.

3. Devices on the bus examine the address signal to see if it falls within one of their programmed range(s) and respond accordingly. PCI-to-PCI bridges store the upper and lower bounds of the downstream devices' address range in their memory base and limit registers (endpoints claim addresses through BARs; bridges forward a base/limit window). For example, suppose a bridge joining bus 1 (downstream) to bus 0 (upstream) has devices on bus 1 that require an MMIO range of 256MiB. The bridge device on bus 0 would then have a contiguous MMIO window of 256MiB to expose the 256MiB carved out by the devices on bus 1. This memory assignment is recursive, so all devices downstream of a point appear as one big contiguous blob.

4. The PCIe root port then packetizes the operation and sends it downstream to the PCIe devices. Each PCIe device then checks it against its own BAR and accepts.

Thanks. Two side questions...

1. Do all Intel CPUs have a CPUBUSNO0 and CPUBUSNO1? I have given them a google and the Xeon datasheet shows up, but when I look at the datasheets for my Haswell CPU there is nothing. All devices appear to be on PCI bus 0, so I presume they are all just on CPUBUSNO0.

2. If CPUBUSNO1 is on PCI bus 255 (as you said it is on your system), wouldn't there need to be a logical PCI-PCI bridge on PCI bus 0 to this bus? A single Root Complex/Host Bridge on CPUBUSNO0 would accept all bus ranges and forward them onto bus 0, so wouldn't there need to be a bridge onto bus 255 and therefore onto CPUBUSNO1?

I've not worded that well at all but hopefully you'll see what I mean 🙁

Thanks as always,
Robert
 
1. I'm not sure to what extent the dual buses are implemented in Intel's architecture. I'm fairly certain that the dual buses are present on all of Intel's LGA-2011 based microprocessors as I used my own microprocessor as a reference. However, the LGA-1156/LGA-1155/LGA-1150 platforms don't have QPI links and have only a single 16x PCIe port so they may cram all of the devices onto a single CPUBUSNO 0 which, owing to the platform's lack of multi-socket support, should always be bus number 0.

2. If both CPUBUSNO 0 and CPUBUSNO 1 are connected directly to the MMU then there doesn't need to be a bridge between them. In other words, the MMU or some other physical uncore component acts as a master for both buses so there's no direct communication between either bus. There may be one, but I haven't seen it in any diagram.
 
1. I'm not sure to what extent the dual buses are implemented in Intel's architecture. I'm fairly certain that the dual buses are present on all of Intel's LGA-2011 based microprocessors as I used my own microprocessor as a reference. However, the LGA-1156/LGA-1155/LGA-1150 platforms don't have QPI links and have only a single 16x PCIe port so they may cram all of the devices onto a single CPUBUSNO 0 which, owing to the platform's lack of multi-socket support, should always be bus number 0.

Thanks, makes sense.

2. If both CPUBUSNO 0 and CPUBUSNO 1 are connected directly to the MMU then there doesn't need to be a bridge between them. In other words, the MMU or some other physical uncore component acts as a master for both buses so there's no direct communication between either bus. There may be one, but I haven't seen it in any diagram.

Interesting. Just out of curiosity, do the devices on bus 255 show up in Device Manager and, if so, how are they structured if you go to 'View' - 'Devices by connection'?

For example, it's usually 'ACPI x64 Based PC' - 'MS ACPI Compliant System' - 'PCI Bus', and anything under there is then on bus 0. Is there some device on there that bridges to bus 255?

Kind Regards,
Robert
 


Yes they do show up in Device Manager.

I have two sections, PCI Bus and PCI Express Root Complex.

The PCI Bus heading contains the uncore parts, including the QPI Links, IMC devices, system performance, etc... just as CPUBUSNO 1 is laid out in the block diagram on Intel's datasheet. All of these devices have bus number 255 (FF)

The PCI Express Root Complex heading contains all of the stuff on CPUBUSNO 0 including the CPU's IIO PCIe root ports, the chipset PCIe root ports, the HPET, etc... All of these devices have bus number 0.

I don't know if there's anything significant in the naming conventions for each bus.
 
Hi Pinhedd,

Interesting, it looks like there are 2 Host/PCI Bridges, as in the following example:


setup.png



Both the 'PCI bus' and 'PCI Express Root Complex' will have their own entries in the DSDT describing what resources and bus numbers they accept. If you double click on them in Device Manager, then go to the Details tab and look at the hardware IDs, are they both either ACPI\PNP0A08 or ACPI\PNP0A03?

Kind Regards,
Robert
 


Yeah that diagram shows what I had in mind.

The one listed as PCI Bus is ACPI\PNP0A03
The one listed as PCI Express Root Complex is ACPI\PNP0A08
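For anyone reading later, those two IDs are the standard ACPI plug-and-play IDs for a PCI host bridge (PNP0A03) and a PCI Express root bridge (PNP0A08). A minimal sketch of telling them apart from a Device Manager hardware ID string is below; the `classify_host_bridge` function and the description strings are just illustrative names for this example, not part of any Windows API.

```python
# Standard ACPI PNP IDs for host bridges.
HOST_BRIDGE_IDS = {
    "PNP0A03": "PCI host bridge",
    "PNP0A08": "PCI Express root bridge",
}


def classify_host_bridge(hardware_id: str) -> str:
    """Map a hardware ID such as 'ACPI\\PNP0A08' to a description."""
    # Keep only the part after the last backslash, e.g. 'PNP0A08'.
    pnp_id = hardware_id.rsplit("\\", 1)[-1].upper()
    return HOST_BRIDGE_IDS.get(pnp_id, "unknown")


print(classify_host_bridge(r"ACPI\PNP0A03"))   # PCI host bridge
print(classify_host_bridge(r"ACPI\PNP0A08"))   # PCI Express root bridge
```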
 
😉 Thanks for all your help with this Pinhedd, it has been greatly appreciated.

Kind Regards,
Robert