get a HEDT then?
16 PCIe 5.0 lanes are overkill for current GPUs. These GPUs take up 3 to 4 slots while desktop cases only offer up to 7 slots anyway.
Like what are you trying to run? 7x Apex Storage X21s?
Well, some people use GPUs for GPGPU workloads like machine learning, and then find that a PCIe 5.0 x16 link between two such GPUs is slightly cheaper and more readily available than NVLink. Yet few models tolerate such a drastic bandwidth reduction between the layers of an LLM, when 3 TB/s of HBM or 1 TB/s of GDDR6X already feels slow.
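To put rough numbers on that cliff, here's a back-of-envelope in Python (figures are ballpark from published specs, PCIe numbers are per direction, and the NVLink entry assumes the RTX 3090 generation, since the 40 series dropped the connector):

```python
# Rough bandwidth ladder from on-card memory down to inter-GPU links.
# All numbers are ballpark, per published specs; PCIe figures are per direction.

def pcie_gb_s(gt_s: float, lanes: int) -> float:
    """Usable GB/s of a PCIe link: GT/s per lane * lanes * 128b/130b encoding / 8."""
    return gt_s * lanes * (128 / 130) / 8

HBM3 = 3350.0  # H100 SXM, GB/s

links = {
    "HBM3 (H100 SXM)":     HBM3,
    "GDDR6X (RTX 4090)":   1008.0,
    "NVLink 3 (RTX 3090)": 112.5,             # commonly quoted bridge bandwidth
    "PCIe 5.0 x16":        pcie_gb_s(32, 16),  # ~63 GB/s
    "PCIe 4.0 x8":         pcie_gb_s(16, 8),   # ~15.8 GB/s, see the test further down
}

for name, gb_s in links.items():
    print(f"{name:22s} {gb_s:7.1f} GB/s  ({HBM3 / gb_s:6.1f}x below HBM3)")
```

That's roughly two orders of magnitude between HBM and the slot, which is the cliff a layer-split model has to live with.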
Just saying that GPUs aren't only about gaming, and today's fixed slot-to-lane allocations lack flexibility and leave capacity unusable. I'd prefer being able to freely allocate sets of 4-lane bundles between components, much like CXL seems to envisage.
Cables capable of PCIe 5.0 speeds will be terribly expensive, and connectors aren't as reliable as solder, but cables have a good chance of catching up thanks to precise run lengths vs. PCB traces: at these speeds, 1 mm of PCB may be about the distance a signal travels between clocks for all I know.
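The back-of-envelope physics supports the order of magnitude: what travels is the signal, not the electrons, at roughly half the speed of light in FR-4. A quick sanity check (the 0.5c propagation factor is a common rule of thumb, not a measured value):

```python
# How far does a PCIe 5.0 signal propagate along a PCB trace in one unit interval?
# Assumes ~0.5c signal speed in FR-4 dielectric (rule of thumb, not measured).

C = 3.0e8               # speed of light in vacuum, m/s
V_PROP = 0.5 * C        # typical signal speed in FR-4
GT_S = 32e9             # PCIe 5.0: 32 GT/s, i.e. 32e9 bits/s per lane

unit_interval_s = 1 / GT_S                     # ~31.25 ps per bit
distance_mm = V_PROP * unit_interval_s * 1000  # metres -> millimetres

print(f"unit interval:        {unit_interval_s * 1e12:.2f} ps")
print(f"trace length per bit: {distance_mm:.1f} mm")   # ~4.7 mm
```

So a bit occupies a few millimetres of trace, not quite 1 mm, but the same ballpark, which is why run-length matching gets brutal at gen 5.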
BTW I was completely shocked to discover
a) the RTX 2060m in my NUC11 Enthusiast was only using 4 lanes to the Tiger Lake i7-1165G7 (U-class mobile chips just don't have more than 8 lanes for a dGPU, and this one needs 4 of them for Thunderbolt support).
b) it didn't matter a bit for gaming performance, which was still extremely good, especially for a system that cost almost exactly the same as another NUC11 without the dGPU when I bought it (it was not attractive at its original price).
c) that was all PCIe v3, because the RTX 20 series can't do better, even though the Tiger Lake could (quick math below).
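The quick math on a) and c), same encoding overhead as the sketch above (textbook PCIe figures, not measurements from the NUC):

```python
# Link bandwidth the NUC11's dGPU actually had: PCIe 3.0 (8 GT/s) across 4 lanes.

def pcie_gb_s(gt_s: float, lanes: int) -> float:
    return gt_s * lanes * (128 / 130) / 8   # 128b/130b encoding, bits -> bytes

print(f"PCIe 3.0 x4:  {pcie_gb_s(8, 4):.1f} GB/s")    # ~3.9 GB/s to the RTX 2060m
print(f"PCIe 3.0 x16: {pcie_gb_s(8, 16):.1f} GB/s")   # ~15.8 GB/s a desktop card would get
```

Once the textures sit in VRAM, a game mostly streams small per-frame updates over the link, which presumably is why a quarter of the lanes never showed up in the frame rates.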
But that's basically my kind of Xbox, not a workstation.
Nvidia hates people combining GPUs to tackle bigger workloads, so they have OEMs make cards too wide for bifurcation to be usable.
Hackers hate Nvidia putting obstacles like this in their path to the best value for their money, so they swap out shrouds, convert GPUs to liquid cooling, or just use riser cables and do it anyway. Plenty of RTX 4090s are being hacked like that in China, and I would have wanted two of those to run Llama-2 70B at 4-bit quantization in my lab.
Instead I had to make do with PNYs, a 3-slot 4090 and a 2-slot 4070, which are just that crucial bit smaller so both fit into an X570 board without killing the warranty, to test how badly LLMs would suffer from a PCIe 4.0 x8 bottleneck between them at the different layers for distinct model variants.
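In case anyone wants to reproduce that kind of split, here's a minimal sketch using Hugging Face transformers with bitsandbytes 4-bit quantization; device_map="auto" spreads the layers over both cards, so activations cross the PCIe link at the layer boundary. The model name and memory caps are illustrative, not necessarily what I ran:

```python
# Minimal sketch: split a 4-bit quantized LLM across two GPUs so that
# activations cross the PCIe link between them at the layer boundary.
# Assumes transformers + accelerate + bitsandbytes are installed;
# model name and memory caps below are illustrative, not a recommendation.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-70b-hf"   # gated on the Hub; any large model works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",                    # spread layers over cuda:0 and cuda:1
    max_memory={0: "22GiB", 1: "11GiB"},  # e.g. a 4090 plus a smaller card
)

inputs = tokenizer("The PCIe bottleneck shows up when", return_tensors="pt").to("cuda:0")
start = time.time()
out = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start
print(tokenizer.decode(out[0], skip_special_tokens=True))
print(f"{128 / elapsed:.1f} tokens/s")  # rough: assumes no early EOS stop
```

Re-running with CUDA_VISIBLE_DEVICES limited to one card (on a model small enough to fit) isolates the link penalty from the quantization penalty.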
There are dramatically diminishing returns, just in case you're interested, but prices for those A100s grow exponentially, so there may be a crossover point where hacked consumer cards still win on value.