News Intel Nova Lake specs leaked — Up to 52 cores and 150W of TDP for Intel's AMD Zen 6 rival

I would love to see the cache size in the product stack.
Core count and clock speed are only part of what makes a good CPU.
Part of the reason the Ryzen 7 9800X3D performs so well in games, while the Core 5 and 7 do not, is its cache size.
 
I would love to see the cache size in the product stack.
Core count and clock speed are only part of what makes a good CPU.
Part of the reason the Ryzen 7 9800X3D performs so well in games, while the Core 5 and 7 do not, is its cache size.
Yeah, but it's also how you destroy your own sales, so Intel won't do anything like X3D.

But if all of the E-cores and P-cores are still on the same chiplet, then the sheer number of cores will mean a decently higher amount of cache available, just nowhere near X3D levels.
 
I would love to see the cache size in the product stack.
I'd be surprised if it were much of a divergence from current designs. I think the biggest question would be how the L3 cache functions on the top two SKUs, since those are purported to use two Compute Tiles. All of the multi-die Xeon parts have used EMIB, but to my knowledge no consumer parts have utilized it. If ARL is any indication, using Foveros for this likely wouldn't be viable for shared L3.

If we assume the L3 cache doesn't drop from the last two generations and the two Compute Tiles behave as a single unit, that would put the top SKU at a minimum of 72MB.
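
For reference, the rough math behind that floor (assuming each NVL-S Compute Tile keeps the 36MB of L3 that the RPL-S and ARL-S top parts ship with, and that the two tiles present as one pool):

Code:
# Minimum shared L3 for the top SKU, assuming ARL-S/RPL-S cache levels per tile.
L3_PER_COMPUTE_TILE_MB = 36   # 14900K and 285K both ship with 36MB of L3
COMPUTE_TILES_TOP_SKU = 2     # leaked two-tile Ultra 9 configuration
print(L3_PER_COMPUTE_TILE_MB * COMPUTE_TILES_TOP_SKU)   # 72 MB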
 
So if I read this right, they're downgrading the E-cores to the mobile version of the E-core in order to save on power, so they can up the P-core count, because, you know, they did away with Hyper-Threading, so they need more physical cores to perform. They also likely need to figure out how to clock it up without blowing up the ring bus like they did with the 14th-gen chips.

I guarantee the upped core count is real, because all of Intel's marketing will claim something like a 50% boost in multithreaded performance, all obtained with a 70% boost in core count. This has been Intel's MO for years now: up the core count, claim higher performance, while pretending it's an apples-to-apples comparison when in reality it's apples to oranges and means nothing.

Still, if they can give us all those cores at the same price as their current flagship, that will be a tremendous value to someone out there. It's not like the E-cores are trash, and when you pool enough of them together you get some very good results. I just hope this doesn't come with the burnt-out ring bus problem they had before.
 
So if I read this right, they're downgrading the E-cores to the mobile version of the E-core in order to save on power, so they can up the P-core count, because, you know, they did away with Hyper-Threading, so they need more physical cores to perform. They also likely need to figure out how to clock it up without blowing up the ring bus like they did with the 14th-gen chips.
You are not reading it right. According to current leaks, NVL-S uses 8P/16E Compute Tiles just like ARL-S, with the Ultra 7/9 having two Compute Tiles. They are also supposed to have 4 LPE cores in the SoC tile, like the mobile parts have had.

Preliminary silicon configs (total, with the P + E + LPE breakdown) are:
52 (16+32+4)
28 (8+16+4)
16 (4+8+4)

https://nitter.poast.org/jaykihn0/status/1887830497964769515#m
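
If it helps, here's the same list spelled out; the grouping labels are mine, and the per-tile split is just the leak above. With Hyper-Threading gone, the Compute Tile thread counts equal the P + E core counts:

Code:
# Leaked NVL-S silicon configs as (P, E, LPE) core counts; labels are mine.
configs = {"2-tile": (16, 32, 4), "1-tile": (8, 16, 4), "small": (4, 8, 4)}
for name, (p, e, lpe) in configs.items():
    total = p + e + lpe          # all cores, including the SoC-tile LPE cores
    threads = p + e              # no SMT, so Compute Tile threads == P + E
    print(name, total, threads)  # 52/48, 28/24, 16/12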
 
Was thinking some more about the configurations and additional details from Jaykihn.

The 9950X (16c/32t) and 285K (8P/16E/24t) are largely the same (ignoring outliers) outside of gaming. Should the scaling of Zen 6 and NVL be similar, and AMD move to 12-core CCDs, a dual-CCD configuration (24c/48t) wouldn't be able to keep up with the full two-Compute-Tile (16P/32E/48t) NVL configuration.

Another oddity is the PCIe configuration being reported. 32x PCIe 5.0/16x PCIe 4.0 (PCIe 5.0 x4 DMI) seems like a really odd choice if it's the entire stack and not just the Ultra 7/9. ARL has 20x PCIe 5.0/4x PCIe 4.0 (PCIe 4.0 x8 DMI), though it does also have two Thunderbolt ports built in which can run PCIe 4.0. Even if some of those reported PCIe 4.0 lanes are being used for connectivity, it's significantly more than we've seen on a client part. Assuming there are PCIe 5.0 M.2 slots, that means two full x16 slots aren't necessarily available despite the 32 lanes.

edit: Jaykihn released the platform information, and with PCIe 5.0 lanes being available off the chipset, it's unfortunate they didn't maintain the 8 DMI lanes the current platforms have when upgrading to PCIe 5.0 DMI:
24 PCIe 5.0 lanes
4 PCIe 5.0 DMI lanes
16 PCIe 4.0 chipset lanes
8 PCIe 5.0 chipset lanes

https://nitter.poast.org/jaykihn0/status/1934778035611554165#m
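
To put rough numbers on that DMI change (standard per-lane rates after 128b/130b encoding, protocol overhead ignored), the new x4 link carries the same bandwidth as the current x8 one, just on half the lanes:

Code:
# DMI uplink bandwidth, GB/s: current PCIe 4.0 x8 vs. leaked NVL PCIe 5.0 x4.
PER_LANE_GBPS = {4: 16 * 128 / 130 / 8, 5: 32 * 128 / 130 / 8}
current_dmi = PER_LANE_GBPS[4] * 8   # ~15.8 GB/s
nvl_dmi = PER_LANE_GBPS[5] * 4       # ~15.8 GB/s -- same bandwidth, half the lanes
print(round(current_dmi, 1), round(nvl_dmi, 1))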
Intel would also be enabling a 4/4/4/4 PCIe split for what I believe is the first time in a client platform.

If the connectivity is just the top two SKUs and the theoretical CPU performance is on point, I could almost see this as somewhat of a return of HEDT, minus the memory bus.

If the performance scales like current Intel and AMD CPUs, I could easily see the Ultra 9 going for $800-1000.
 
If prices are similar to Arrow Lake's SKUs, it will be a very interesting fight with Zen 6, IMO. A 465K with 14P+24E would be insane for $500. If AMD is using 12-core CCDs, the 10950X will be 12+12 cores, but will the 10900X be, say, 12+6, with the 10800X being 12+0 and the 10700X being, say, 10+0?
 
I would love to see the cache size in the product stack.
Core count and clock speed are only part of what makes a good CPU.
Part of the reason the Ryzen 7 9800X3D performs so well in games, while the Core 5 and 7 do not, is its cache size.
Yeah, but it's also how you destroy your own sales, so Intel won't do anything like X3D.
Intel’s Nova Lake “X3D-Like” CPUs Are Now Very Much a Possibility; Could Potentially Feature the 18A-PT Process With Foveros Direct 3D Packaging

This is hopium, but maybe they could make an X3D-like CPU.

If not, then it will probably be a disappointment, since the cache will be split across two separate chiplets, each half inaccessible to the other, like on AMD's dual-CCD parts.
 
If Intel can package together essentially 2x 285K at 150 watts and maintain decent clocks, this will be a major leap forward.
A major leap forward in nothing useful... at least for the average user. You need to be doing 3D rendering or something for so many cores to be remotely worth it.

The LPE cores will be worth more, because those at least will mean even lower power draw at idle and under light loads.
 
edit: Jaykihn released the platform information, and with PCIe 5.0 lanes being available off the chipset, it's unfortunate they didn't maintain the 8 DMI lanes the current platforms have when upgrading to PCIe 5.0 DMI:

Intel would also be enabling a 4/4/4/4 PCIe split for what I believe is the first time in a client platform.
Try to think it through: what would be the real advantages vs cost of having 8 PCIe lanes go through the chipset?

At PCIe v5, those four lanes give you 16 GByte/s, or what x16 used to give you on PCIe v3. That is quite a bit of bandwidth, and leaving the other four lanes for use by something that attaches to the CPU means you can put more latency-sensitive stuff there.
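
A rough sanity check on those numbers, using the usual per-lane rates after encoding overhead (protocol overhead ignored):

Code:
# Approximate one-direction PCIe link bandwidth in GB/s.
GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130, 5: 128 / 130}

def link_gbps(gen, lanes):
    return GT_PER_LANE[gen] * ENCODING[gen] * lanes / 8

print(round(link_gbps(5, 4), 1))    # ~15.8 GB/s: a v5 x4 DMI-style link
print(round(link_gbps(3, 16), 1))   # ~15.8 GB/s: an old v3 x16 slot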

Actually, what I'd really like to see is an architecture where you only get x4 connectors to the CPU and then are able to add as many "chipset expanders" to them as you'd like, including zero for the most compact builds, and also have them come from Intel or ASMedia/AMD as you prefer. No, not Broadcom, unless they start selling at mortal prices again.

And then I'd also like the ability to group those x4 connectors to create x4/x8/x12/x16 aggregates for a dGPU or similar as I'd like. OCuLink, M.2, some CXL-type cable, I don't care, as long as it's one standard and it works well across the range of form factors. And the industry had better get cracking at sorting that out.

That would let you pick and choose, but also reallocate as you see fit and accommodate your changes: it's not poured into concrete (or whatever PCBs are made of), as it is today, where you often waste precious bandwidth and lanes because the static allocation of the mainboard can't change with your needs.

Its value would depend a lot on the ability to actually trade PCIe lanes for speed, e.g. have a v5 x4 link support v3 cards at x16 width.

From what I've understood about PCIe, that should be possible, because it's actually packet-based, not lane-based. Yet every configuration I've ever seen automatically chooses the lowest common denominator of version and lanes.

And of course there is the issue of BIOS support: currently the BIOS has all the knowledge about on-board switches, aka chipsets, built in, but once you make that expandable/flexible, responsibilities may have to be split and reallocated. Operating systems might have less of an issue, as long as they can boot.
 
Try to think it through: what would be the real advantages vs cost of having 8 PCIe lanes go through the chipset?
Pretty simple: not throttling performance when you put a PCIe 5.0 SSD off the chipset and then use it to transfer something. If they weren't adding PCIe 5.0 to the chipset, then the x4 link would just be disappointing instead of a potential problem.
At PCIe v5, those four lanes give you 16 GByte/s, or what x16 used to give you on PCIe v3. That is quite a bit of bandwidth, and leaving the other four lanes for use by something that attaches to the CPU means you can put more latency-sensitive stuff there.
It's the exact same amount of bandwidth as they currently have. NVL isn't increasing the number of lanes available from the CPU either, so it's not like anything is being gained from the x4 link.
 
For me, the main Nova Lake question is whether Intel can actually manage to make the modularity of their Foveros et al. chiplet designs work with sufficient economic benefit to be competitive with AMD's CCD and APU offerings.

Obviously the previous approach of gelding functioning dies to cover the full range wouldn't work for the true middle and low range anymore, so they'd really want the ability to start small and assemble the variants they want at a pretty near linear cost per unit of compute.

But with the mix of Intel vs. TSMC tiles, assembly and base-die cost, etc., linearity may be hard to get, and then there is all the hassle that comes from extra interconnect latency and power: that purported flexibility is anything but cheap!

And even if they could in theory create per-customer, on-demand chips, they would still need to produce and sell all potential SKUs at scale to make it worthwhile and competitive vs. AMD.

Somehow I just can't believe they can pull that off; they are following pipe dreams that look good on paper but won't get customer buy-in. Not with that competition...
 
Pretty simple: not throttling performance when you put a PCIe 5.0 SSD off the chipset and then use it to transfer something. If they weren't adding PCIe 5.0 to the chipset, then the x4 link would just be disappointing instead of a potential problem.
Exclusive bandwidth allocation is very expensive; that's why switches/chipsets compromise. You need it if you're doing real-time work, but otherwise your PCIe v5 SSD will still be pretty OK vs. anything older, when even 10Gbit Ethernet only ever grabs about 1 of those 16 GByte/s. There isn't that much contention in most real-world cases; I've had Kaby Lake systems with 10Gbit NICs and SATA SSD RAIDs still work OK-ish with a puny DMI link. Sure, the big Xeon next to it had more bandwidth to offer, but it also dug bigger holes into my finances.
It's the exact same amount of bandwidth as they currently have. NVL isn't increasing the number of lanes available from the CPU either, so it's not like anything is being gained from the x4 link.
I couldn't find any lane counts to see if they had mostly shifted them around. But PCIe lanes come at a really high cost in power and die area, because they can't really shrink while external signal levels must be maintained.

So if 4 fewer PCIe lanes allow doubling core counts (exaggerating, I know), I can see why they might have gone that way.

AMD lags on chipsets with the current ASMedia designs, so it favors more lanes from the CPU.

If Intel believes it's a USP to go the other direction, then that widens customer choice.

Yeah, I always like having the best of both, too, but for that you may need to go workstation.

As I mentioned already, I'd really like to have much of the flexibility that is actually possible, since "chipsets" are mostly just generic switches (OK, only some speak Infinity Fabric) and thus shouldn't be tied to a vendor or its market-segmentation ambitions.
 
So if 4 fewer PCIe lanes allow doubling core counts (exaggerating, I know), I can see why they might have gone that way.
It won't allow for anything of the sort because PCIe 5.0 on N6 is slightly larger than a single N3B Skymont E-core:
https://www.techpowerup.com/336412/inside-arrow-lake-intels-die-exposed-and-annotated

edit: I'm betting the real reason is reuse of the SoC tile across desktop and mobile (this is just a guess based on the fact that LPE cores are listed for NVL-S SKUs)
Exclusive bandwidth allocation is very expensive; that's why switches/chipsets compromise. You need it if you're doing real-time work, but otherwise your PCIe v5 SSD will still be pretty OK vs. anything older, when even 10Gbit Ethernet only ever grabs about 1 of those 16 GByte/s. There isn't that much contention in most real-world cases; I've had Kaby Lake systems with 10Gbit NICs and SATA SSD RAIDs still work OK-ish with a puny DMI link. Sure, the big Xeon next to it had more bandwidth to offer, but it also dug bigger holes into my finances.
The problem is one that AMD already has, which is why their chipsets don't have the connectivity Intel's do and high-speed SSDs are more problematic. It's not really a good idea to have connectivity that allows a single device to completely saturate the DMI connection.

When AMD shifted to a PCIe 4.0 x4 uplink, Intel matched the bandwidth with PCIe 3.0 x8, but since the platform was still PCIe 3.0, a single SSD couldn't saturate the link. Intel carried that forward when moving to a PCIe 4.0 chipset platform, but now they're cutting the lane count in half when moving to PCIe 5.0. It's hard to see that as anything other than moving backwards. The only fortunate part is that it's unlikely to cause problems very often.
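
A quick sketch of the saturation argument (link rates only, one hypothetical x4 SSD behind the chipset per platform, protocol overhead ignored; the platform labels are mine):

Code:
# Can a single chipset-attached x4 SSD fill the uplink to the CPU?
PER_LANE = {3: 8 * 128 / 130 / 8, 4: 16 * 128 / 130 / 8, 5: 32 * 128 / 130 / 8}

platforms = {
    # label: (ssd_gen, uplink_gen, uplink_lanes)
    "AMD 4.0 x4 uplink": (4, 4, 4),   # a 4.0 x4 SSD can saturate it
    "Intel 3.0 x8 DMI":  (3, 3, 8),   # a 3.0 x4 SSD only fills half
    "Intel 4.0 x8 DMI":  (4, 4, 8),   # a 4.0 x4 SSD only fills half
    "NVL 5.0 x4 DMI":    (5, 5, 4),   # one 5.0 x4 SSD can fill the whole link
}
for label, (ssd_gen, up_gen, up_lanes) in platforms.items():
    ssd = PER_LANE[ssd_gen] * 4
    uplink = PER_LANE[up_gen] * up_lanes
    print(f"{label}: SSD {ssd:.1f} GB/s vs uplink {uplink:.1f} GB/s ({ssd / uplink:.0%})")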
 
edit: I'm betting the real reason is reuse of the SoC tile across desktop and mobile (this is just a guess based on the fact that LPE cores are listed for NVL-S SKUs)
That sounds very reasonable; a minimal number of common parts to create a big range is how AMD got ahead.
The problem is one that AMD already has, which is why their chipsets don't have the connectivity Intel's do and high-speed SSDs are more problematic. It's not really a good idea to have connectivity that allows a single device to completely saturate the DMI connection.

When AMD shifted to a PCIe 4.0 x4 uplink, Intel matched the bandwidth with PCIe 3.0 x8, but since the platform was still PCIe 3.0, a single SSD couldn't saturate the link. Intel carried that forward when moving to a PCIe 4.0 chipset platform, but now they're cutting the lane count in half when moving to PCIe 5.0. It's hard to see that as anything other than moving backwards. The only fortunate part is that it's unlikely to cause problems very often.
Starvation is never a good thing. But even the original PCI bus had learned that lesson from ISA days and made sure that hardware arbitration prohibited true monopolization: you only got a limited number of cycles as bus master before another arbitration was forced, with round-robin allocation.

AFAIK PCIe inherited that logic and should be just as resilient here, and the ability to oversubscribe is the raison d'être of switch chips. The ratio is critical, though, and perhaps Intel is overdoing it a bit with NVL. Still, it could match your use case.
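
A toy model of that arbitration idea (this is not the actual PCIe flow-control machinery, just an illustration of why round-robin grants with a per-turn limit keep one greedy device from starving the rest; all names and numbers are made up):

Code:
from collections import deque

def arbitrate(demands, burst=4):
    """Grant at most `burst` units per turn, round-robin; returns the grant order."""
    queue, order = deque(demands.items()), []
    while queue:
        name, left = queue.popleft()
        granted = min(left, burst)
        order.append((name, granted))
        if left > granted:
            queue.append((name, left - granted))   # back of the line
    return order

# A "greedy" SSD with a huge transfer still interleaves with the NIC and USB:
print(arbitrate({"ssd": 20, "nic": 3, "usb": 2}))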

Intel would say that they offer iso-bandwidth, so it's not a full regression, and that there is too little incentive and too much cost to double that bandwidth.

Where I can see a regression is when the lowest common denominator of lane speed and count is always chosen, so that a PCIe v3 x8 peripheral, which would get near 8 GByte/s of bandwidth with an older chipset, now only gets four v3 lanes on a link that's capable of v5 speeds. PCIe devices and switches contain buffers, so they should be able to translate, but I don't see that happen and don't know if it's just 'lazy' configuration or if I misunderstand PCIe's capabilities.

If Intel's new chipset really operated by matching bandwidths with variable lanes, that would be a strong selling point, e.g. allowing a v3 x16 GPU to operate at full speed even with the v5 x4 uplink.
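
To make the difference concrete, here's a rough comparison of today's lowest-common-denominator negotiation with that hypothetical bandwidth-matching behavior, for a v3 x8 card behind a v5 x4 uplink (no current chipset is known to do the latter; per-lane rates are approximate):

Code:
PER_LANE = {1: 0.25, 3: 0.98, 5: 3.94}   # approx GB/s per lane after encoding

# Today: downstream slots are typically wired x4, so a v3 x8 card trains at v3 x4.
today = PER_LANE[3] * min(8, 4)             # ~3.9 GB/s

# The wish: wire the slot wider, let the card train its full v3 x8, and let the
# chipset's buffers funnel that into the v5 x4 uplink, which has the headroom.
uplink = PER_LANE[5] * 4                    # ~15.8 GB/s
wished = min(PER_LANE[3] * 8, uplink)       # ~7.9 GB/s

print(round(today, 1), round(wished, 1), round(uplink, 1))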

For the longest time, the bandwidth increases from modern SSDs outstripped anything applications and even game designers imagined; it's only now that storage is increasingly seen as a direct data-delivery agent for the GPU, which needs to conform to quality-of-service parameters in terms of latency and bandwidth to avoid game stutters.

But I see that as mostly driven by consoles, and thus a generation or so behind what the PC leading edge can provide, even with somebody else also using some big, fat port.

We'll see, I guess, but so far I see the risk that NVL will be far too expensive for a long time for me to even worry about it.

When I can get 16 Zen 4 cores (Ryzen 7945HX) with 24 PCIe v5 lanes (no chipset), including a mainboard, for €450 from Minisforum, I'm simply not looking at Intel's Nova Lake i9 at probably twice that or more.

I only count P-cores and consider E-cores a crutch for inferior power management, which makes the biggest Nova Lake yet another 16-core in my eyes (yes, I know that's not quite true, but since I run Proxmox on many systems, E-cores are just a complication).

The first thing I did was add a bifurcation adapter to split the x16 into x8+x4+x4; that cost perhaps €20 and let me recycle a 10GBase-T NIC, add 6x SATA, and fit yet another NVMe drive.

There is no oversubscription in that build, because it's mostly a µ-server without a dGPU. But it's also the reason I'd rather do away with x16 slots and have everything use a variable count of small x4 connectors, potentially with a switch for fan-out ports.
 
Where I can see a regression is when the lowest common denominator of lane speed and count is always chosen, so that a PCIe v3 x8 peripheral, which would get near 8 GByte/s of bandwidth with an older chipset, now only gets four v3 lanes on a link that's capable of v5 speeds. PCIe devices and switches contain buffers, so they should be able to translate, but I don't see that happen and don't know if it's just 'lazy' configuration or if I misunderstand PCIe's capabilities.
It would certainly be possible to do this, but it would require running 16 lanes' worth of traces per slot from the chipset and a chipset design which would detect PCIe revision and lane count, then adjust the lane count accordingly. The additional cost in chipset silicon alone would probably be massive, even if the additional motherboard manufacturing cost weren't a deal breaker.

The alternative would be doing it at the board level using PCIe switches, but this would cost even more, as I'm not aware of any PCIe 5.0/4.0 switches that aren't Broadcom/Microchip.
 
It would certainly be possible to do this, but it would require running 16 lanes' worth of traces per slot from the chipset and a chipset design which would detect PCIe revision and lane count, then adjust the lane count accordingly. The additional cost in chipset silicon alone would probably be massive, even if the additional motherboard manufacturing cost weren't a deal breaker.

The alternative would be doing it at the board level using PCIe switches, but this would cost even more, as I'm not aware of any PCIe 5.0/4.0 switches that aren't Broadcom/Microchip.
Chipsets are, for all intents and purposes, PCIe switches, except that they also integrate SATA/USB and sometimes Ethernet, Wi-Fi, and various other controllers.

In the case of AMD, their chipsets/switches also speak Infinity Fabric, especially between the IOD and the 'actual chipset', if there is one, since it's optional on Zen (the Minisforum BD790i doesn't have one, since it uses mobile hardware). You can also observe the basic flexibility in how two chipsets (or multi-protocol switches) are cascaded on some of the bigger Zen mainboards.

And compared to pure PCIe switches, such as those originally made by the likes of PLX (now part of Broadcom), they are even more feature-rich and relatively cheap. In the case of non-APU Zens, the IOD is a PCIe switch, but it also adds the RAM controller and IF, as well as some USB and SATA protocols and ports, and might even contain a GPU.

So you could actually get a PCIe switch on the cheap via the IOD of, say, an entry-level discounted 5600, much cheaper than any other way, except that there was no way to make that work on a normal mainboard.

PCIe revision/speed, lane count, and feature detection are part of the PCIe protocol and supported by everything that connects to a PCIe bus: CPUs, switches, and controllers. Discovery and reconfiguration can be done at any time, initiated by and under the control of the root complex, usually the CPU/mainboard on a PC. Link speed is also constantly adjusted as part of power management; any higher PCIe revision needs to support all the lower ones, even though they use different signal encodings, and they need to manage mismatches in lane count between slots and devices, e.g. an x4 card in an x16 slot or vice versa.

All of that is there and works just fine, usually. There is really just the lowest-common-denominator issue when it comes to managing those counts and speeds. And it's physical, and probably logical, too.

Physical, because I can't remember ever seeing x8 or x16 physical slots south of a 4-lane chipset/DMI connection, just one x4 being switched among various other x4 slots/devices, or smaller ones. Originally that would have made no sense at all, but with PCIe v5 a single lane can provide 16 lanes' worth of PCIe v1 bandwidth, so supporting wider lane counts at lower speeds would make some sense in some situations. Whether it's enough of a benefit to justify the silicon real estate and validation effort is another matter, but it's a niche nobody seems to fill today.

Logical would mean that if there were, say, an x8 or x16 electrical slot south of the chipset/DMI, it would actually negotiate x8/x16 v3 with a v3/v2-capable device and provide full bandwidth via the v5 DMI or uplink through its buffers, which switches, but also PCIe devices, have. Their number and size are part of the negotiation and can be read out with tools like HWiNFO.
 
PCIe revision/speed, lane count, and feature detection are part of the PCIe protocol and supported by everything that connects to a PCIe bus: CPUs, switches, and controllers.
The problem with this is that if there were 16 lanes hanging off the chipset and you plugged in something which was PCIe 5.0 x16, it would allocate 16 PCIe 5.0 lanes. That's how PCIe works, and that would be a massive problem with your theory. That's why there would need to be functionality to limit PCIe lane count based on revision, or a separate downstream switch. Unless, of course, the slot was hard-limited to a lower PCIe revision, which would seemingly defeat the purpose.
 
The problem with this is that if there were 16 lanes hanging off the chipset and you plugged in something which was PCIe 5.0 x16, it would allocate 16 PCIe 5.0 lanes. That's how PCIe works, and that would be a massive problem with your theory. That's why there would need to be functionality to limit PCIe lane count based on revision, or a separate downstream switch. Unless, of course, the slot was hard-limited to a lower PCIe revision, which would seemingly defeat the purpose.
There wouldn't be an operational problem with that, just not the performance you'd want and perhaps an economic waste.

I'm not sure if you believe that PCIe lanes are a finite resource that gets expended on a mainboard, so that if a CPU has 24 lanes, all connected devices have to share them.

That's true in terms of maximum bandwidth, but not in terms of allocations, and switches/chipsets make the difference, because PCIe is point-to-point and every point gets to play with its own lanes (well, the root complex does, and the physical lane connections are fixed).

Next, the devices don't get the say in the allocation process. They state their capabilities, and the root complex (or rather the firmware and OS running on it) decides what a device actually receives. What allocation a hypothetical v5 x16 device would get in a slot with 16 lanes wired up behind a v5 x4 DMI is up to that firmware. In terms of bandwidth, v3 x16, v4 x8, or v5 x4 would be equivalent, and more bandwidth can't be obtained. It might actually have to make do with just v1 x1 if that's what the firmware decides.

The x16 v5 device could actually get a full x16 v5 link configured to the chipset (as well as any of the other permutations), and that wouldn't change anything functionally. It can't oversaturate or monopolize the uplink: even if it were to run v5 transmissions on all 16 lanes, it can't do more than fill the switch's buffers and then wait for another turn.

That allocation just isn't likely to happen, if only for power-management reasons, as higher PCIe revisions generally imply higher power. How lanes vs. speeds would fare is difficult to know without hardware to check; currently, the lowest common denominator is chosen by default, from what I observe.

Again, even if the x16 v5 device were to run that link to the chipset, it wouldn't change arbitration, cause starvation, or affect general performance very much: there is no way any conformant PCIe device can simply monopolize the fabric. Overall bandwidth remains restricted by the bottleneck, but round-robin arbitration and per-turn limits ensure starvation doesn't happen (which doesn't preclude user disappointment or perhaps network retransmissions).
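
Put as a one-liner, whatever the downstream device negotiates, end-to-end throughput is capped by the narrowest hop on the path (per-lane rates approximate; the function and figures are just an illustration):

Code:
PER_LANE = {3: 0.98, 4: 1.97, 5: 3.94}   # approx GB/s per lane

def path_bandwidth(*links):
    """links: (gen, lanes) per hop; the narrowest hop sets the ceiling."""
    return min(PER_LANE[gen] * lanes for gen, lanes in links)

# Hypothetical v5 x16 device -> chipset -> v5 x4 DMI:
print(round(path_bandwidth((5, 16), (5, 4)), 1))   # ~15.8 GB/s, whatever the device link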

And of course, additional PCIe switches could be inserted between the device and the chipset as well, complicating things further.

Outside of PCs, these complex PCIe fabrics are far more common, mostly in high-end SANs. If it hadn't been for Broadcom jacking up prices so much, we'd probably see a lot more variety even in the PC and workstation space today.

There is some comeback in NVMe storage, e.g. with cards from HighPoint, but they sell for four digits.