Intel Document Confirms New LGA1700 Socket For Alder Lake Processors

I read somewhere that the new chip will be rectangular--whereas the substrate is something like 37x37mm now, the LGA1700 will be 47x37mm, if I remember correctly.
 
There's no way in hell the new Socket will be the same dimensions as LGA 1200.

It has to increase in size given the number of new pins.

That means you'll probably need a new spec for coolers to mount to along with more surface area for the heat spreader.

That means all old LGA 1200/115x coolers will be obsolete.
 
It feels like a bad nightmare. Intel's LGA1200 socket for Comet Lake processors isn't even a year old yet, and there are already talks of a new socket for next year's 10nm processor.
A year? Didn't it just launch a couple months ago?

Still, there's nothing new. After Comet Lake comes Rocket Lake. And that's two generations, which means it's time for a new desktop socket. That's how Intel rolls.

The only exception to that was Haswell, which launched with a new socket that only lasted 1 generation, since the Broadwell desktop CPUs were pretty much all cancelled.

Intel even went so far as to introduce some trivial incompatibilities between Kaby Lake and Coffee Lake. Not to say that the new socket delivered no benefits, but a few boards have been made supporting all 4 generations - from Skylake to Coffee Lake-R, showing just how minor the differences must really be. Why they went ahead with it is anyone's guess - no doubt power-delivery was one reason, but AMD seems to have addressed such matters without changing their socket. So, perhaps it was motivated by wanting to keep commitments to their board partners and force a few extra sales?

As for the function of the 500 additional pins, I'm going to speculate that it could have something to do with Thunderbolt / DP 2.0 / USB 4.
 
If the top end desktop chips also have the big little configuration, I'm going to be a little confused. It makes sense for lower power applications that need efficiency such as laptops, but if you're going to be selling to the DIY builder space, most people are not going to want those little cores. What's the point, especially if you need specific scheduling to make the most out of the big.little config?
 
If the top end desktop chips also have the big little configuration, I'm going to be a little confused. It makes sense for lower power applications that need efficiency such as laptops,
I had a similar reaction, when I first read about it. However, in a previous thread, someone pointed out that it lets them advertise the chip as 16 cores and should give them a not-insignificant multi-threaded performance boost over having just the 8 "big" cores.

If we consider that each "little" core is about 60% as fast as a "big" core, yet uses about 40% of the area and maybe only 30% of the power, then it's both a more power- and area-efficient way to scale performance for highly-threaded workloads. Plus, they get better idle power numbers, by running background tasks on the "little" cores.

And all of the necessary software support should already be in place for Lakefield.

When you look at it like that, it really seems pretty obvious. Of course, I pulled the numbers out of the air, but I believe they're in the general ballpark, based on the slides they published (and other available info) on Lakefield.
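Rough back-of-the-envelope with those guessed ratios (60% speed, 40% area, 30% power per little core), just to show how it shakes out:

```python
# Toy comparison using the guessed ratios above (not official figures).
big_perf, big_area, big_power = 1.0, 1.0, 1.0
lil_perf, lil_area, lil_power = 0.6, 0.4, 0.3

def summarize(name, n_big, n_lil):
    perf = n_big * big_perf + n_lil * lil_perf
    area = n_big * big_area + n_lil * lil_area
    power = n_big * big_power + n_lil * lil_power
    print(f"{name}: perf={perf:.1f}  perf/area={perf/area:.2f}  perf/watt={perf/power:.2f}")

summarize("8 big           ", 8, 0)   # baseline
summarize("8 big + 8 little", 8, 8)   # the rumored hybrid layout
summarize("12 big          ", 12, 0)  # 4 extra big cores cost more area than 8 little ones (4.0 vs 3.2)
```

By that math, the hybrid chip gets ~60% more multi-threaded throughput than 8 big cores alone, and still comes out ahead of a 12-big-core design on both perf/area and perf/watt.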
 
That means you'll probably need a new spec for coolers to mount to along with more surface area for the heat spreader.
The clearance area on the motherboard is more than large enough to accommodate a larger socket as long as whatever HSF you may want to reuse does not have stuff hanging below it that would interfere. The IHS may be bigger but the only thing that matters is whether the main heat-generating dies are covered.

As for what the 500 pins might be for, my guess is Intel has brought the chipset on-package, so ~300 of those pins are HSIO lanes with their associated power/ground pins and the bulk of the remaining 200 pins are for chipset power.
 
I had a similar reaction, when I first read about it. However, in a previous thread, someone pointed out that it lets them advertise the chip as 16 cores and should give them a not-insignificant multi-threaded performance boost over having just the 8 "big" cores.

If we consider that each "little" core is about 60% as fast as a "big" core, yet uses about 40% of the area and maybe only 30% of the power, then it's both a more power- and area-efficient way to scale performance for highly-threaded workloads. Plus, they get better idle power numbers, by running background tasks on the "little" cores.

And all of the necessary software support should already be in place for Lakefield.

When you look at it like that, it really seems pretty obvious. Of course, I pulled the numbers out of the air, but I believe they're in the general ballpark, based on the slides they published (and other available info) on Lakefield.
I hope you're right, because if software support isn't impeccable, it's gonna suffer. If all the cores also have Hyperthreading (I don't remember if they do or not, or if Intel has actually said anything about that), it'll make things a lot more bearable.
 
If we consider that each "little" core is about 60% as fast as a "big" core, yet uses about 40% of the area and maybe only 30% of the power, then it's both a more power- and area-efficient way to scale performance for highly-threaded workloads.
Which is pretty much what GPUs do. Each individual core/shader may be only 1/10th as fast but there are nearly 1000X as many so you get ~50X as much performance per watt and ~20X the performance per area.
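Taking those round numbers at face value (they're obviously just ballpark), here's what they imply about the totals:

```python
# Back-of-the-envelope check of the GPU analogy (all of these are rough guesses).
per_core_speed = 0.1      # each shader ~1/10th the speed of a CPU core
core_count_ratio = 1000   # ~1000x as many of them
throughput_ratio = per_core_speed * core_count_ratio           # ~100x raw throughput

perf_per_watt_ratio = 50
perf_per_area_ratio = 20
print("implied total power ratio:", throughput_ratio / perf_per_watt_ratio)  # ~2x the power
print("implied total area ratio: ", throughput_ratio / perf_per_area_ratio)  # ~5x the area
```

In other words, roughly 100x the throughput for about 2x the power and 5x the silicon, which is the same trade the little cores make at a much smaller scale.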
 
I hope you're right, because if software support isn't impeccable, it's gonna suffer. If all the cores also have Hyperthreading (I don't remember if they do or not, or if Intel has actually said anything about that), it'll make things a lot more bearable.
According to this:


...there's no indication of HT. I think only the first-gen Atom cores had Hyperthreading. Tremont is now about the 5th major revision of the uArch, not counting node-shrinks and lumping Goldmont/Goldmont+ together.

BTW, I noticed the article mentioned Gracemont - the one after Tremont - but Wikichip has basically nothing on it.

 
Which is pretty much what GPUs do. Each individual core/shader may be only 1/10th as fast but there are nearly 1000X as many so you get ~50X as much performance per watt and ~20X the performance per area.
It's true, but in this case (pretty much all big+little setups, AFAIK), the cores have the same architecture state. That means the OS can trivially & seamlessly move threads back-and-forth between the big & little cores, which is not something you can do between CPU & GPU cores.

For software to support "heterogeneous processing" of a common task between CPUs + GPUs, the code has to be separately compiled for each, and software has to explicitly manage sharing of the workload between them. It doesn't come for "free", like in the big+little scenario. I'm sure you know this, but I'm just explaining for the benefit of others.
 
I hope you're right, because if software support isn't impeccable, it's gonna suffer.
It is likely going to suffer for software written assuming all cores are the same. The OS can make some educated guesses about which threads require higher performance based on how they end up waiting for each other, but that won't beat software updated to explicitly tell the OS how much performance each thread (or chunk thereof) requires.
 
It is likely going to suffer for software written assuming all cores are the same.
The OS could actually get rather tricky about this. It could expose the "little" cores only via some new threading API that also reveals those sorts of details. That would keep legacy software from explicitly using the little cores, and the OS could either silently migrate light-duty threads to them, or otherwise use them for background tasks and other light-duty apps.

Regardless, for software which implements simple work-stealing strategies, it should be a non-issue. Keep in mind that heavily-threaded software should be written to work well, even when you have a couple cores' worth of other stuff running. It would be poor form if an app assumed it had exclusive use of every core.

And from the app's perspective, getting 60% of the time on a given core doesn't look a lot different than the core running 60% as fast. Also, consider that Hyper Threading can create a similar situation where, depending on whether a given physical core is shared and with what, its throughput becomes somewhat variable.
 
It is likely going to suffer for software written assuming all cores are the same. The OS can make some educated guesses about which threads require higher performance based on how they end up waiting for each other, but that won't beat software updated to explicitly tell the OS how much performance each thread (or chunk thereof) requires.
Doesn't Intel have a Windows 10 driver that does this already? It was my understanding that's how Intel's boost works, so the cores are already not treated equally. Intel ranks the cores from best to worst during manufacturing, and that ranking is read by the driver. When lightly threaded apps are running, the driver steers the workload to the fastest cores. Using this driver, Intel would only need to rank the large cores 1-8 and steer the applicable workloads their way.
 
Keep in mind that heavily-threaded software should be written to work well, even when you have a couple cores' worth of other stuff running. It would be poor form if an app assumed it had exclusive use of every core.
The problem I was thinking about isn't "exclusivity" of cores but similarity of performance between cores. If an algorithm splits a problem into 16 equal chunks for multi-threading, your ultimate speedup ends up limited by the thread spending the most time on a slow core.
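To put numbers on it, reusing the earlier guess that a "little" core runs at roughly 60% of a "big" core's speed (just an assumption):

```python
# Static split: 16 equal chunks across 8 fast cores and 8 cores at 0.6x speed.
# Everyone ends up waiting for the chunks stuck on slow cores.
chunks, slow_speed = 16, 0.6
serial_time = chunks * 1.0                # one chunk = 1 unit of work
parallel_time = 1.0 / slow_speed          # set by the straggler chunk on a slow core
print("actual speedup:", serial_time / parallel_time)   # ~9.6x
print("ideal speedup: ", 8 * 1.0 + 8 * slow_speed)      # ~12.8x with perfect balancing
```

So an even 16-way split leaves roughly a quarter of the chip's aggregate throughput on the table.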

While SMT may introduce variability in per-thread execution rate, scheduling rotates cores unless core affinity was assigned and the overall impact is still fairly even, nothing like what big.LITTLE introduces.

As for how the OS can tell what should and shouldn't be scheduled on low-speed cores, I'd say the thread priority API already got some chunk of that covered: I doubt threads that go through the trouble of flagging themselves as low/lowest/idle priority will mind running on low-power cores.
 
If they make the "little" cores invisible to the OS and use them as a sort of hardware SMT, basically the same thing AMD SHOULD HAVE DONE with the second integer core on Bulldozer-derived processors, it should result in a noticeable performance increase. It also wouldn't surprise me if they went this route, as Intel's implementation of SMT (Hyperthreading) has proven to be quite vulnerable to security holes.

The real question, though: is DDR5 ready for the mainstream?
 
If an algorithm splits a problem into 16 equal chunks for multi-threading, your ultimate speedup ends up limited by the thread spending the most time on a slow core.
But that's basically what I mean by "poor form". Normally, if you had 16 worker threads (which should be <= the number of hardware threads), you'd probably split the work into more like 64 or even 256 chunks. Often, the time needed to complete each chunk is variable, even in isolation of all other factors.
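A minimal sketch of the kind of thing I mean, with the chunk count and core mix pulled out of the air (the "cores" here are just simulated by per-worker speeds):

```python
import queue, threading, time

work = queue.Queue()
for chunk in range(256):            # many more chunks than worker threads
    work.put(chunk)

def worker(core_speed):
    # Each worker keeps pulling chunks until the queue is empty, so a fast core
    # simply gets through more chunks; nobody waits on a pre-assigned share.
    while True:
        try:
            chunk = work.get_nowait()
        except queue.Empty:
            return
        time.sleep(0.001 / core_speed)   # stand-in for actually processing the chunk

threads = [threading.Thread(target=worker, args=(speed,))
           for speed in [1.0] * 8 + [0.6] * 8]   # 8 "big" + 8 "little" workers
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The slow workers just end up taking fewer chunks off the queue, so everyone finishes at about the same time no matter how uneven the cores are.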

scheduling rotates cores unless core affinity was assigned and the overall impact is still fairly even,
Not to my knowledge. There's no reason it should, either. The more often you context-switch on a core, the worse your L2 hit rate gets. So, a compute-bound job usually stays on the same core for consecutive timeslices.

What the OS should be trying to even out is the amount of execution time each thread gets! And it doesn't need to move a job from one core to another, in order to make that happen. If a job has been using more than its share of time, simply don't run it for some number of timeslices. Maybe the next time it's run, it gets assigned to the same core, or maybe a different one - by then, the L2 cache contents have probably been replaced, so it doesn't much matter (as long as you keep it on the same NUMA node). But, it's no good just moving it from one core to the next, if there's no other reason to suspend it. That doesn't help anyone.

As for how the OS can tell what should and shouldn't be scheduled on low-speed cores, I'd say the thread priority API already got some chunk of that covered:
Most application code doesn't tweak priorities, in my experience. The downsides of potentially introducing some priority-inversion bug or related performance problems far outweigh the potential upsides. AFAIK, thread priorities are really the province of realtime embedded code, running on a proper RTOS.

Now, what happens at the process-level is a different story. Something like a virus scanner (i.e. doing a full scan) will frequently run at a lower priority. On Linux, people frequently use 'nice' to run similar background jobs at a low priority, and I've even used it to keep long-running simulations from compromising system responsiveness.
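For example, here's a minimal sketch of that Linux pattern from inside a script (os.nice is Unix-only, and the job name is made up):

```python
import os, subprocess

# Lower this process's priority before launching a long background job, so the
# scheduler treats it (and its children) as low-priority work. On a hybrid CPU,
# that's exactly the sort of thing an OS could reasonably park on the little cores.
os.nice(19)
subprocess.run(["./long_running_simulation"])   # hypothetical background job
```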
 
Fast big cores to actually process, little cores to manage DDR5 RAM, PCIe4, GPU interface, storage drives, motherboard and power, peripherals, cooling and temps??????
 
Fast big cores to actually process, little cores to manage DDR5 RAM, PCIe4, GPU interface, storage drives, motherboard and power, peripherals, cooling and temps??????
PC memory, from what I can tell, has been doing self-refresh for several generations.

For the rest, you could certainly steer interrupts to the low-power cores, but that's not usually a significant amount of system load. I don't know that drivers, themselves, typically chew up a whole lot of CPU time. To the extent they do (such as GPU drivers), you might actually want them on the faster cores.

Drivers should appear as a subset of the kernel time you see in task manager or top. So, when you don't see much kernel time, then there's probably not much to be gained by shifting those threads.

Anyway, one thing I forgot to mention in my previous post is that I expect the OS would demote threads that frequently go to sleep or block on I/O. This would naturally include many of the services you mentioned. So, there wouldn't necessarily have to be special-case handling for each and every service or light-duty driver thread.
 
There's no way in hell the new Socket will be the same dimensions as LGA 1200.

It has to increase in size given the number of new pins.

That means you'll probably need a new spec for coolers to mount to along with more surface area for the heat spreader.

That means all old LGA 1200/115x coolers will be obsolete.

Nope, you will just need a new add-on bracket, that's all. Most coolers are already compatible with the bigger HEDT CPUs, which already use a bigger socket (LGA2066 > 1700 pins).
 
But that's basically what I mean by "poor form". Normally, if you had 16 worker threads (which should be <= the number of hardware threads), you'd probably split the work into more like 64 or even 256 chunks. Often, the time needed to complete each chunk is variable, even in isolation of all other factors.
For algorithms that can be split into an arbitrary number of chunks, sure. For more tightly coupled stuff like running game logic, not so much.