News Intel Puts 10nm Ice Lake CPUs on an M.2 Stick, Meet the Nervana NNP-I Accelerator

Holy multicoring Batman!
Does anybody know how fast or slow M.2/PCIe 2 is compared to Infinity Fabric?
Not that it matters for AI or the DC in general, but in case this trickles down to consumer products, now that would be some big.LITTLE that we could get behind.
 

JayNor

Reputable
May 31, 2019
There's a presentation on this Spring Hill NNP-I chip scheduled at Hot Chips 31 in August, on day 2. AnandTech has the program listing.
 

GetSmart

Commendable
Jun 17, 2019
Holy multicoring Batman!
Does anybody know how fast or slow M.2/PCIe 2 is compared to Infinity Fabric?
Not that it matters for AI or the DC in general, but in case this trickles down to consumer products, now that would be some big.LITTLE that we could get behind.
It's designed as an add-on A.I. accelerator for servers. Intel used Ice Lake CPUs instead of the usual ARM-based CPUs (such as those used in Intel's FPGAs). M.2 slots are common on many motherboards, including mainstream desktop ones. Have you seen any standardized slots for Infinity Fabric?
 

GetSmart

Commendable
Jun 17, 2019
How do standardized slots affect any speed or latency?
I just asked how much slower it would be compared to IF to cross-communicate between CPUs through PCIe/M.2.
The problem is that Infinity Fabric is currently used only by AMD internally, for their own chips, and can be considered proprietary. The PCI Express interface used by the M.2 slot is standardized and by far the most common. If you are talking about communication between the Ice Lake cores and the NNP-I itself, that path is the fastest (likely through the ring/cache interconnect) because it is all internal to a single chip. If you are talking about communication between the host server CPU (Intel Xeon, AMD Opteron or AMD Epyc) and the add-on A.I. accelerator module, then of course it would be much slower, with higher latency and less bandwidth.
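
For a rough feel for the gap, here is a back-of-the-envelope sketch in Python; the Infinity Fabric link figure is an assumption (a roughly Naples-era number that depends on memory clock), not a spec value:

```python
# Back-of-the-envelope: M.2 (PCIe 3.0 x4) vs. an Infinity Fabric on-package link.
# The PCIe math is straight spec arithmetic; the IF figure is an assumed,
# roughly Naples-era number that scales with memory clock, not an official value.

PCIE3_GT_PER_S_PER_LANE = 8.0          # PCIe 3.0 signalling rate per lane (GT/s)
PCIE3_ENCODING          = 128 / 130    # 128b/130b line-encoding efficiency
LANES                   = 4            # an M.2 slot typically exposes x4

pcie3_x4_gbit_s  = PCIE3_GT_PER_S_PER_LANE * PCIE3_ENCODING * LANES  # ~31.5 Gb/s
pcie3_x4_gbyte_s = pcie3_x4_gbit_s / 8                               # ~3.9 GB/s per direction

IF_LINK_GBYTE_S = 42.0   # assumption: ~42 GB/s per die-to-die link at DDR4-2666

print(f"PCIe 3.0 x4 (M.2):  ~{pcie3_x4_gbyte_s:.1f} GB/s per direction")
print(f"IF on-package link: ~{IF_LINK_GBYTE_S:.0f} GB/s (assumed)")
print(f"Rough ratio:        ~{IF_LINK_GBYTE_S / pcie3_x4_gbyte_s:.0f}x in favour of IF")
```

Latency is a separate story: very roughly, a round trip over PCIe plus the driver stack is on the order of microseconds, while an on-package IF or ring-bus hop is tens of nanoseconds.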
 

JayNor

Reputable
May 31, 2019
There is an NNP-I article from March 17 on ServeTheHome that describes boards with up to 12 M.2 slots. There is also an article from March 16 with more info on the NNP-L1000 learning chip.
 
The problem is that Infinity Fabric is currently used only by AMD internally, for their own chips, and can be considered proprietary. The PCI Express interface used by the M.2 slot is standardized and by far the most common. If you are talking about communication between the Ice Lake cores and the NNP-I itself, that path is the fastest (likely through the ring/cache interconnect) because it is all internal to a single chip. If you are talking about communication between the host server CPU (Intel Xeon, AMD Opteron or AMD Epyc) and the add-on A.I. accelerator module, then of course it would be much slower, with higher latency and less bandwidth.

Intel has their own interconnect they would use either way, OmniPath.

This also might be a way to test it out and see how it does first before moving on to their Foveros ideas. I could imagine stacking an NNP into a chip would be beneficial.
 
I could imagine stacking an NNP into a chip would be beneficial.
Depends on what for.
The NNP is just the iGPU modified to be better at AI, so most chips would just need these changes to their existing iGPU.
The additional cores are, I believe, just a by-product; for them it's simply cheaper to use existing full CPUs instead of figuring out how to make an iGPU an external, independent thing.
 

GetSmart

Commendable
Jun 17, 2019
Intel has their own interconnect they would use either way, OmniPath.

This also might be a way to test it out and see how it does first before moving on to their Foveros ideas. I could imagine stacking an NNP into a chip would be beneficial.
That OmniPath is usually a server rack-to-rack interconnect (much like Ethernet). Within the chip itself, however, either the ring interconnect or the mesh interconnect would typically be used.

The NNP is just the iGPU modified to be better at AI, so most chips would just need these changes to their existing iGPU.
The additional cores are, I believe, just a by-product; for them it's simply cheaper to use existing full CPUs instead of figuring out how to make an iGPU an external, independent thing.
The NNP-I is not a modified integrated GPU but rather a dedicated A.I. accelerator architecture. Intel's inclusion of Ice Lake cores also removes the need to pay licensing fees for ARM-based cores.
 

bit_user

Polypheme
Ambassador
in case this trickles down to consumer products, now that would be some big.LITTLE that we could get behind.
If you have an Nvidia RTX card, then you already have something on par with it. We'll have to see how the specs ultimately match up, but their statement about it being "multiple orders" faster than a CPU or GPU is almost certainly excluding GPUs with Tensor cores.
 

bit_user

Polypheme
Ambassador
Intel has their own interconnect they would use either way, OmniPath.
No. OmniPath is for connecting multiple boxes (or blades), not chips that are this tightly integrated. And they already said that PCIe is used for communication between this board and the host.

The Intel equivalent of Infinity Fabric is not OmniPath; it's UPI. In the future, perhaps that will be replaced by CXL.
 

bit_user

Polypheme
Ambassador
What I want to know is which of those dies is which? I assume the bigger one is the Nervana chip?

And how much RAM do these blades have? One reason GPUs excel at AI is the oodles of really fast RAM they've got. As models are often hundreds of MB, I'm skeptical that Nervana is going to have enough on-chip storage for bigger ones.
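
To put rough numbers on that (the parameter counts are approximate, commonly cited figures for stock vision models, not anything Intel has disclosed about NNP-I):

```python
# Quick sanity check on "models are often hundreds of MB".
# Parameter counts below are approximate public figures for common vision models.

def model_size_mb(params_millions, bytes_per_weight):
    """Raw weight storage only, ignoring activations and framework overhead."""
    return params_millions * 1e6 * bytes_per_weight / 1e6

for name, params_m in [("ResNet-50", 25.6), ("VGG-16", 138.0)]:
    fp32 = model_size_mb(params_m, 4)   # FP32 weights
    int8 = model_size_mb(params_m, 1)   # INT8-quantized, typical for inference
    print(f"{name}: ~{fp32:.0f} MB at FP32, ~{int8:.0f} MB at INT8")
```

Even INT8-quantized, the larger ones won't squeeze into a few tens of MB of on-die SRAM, so either there's DRAM on the stick or the weights spill back over PCIe.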

BTW, I had a good chuckle at that slide where they have a down-arrow labelled "NN Performance" that's meant to show increasing NN performance toward the bottom of the pyramid. Pretty poor graphic, there.
 

bit_user

Polypheme
Ambassador
So whatever it does will be constrained by 32 Gbps PCI-e 3.0 x4 lanes of bandwidth...
Okay, so think about image classification - 4 GB/sec is a heck of a lot of images! And yes, I'm talking about compressed images - decompressing them would probably be the job of the Ice Lake CPU. That would scale a lot better than having the host CPU try to do all the decoding.
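
Back-of-the-envelope, with the average JPEG size as a pure assumption for classification-sized inputs:

```python
# How many compressed images fit through a PCIe 3.0 x4 link?
# ~3.9 GB/s is the link's per-direction ceiling; the per-image sizes are
# assumed typical JPEG sizes, not measurements.

PCIE3_X4_BYTES_PER_S = 3.9e9

for avg_jpeg_kb in (50, 150, 500):
    images_per_s = PCIE3_X4_BYTES_PER_S / (avg_jpeg_kb * 1024)
    print(f"~{avg_jpeg_kb} KB/image -> roughly {images_per_s:,.0f} images/s of link headroom")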

Now, that brings up one thing they might not have considered, which is video processing. If they removed the iGPU, then they probably also lost the hardware video decoder, which is critical for being able to analyze video - something Nvidia knows quite well. Nvidia substantially beefed up the decoder hardware in their Tesla T4 GPU relative to previous generations, presumably to match the increased inferencing horsepower provided by its tensor cores.
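
To put a number on why decoding near the accelerator matters (raw frame sizes are straight arithmetic; the compressed bitrate is an assumed typical 1080p30 H.264 figure):

```python
# Why decode on the accelerator side: raw decoded frames dwarf the compressed stream.
# Raw frame size is exact arithmetic; the ~5 Mbps H.264 bitrate is an assumption.

WIDTH, HEIGHT, FPS = 1920, 1080, 30
BYTES_PER_PIXEL_YUV420 = 1.5                     # 4:2:0 chroma subsampling

raw_mb_per_s = WIDTH * HEIGHT * BYTES_PER_PIXEL_YUV420 * FPS / 1e6    # ~93 MB/s
compressed_mb_per_s = 5e6 / 8 / 1e6                                   # ~0.6 MB/s at 5 Mbps

print(f"Raw 1080p30 (YUV 4:2:0):  ~{raw_mb_per_s:.0f} MB/s per stream")
print(f"Compressed H.264 @ 5 Mbps: ~{compressed_mb_per_s:.2f} MB/s per stream")
print(f"Shipping compressed streams over PCIe is ~{raw_mb_per_s / compressed_mb_per_s:.0f}x cheaper")
```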