The Week In CPUs And Storage: Intel Shipping Purley (Skylake-EP) Xeons, Just Not To You

Status
Not open for further replies.

bit_user

Polypheme
Ambassador
I can understand if general release of Skylake-EP got held back because some features aren't yet ready for prime time. I just hope they had an open & fair bidding process for their customers who were interested in the specialized SKUs.

I still think the delay sucks almost as much as the rumors that Kaby Lake-X will have only 16 PCIe lanes, even though it'll share the same socket-2066 as the 28/44-lane Skylake-X. So, if you're only going to get 16 lanes and 4 cores, why would you even put a Kaby Lake in that socket?
 

Cerunnos

Honorable
Sep 25, 2012
30
0
10,540
This talk of "preferred customers" is quite absurd. There have been, and always will be, "customized" SKUs for certain customers. These are simply customized chips. With FPGAs built into the CPU, each chip can then be modified to suit a particular demand. It isn't as if Intel isn't selling them to customers; rather, what it sells to each customer is most likely tailored to favor a certain type of workload (most likely defined by the customer). Sometimes the customer may actually work with Intel in implementing a specific design. These designs will most likely never be made available to other parties, because they are joint projects that carry a ton of legal restrictions.

As a similar example, Microsoft and Sony both set the specifications that AMD carried out for their console SoCs. AMD customized the chip for each party according to what it wanted. Using the same logic here, you could say that AMD shafted Microsoft with the reduced GPU horsepower, even though it clearly was Microsoft's decision.
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor
Feb 24, 2015
858
315
19,360


The point is not that they sell customized versions; that is well known and has gone on for a few years (as noted in the article).

The concern is the length of time that they are offering exclusivity of the new architecture - 6 months or more from the outward appearance of it, and the limited number of customers that can attain them. Remember, Intel controls anywhere from 94-99.5% of the data center, depending upon which analyst firm you believe. That gives it a considerable amount of power, actually an unprecedented amount of power.

 

Cerunnos

Honorable


Skylake-EP is most likely available to anyone who is already an Intel partner (as in, they buy from Intel in somewhat medium to large quantities). This type of seeding is also not unique to the Purley platform, but what is special is that there are more customized versions tailored to each order.

We've been working on the Purley platform for some time already and the quantities we purchase from Intel are very low when compared to any of the firms listed in the article.
 

samopa

Distinguished
Feb 12, 2015
202
55
18,660
Article says:
Theoretically, if Intel then denied that same access to Microsoft, which is a "competitor" to Google with Bing and Azure, you could view that as preferential treatment.

Question:
Is the USA denying other countries (such as North Korea) access to its advanced technology (such as the F-22 Raptor) also preferential treatment? (It allows access to its "ally" countries, which also makes it "unfair" competition.)
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor


Well, matters of national security, or things under the guise of such (I imagine at times), are a whole 'nother ballpark. However, lawsuits are filed over an operating system vendor with a near-monopoly pushing their own version of a web browser. It's a bit different when it falls into that category, though I am no law professor and cannot untangle that for you with any authority :)

 

bit_user

Polypheme
Ambassador
...where to start. Well, the reason this is even a cause for concern is that the US has laws designed to prevent unfair business practices, since this could disadvantage consumers.

Now, when you're talking about selling weapons to foreign countries, there's the opposite concern. We don't want to sell them to anyone who might use them against us or our allies.

It's like comparing apples and oranges. They're different situations, with different laws and incentives.
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor


Actually quite a few in China now, as well, but a bunch here in the US. The Sunway TaihuLight is actually the fastest in the world now, 10,649,600 CPU cores!!! It was built with China-developed RISC chips, largely because of US export restrictions from CFIUS.

It's so odd, Intel can't sell some chips in China, but IBM and AMD can license their tech and both of those are now being built in China. Still can't figure out how that is allowed, or how it makes sense.
 

bit_user

Polypheme
Ambassador
Specifically multi-$B?

They just accelerated an existing project. It's only 28 nm, and those cores are really more akin to GPU cores, as well. It's not general-purpose programmable, like a Xeon would be. Still, impressive.

That restriction was merely punitive. China did something we didn't like, such as taking out GitHub with their Great Firehose or hacking Google, and we responded by banning sale of a few CPU SKUs to them. Whatever the reason, it wasn't publicly disclosed.

We have almost no leverage over China, any more. There's pretty much nothing short-term we could do that wouldn't hurt ourselves worse. That's what happens when you pit a country run by politicians who just think about getting re-elected in 2, 4, or 6 years vs. one that has a unified party and a unified, regular planning process. I'm not saying I'd rather live in China, but their system does have certain advantages.
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor


Yeah, it's RISC, 256-core chips, definitely not gonna run Windows :)

China has a very long-term plan. Their five-year investment plan is pretty comprehensive and outlines pretty much how they plan to take over key portions of the semiconductor landscape in this version. I think it's the 13th five-year plan. Their long-term planning is exceptional; it's like they are playing chess while everyone else is playing checkers. They think so much further ahead, and they are just buying up semi tech left and right. CFIUS has blocked some of it here in the States, and Germany just blocked another of their attempts recently, which they are pretty unhappy about.

That said, they also use their MOFCOM regulatory agency pretty diligently as their own retaliatory tool. Oh, the games the politicians play. I can only guess at what Dell and EMC had to do to get their merger passed through MOFCOM so quickly, I'm sure it was brutal and many souls were sold along the way to approval.

China uses companies under Tsinghua University and its entire web of subsidiaries (Unigroup, Unisplendour, among about 50 others) to do the work, but make no mistake, it's a state-run entity. I've actually covered their attempts at buying into the NAND market pretty extensively.
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor


That article you referenced was from 2012.

I know that K cost $1.2 billion USD in 2011. http://www.geek.com/chips/new-japanese-supercomputer-is-the-worlds-most-powerful-1392997/

I actually googled around a bit to find the price of a supercomputer for a piece I wrote a while back and recall there were a few well in excess of a billion. Of course, running one is also a huge long-term expense. I can't seem to find a good recent list atm, though.
 

bit_user

Polypheme
Ambassador
Worse than that, it's not even SMP. You can't simply recompile existing software for it. The cores only work out of core-local scratchpad memory, similar to the IBM Cell processor used in the PS3. This means everything has to be custom-written for it, pretty much from scratch.

That said, I'm coming around to the position that this is actually the way to continue scaling. Instead of wasting power and on-chip communication bandwidth to maintain cache coherency, let software & compilers handle it. When you're actually sharing data, perhaps then the hardware can assist. But this only works if the compilers are mature enough to manage the scratchpad memory the way a CPU would manage its caches. Otherwise, it's just a PITA for application programmers.
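
To make "software-managed" a bit more concrete, here is a rough Python sketch of the pattern: the program itself stages a tile of data into a small local buffer, computes only out of that buffer, and then moves on to the next tile. The names (SCRATCHPAD_WORDS, dma_in, scaled_sum) and the 1024-element capacity are made up for illustration, and this does not model the actual Sunway or Cell hardware.

```python
SCRATCHPAD_WORDS = 1024  # hypothetical capacity of the core-local memory, in elements


def dma_in(main_memory, start, count):
    """Explicitly copy one tile from 'main memory' into a fresh local buffer."""
    return list(main_memory[start:start + count])


def scaled_sum(main_memory, scale):
    """Sum scale * x over all elements, reading main memory only through dma_in."""
    total = 0.0
    for start in range(0, len(main_memory), SCRATCHPAD_WORDS):
        # The software, not a hardware cache, decides what is resident locally.
        tile = dma_in(main_memory, start, SCRATCHPAD_WORDS)
        for x in tile:  # compute strictly on the local copy
            total += scale * x
    return total


if __name__ == "__main__":
    data = [float(i) for i in range(5000)]
    print(scaled_sum(data, 2.0))
```

With a hardware cache, the inner loop could just index main_memory directly and let the chip sort out locality. Here the tiling and the transfers are the programmer's (or compiler's) job, which is exactly the burden being discussed.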
 

PaulAlcorn

Managing Editor: News and Emerging Technology
Editor


Yeah, many have opined that Sunway may be great at Linpack, but that's about it. It's still impressive though, check out this table of the current top ten.
Name                 Country         Teraflops   Power (kW)
Sunway TaihuLight    China              93,015       15,371
Tianhe-2             China              33,863       17,808
Titan                United States      17,590        8,209
Sequoia              United States      17,173        7,890
Cori                 United States      14,015        3,939
Oakforest-PACS       Japan              13,555        2,719
K Computer           Japan              10,510       12,660
Piz Daint            Switzerland         9,779        1,312
Mira                 United States       8,587        3,945
Trinity              United States       8,101        4,233

Sunway's Teraflops are just insane. Compared to the US with Titan, it's roughly 5.3x faster. And power, man that is elite, efficient as hell. That might lend some credence to the theories that it's a paper tiger though. You know what they say about too good to be true.
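
For anyone who wants to check those ratios, here is a minimal Python sketch that uses only the figures from the table above. The function names are just illustrative; the one unit fact it relies on is that teraflops per kilowatt is the same number as gigaflops per watt.

```python
# Figures copied from the table above: (name, teraflops, power in kW).
TOP10 = [
    ("Sunway TaihuLight", 93015, 15371),
    ("Tianhe-2", 33863, 17808),
    ("Titan", 17590, 8209),
    ("Sequoia", 17173, 7890),
    ("Cori", 14015, 3939),
    ("Oakforest-PACS", 13555, 2719),
    ("K Computer", 10510, 12660),
    ("Piz Daint", 9779, 1312),
    ("Mira", 8587, 3945),
    ("Trinity", 8101, 4233),
]


def speedup(fast_name, slow_name):
    """Ratio of the two systems' teraflops figures."""
    tflops = {name: tf for name, tf, _ in TOP10}
    return tflops[fast_name] / tflops[slow_name]


def gflops_per_watt(teraflops, power_kw):
    """TFLOPS/kW equals GFLOPS/W, since both units scale by 1000."""
    return teraflops / power_kw


def efficiency_ranking():
    """Systems sorted from most to least efficient, as (name, GFLOPS/W)."""
    rows = [(name, gflops_per_watt(tf, kw)) for name, tf, kw in TOP10]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print(f"TaihuLight vs Titan: {speedup('Sunway TaihuLight', 'Titan'):.2f}x")
    for name, eff in efficiency_ranking():
        print(f"{name:<18} {eff:5.2f} GFLOPS/W")
```

On these numbers TaihuLight works out to about 5.3x Titan and about 6 GFLOPS/W, while Piz Daint actually edges it out on efficiency at roughly 7.5 GFLOPS/W.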


 

bit_user

Polypheme
Ambassador
In this presentation, they claim:
Data movement now requires over 40x more energy than an actual calculation.

60% is used by the unnecessary on chip cache hierarchy

Source: http://www.csm.ornl.gov/SOS20/documents/Sohmers.pptx

Interestingly, the 16 GB of HMC2 embedded in Xeon Phi v2 can function either as a cache or software-managed scratchpad memory. I seem to recall there might also be a 3rd mode...

Another thing they do to tackle poor cache coherency scaling is to let you break up the chip into 4 groups of 18 cores, with no cache coherency between groups. I think Sunway also has 4 independent clusters on the chip, but even within those clusters it's not cache-coherent.
 

bit_user

Polypheme
Ambassador
So, that's basically what I'm saying. The way they pulled it off was to follow the same model as the Cell (remember how fast that was, for its time?) and do away with cache coherency.

If their tools and compilers are good enough, and the hardware can provide some assistance with synchronization, then it's a win. Otherwise, you get good performance for hand-written applications but it can't do much else.

In the short term, they got some eye-popping numbers to post up. In the long-term, I think they're on the right track. But, as a programmer, I'd rather be on a Xeon or conventional GPU (if I needed the extra performance). Current GPUs are also faster.
 

bit_user

Polypheme
Ambassador
Intel also noted that Xeon Phi provides direct access to (up to) 400GB of memory, which is a tangible advantage over the current GPU limitation of 16GB. The current Knight's Landing products feature 16GB of on-package MCDRAM (Multi-Channel DRAM) Micron HBM
KNL has 16 GB of in-package MCDRAM, but also a 6-channel interface to DDR4. So, I would posit that Knights Mill isn't changing any of this, except perhaps adding off-package 3D XPoint.
 

InvalidError

Titan
Moderator

Intel offers a new product with what I presume is a substantial die size. Said product will remain under very limited availability for the foreseeable future, possibly perpetually due to low yield per wafer - that's part of the massive price tag. All production for that foreseeable future has been booked with early orders from a handful of major players. That's just normal business: Intel has committed to fulfilling those orders and those orders mean few to no spare chips for smaller players until several months later.

First come, first served.

Why are there only a handful of devices with Qualcomm's newest flagship chips within the first year of Qualcomm launching a new SoC? Because the first orders from major manufacturers are in the millions of units and are often booked many months to a year before the chips are announced, which means several months of back-orders for anyone placing orders post-launch. That's why we end up with many new mid-range devices launching with last year's chips.
 

bit_user

Polypheme
Ambassador
If you're talking about like 28-core chips, fine. But my personal beef (and it's not really what Paul is talking about) is with the delay of the workstation/extreme segment, where the dies won't be much bigger than existing Skylake i7's. Look at any recent die shot, and you'll see that the iGPU takes up at least half of it. So, if you drop the GPU, then a 10-core die (I assume this is their baseline, and the 6- and 8-core variants just have 2 or 4 disabled) should be only a little bigger.

These 10-core chips should even be smaller than the Iris Pro models that have 72-core GPUs (3x what the desktop i7's feature), which are already in production and selling for much less than the > $1k I assume they'll charge for the 10-core Skylake Extreme i7 / E5-1xxx Xeon's.
 

InvalidError

Titan
Moderator

You forgot one thing: mainstream i5/i7 have 8MB of L3 cache while Xeon E5v4 CPUs have 30-55MB. That eats up a considerable amount of die space. In the case of the Purley CPUs with integrated FPGAs, the ones that are being shipped "exclusively" to a few clients, we're talking even bigger (potentially much bigger) after adding space for that FPGA work area.
 

bit_user

Polypheme
Ambassador
No, I didn't. L3 is distributed among the cores, so it scales with your core count. The E5-1xxx v4 range goes from 10-20 MB of cache. True, that's up to more than double what the desktop i7's have, mostly by virtue of having double the cores, but also from having 2.5 MB/core instead of 2 MB/core. So, if the iGPU is taking half of the die and you replace that with another 4 cores, I doubt 25% more L3 per core is going to push the total die size much above the desktop i5/i7.

But one thing I didn't mention is the size differential between these and GPUs. When you talk about massive dies, only Intel's biggest CPUs are comparable to GPUs. According to the link below, a desktop Skylake i7 has 1.75B transistors, while Nvidia's comparably priced* GP104 has 7.2B transistors. Only a 22-core Broadwell Xeon equals that size, and the GP104 isn't even Nvidia's biggest Pascal GPU. So, when you call these "massive", you really need to distinguish between the ultra-high core-count SKUs vs. the small-server / workstation / extreme desktop SKUs. I expect they're also being fabbed on the more mature 14 nm process, rather than the Kaby Lake "14+" process.

https://en.wikipedia.org/wiki/Transistor_count

Also, according to that, the 8-core Haswell Xeon has 49% more transistors than Skylake i7. So, it seems the additional cache, quad-DRAM channels, and extra PCIe bus lanes do add up. But still not to anything the GPU guys would call "massive".

* I say they're comparably priced, when you consider that a GTX 1070 is pretty similar to an i7 w/ a cheap mobo + RAM.
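
As a sanity check on the arithmetic in this post, here is a small Python sketch using only the numbers quoted above. The helper names are illustrative, and the 4-core desktop i7 is an assumption inferred from 8 MB of cache at 2 MB/core.

```python
# Transistor counts quoted in the post above, in billions.
SKYLAKE_I7_BN = 1.75
GP104_BN = 7.2


def percent_more(a, b):
    """How many percent larger a is than b."""
    return (a / b - 1.0) * 100.0


def cache_total_mb(cores, mb_per_core):
    """Total cache when every core brings its own slice."""
    return cores * mb_per_core


# Assumption: the desktop i7 is a 4-core part (8 MB total at 2 MB/core).
desktop_cache_mb = cache_total_mb(4, 2.0)
# Double the cores at 2.5 MB/core, the top of the quoted 10-20 MB range.
xeon_cache_mb = cache_total_mb(8, 2.5)

# Transistor count implied by "49% more transistors than Skylake i7".
haswell_8c_bn = SKYLAKE_I7_BN * 1.49

if __name__ == "__main__":
    print(f"Desktop i7 cache: {desktop_cache_mb:.0f} MB, 8-core Xeon: {xeon_cache_mb:.0f} MB")
    print(f"Extra cache per core: {percent_more(2.5, 2.0):.0f}%")
    print(f"GP104 vs Skylake i7: {GP104_BN / SKYLAKE_I7_BN:.1f}x the transistors")
    print(f"Implied 8-core Haswell Xeon: {haswell_8c_bn:.2f}B transistors")
```

So on these figures the GP104 is a bit over 4x a desktop Skylake i7 in transistor count, while the 8-core Haswell Xeon lands at roughly 2.6B.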
 

bit_user

Polypheme
Ambassador
I'm normally reluctant to join the folks calling Intel "lazy", but I think the way they're dragging their feet on the Skylake-X CPUs/platform is exactly that. They should've truncated the Broadwell-EP series, similar to what they did with the desktop SKUs. Instead, I think they got greedy and decided to milk Broadwell for a full product cycle, in spite of the fact that it was so late.

If there were meaningful competition in this space, it'd be a different story. IMO, it's no coincidence that Skylake-X will launch around the time AMD's workstation APU is slated to drop.
 