Qualcomm Debuts 10nm FinFET Centriq 2400 Processor With 48 Cores

  • Thread starter: Guest
Status
Not open for further replies.
When money alone can't give Intel a competitive edge, Intel is in a pinch. The 3D NAND joint venture with Micron is a huge failure, and 3D XPoint/Optane failed to deliver on its promises.

AMD is coming on strong with Zen, and AMD and Samsung are coming on strong with their own version of non-volatile RAM.
 


Intel is scared of AMD and Samsung the same way I'm scared of shrews. Not at all. Besides that, what does your post have to do with the subject?
 
Unless it's Samsung, I think the quoted process node isn't truly superior to Intel's 14+ node.

The more important point will be to focus on benchmarks & real-world performance. In those areas, I think they'll have a tough fight on their hands.

I think it's safe to say that per-core performance won't match Broadwell. I also wonder whether they have multiple threads, since such a high core-count means there will be a lot of stalls while waiting for memory reads/writes. So, even if the cores were more comparable to Broadwell, there might be little real-world benefit from having 48 of these vs. the 24-core/48-thread Xeons.

On the other hand, perhaps these cores are more comparable to the Silvermont Atom cores found in the latest Xeon Phi (with up to 72 cores / 288 threads). And unless the Centriq has HBM2 or HMC2, I think it's no match for the KNL Xeon Phi.
 

That's what I was thinking too.

Having twice the core count isn't that impressive when each core is less than half as powerful on its own. Simpler cores do tend to be more power-efficient when the workload scales well with core count though, which is usually the case for datacenters.
 


Certainly, power costs over time are important in server farms. The obvious question is power cost at a given level of compute performance. If it takes 3 or 4 of these SoCs to reach a similar level of performance, your cost savings are out the window.
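
To make the "power cost at a given level of performance" question concrete, here is a rough Python sketch. The function name, its parameters, and every number in it are hypothetical placeholders, not figures for Centriq or any Xeon, and it ignores cooling, boards, and rack space.

```python
def lifetime_cost(chips, price_per_chip, watts_per_chip, years, usd_per_kwh):
    """Purchase price plus electricity for `chips` parts running 24/7.

    All inputs are illustrative assumptions supplied by the caller.
    """
    hours = years * 365 * 24
    energy_kwh = chips * watts_per_chip * hours / 1000
    return chips * price_per_chip + energy_kwh * usd_per_kwh


# Placeholder scenario: one fast chip vs. three slower chips that
# together deliver (by assumption) the same throughput.
one_big = lifetime_cost(chips=1, price_per_chip=2000, watts_per_chip=150,
                        years=4, usd_per_kwh=0.10)
three_small = lifetime_cost(chips=3, price_per_chip=800, watts_per_chip=60,
                            years=4, usd_per_kwh=0.10)
print(f"one big chip:      ${one_big:,.2f}")
print(f"three small chips: ${three_small:,.2f}")
```

With these made-up inputs the three-chip option costs more over four years, even though each small chip draws less power on its own. Change the chip count to 1 or 2 and the picture flips, which is why the performance ratio matters so much.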

I've been watching this all play out for 40 years. The superior replacement for the Intel architecture has come and gone too many times to easily count. The big problem is a huge installed base of critical software that won't run on anything else without huge investments of redesigning and reprogramming. I'll wait until someone solves that problem before I worry too much about it.
 
But we don't know how they compare.

Facebook explicitly rejected that approach, hence the Xeon D:

https://www.nextplatform.com/2016/03/14/xeon-d-shows-arm-can-beat-intel/

The TL;DR is that single-thread performance still matters. Their point is that ARM can beat Intel, if someone can deliver cores that are sufficiently fast.

For datacenters, power costs typically overshadow initial purchase price. If the performance disparity were that great, then this product would've been killed before it even saw the light of day.

While you weren't looking, ARM built up pretty much all the software support they need, in order to unseat Intel. All that's left is for silicon like this to provide truly competitive implementations. They own mobile, they're assaulting the datacenter, and products like Chromebooks are even encroaching into the laptop market. Desktops will be the last holdout, but maybe within a decade...

If we're talking about the cloud, then OpenPOWER is another one to watch.
 


I'll believe it when I see it. This is not the first time by a long shot that the POWER architecture has attempted a head-to-head battle with x86. So far, it's Intel 3, POWER 0. I'll wait.

Edit: Or make that 3-1. IBM does use their own processors in their own servers.
 


In the home market? Possibly not; it's largely 'dying'.

But laptops? If Zen is significantly better than prior gens, that's more than good enough for anyone, and if they put even 1 GB of HBM on it, that's it: no laptop won't have a Zen CPU unless it also has a discrete GPU. This is also heavily about power efficiency, and the only metric we have here is AMD's 8-core at 95 W vs. Intel's 8-core at 140 W.

AMD cut out some things Intel has, like a larger FPU, but let's assume someone doesn't need the FPU to be that large... you now have a cheaper, more power-efficient CPU, so AMD could also eat away a sizeable chunk of enterprise from Intel.

Till we learn more, there is a fun rumor: "AMD's Zen expected to have "disruptive memory bandwidth""
Not sure if anyone knows what this means, or if it's real, but I do know there are workloads that demand bandwidth over power, and if AMD delivers more than Intel, that's a problem for Intel.

If Intel wasn't readjusting product lines because of Zen, or even AMD/Samsung memory, I would call them stupid.

Let's not forget: the last time the guy who designed Zen designed a chip for AMD, Intel went full monopoly and forced people to not use AMD, which was in every way better than Intel. Personally, I don't think Zen is going to be that for AMD this time around, but I do think it's going to undercut Intel on price and focus on doing what most people need from a CPU, giving up things like high FPU loads to Intel, as that's a small minority need.

Will be fun to see.
 
I'm not in the prediction or speculation game. That's why I repeatedly type that I'll wait. Having said that, I'll play for a minute.

Zen doesn't need to be spectacular. All it needs is to be a good-enough solution at a good-enough price point. With those two limited caveats satisfied, Zen will be a win with a fairly large audience.

What it won't do is displace Intel in the server market without beating the power/performance curve. We've already alluded to cost of ownership over the life of a SKU being the driving metric in the server space. That's the beast everyone, including Intel, is chasing. Any win in that space will make real money.
 
The difference is that you're talking about POWER, yet what's generated a lot of industry-wide interest is OpenPOWER. It's not exactly copying the ARM playbook, but definitely moving in a similar direction.

https://en.wikipedia.org/wiki/OpenPOWER_Foundation

I don't know if it'll truly rival Intel, but it's certainly got a shot. I thought it was interesting to see how Nvidia partnered up with IBM to integrate NVLink into some of their CPUs.
 
I predict Intel will beat them to it. And if you're going to put any in there, then it's better to put like 8 GB and get rid of the external memory interface. Saves space, cost, & power.

Like graphics. That's the main argument for it, really. The other thing it'd do is reduce latency, and that could have a broader impact (though probably on the order of a few %).
 
They will have Windows 10 support via x86-on-ARM emulation pretty soon. If I had to guess, that is their stepping stone to try and break into the server market. I have my doubts that this plan will work, serious doubts. They have to be ready to play a really long game, like a decade-long game, if they really want to pull this off. I wish them luck, as there is almost no competition in this space, well, not until you get to some big iron.
 
XPoint isn't even out yet. They only began sampling about 2 months ago. You can't call it a failure until people can actually buy it.

The real question with this new chip is cache coherency, and whether they can pull it off and stay competitive.
 
You should do yourself a favor and read Paul's excellent recap of 3D XPoint:

http://www.tomshardware.com/reviews/3d-xpoint-guide,4747.html

I think it's entirely fair to say it's so far been a disappointment. Let's hope it eventually lives up to the initial hype.

That would definitely be a problem, if all cores share the same memory space. We don't actually know if they're all mutually cache coherent, or if the chip is partitioned in some way. Xeon Phi offers a mode in which the chip can operate as four 18-core CPUs that just happen to share the same die.

Again, going back to the case study of Xeon D, I think there's a sweet spot between core density and single-core speed. I think it's worth reading into the fact that it only goes up to 16 cores.
 

Cache coherency is not that important: if you want your software to scale well across a large number of cores in a multi-socket system, you have to eliminate nearly all cache snooping, which means writing code that generates very few overlapping reads and writes. If you don't do that, cache snooping may end up bottlenecking the system and your code will scale like crap. If your threads end up waiting for cache snoops 5% of the time, your code won't show any meaningful scaling beyond 20 cores.

Cache coherency is more like a pair of crutches: it is there to shoulder the cost of your mistakes. For performance reasons, though, you want to rely on it as little as possible.
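
The 5% figure above lines up with Amdahl's law if you treat the time spent waiting on snoops as a fraction that cannot be parallelized. That treatment is a simplifying assumption, and the function name below is made up for illustration. A minimal Python sketch:

```python
def amdahl_speedup(cores, stalled_fraction):
    """Amdahl's law: speedup on `cores` cores when `stalled_fraction`
    of the work cannot be spread across cores.

    Modeling snoop waits as that fraction is an assumption, not a
    measurement of any real chip.
    """
    return 1.0 / (stalled_fraction + (1.0 - stalled_fraction) / cores)


for n in (1, 8, 20, 48, 1000):
    print(f"{n:>5} cores -> {amdahl_speedup(n, 0.05):.2f}x")
```

Under this model the ceiling is 1 / 0.05 = 20x no matter how many cores you add, so going from 20 cores to 48 buys relatively little.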
 
That's a pretty strong position. I'm not aware of a mainstream OS that will use multiple cores that are non-CC. So, you'd have to run a separate OS on each, incurring corresponding overhead, and then do all the communication and synchronization at a higher level. Not only is this more complex and less efficient, but it would also incur higher latency, since the OS on one core would be unaware of threads on another.

So, if you want a simple programming model, broad OS support, easy load-balancing, and an easy, low-overhead communication model, cache coherency is the way to go. It scales just fine, as long as the core count is roughly in the single digits. Now, you can call it a crutch all you want, but if they're trying to capture market share, I think they're not going to force a drastically new programming model down the throats of software vendors and end users. For Qualcomm to even have a remote chance at this market, they need to support the broadest array of OS and application software possible, with the fewest headaches.

That said, I expect we might see a hybrid approach, in the near future. Where there's some hardware assist, but where caching tends to be software-managed.

In the meantime, Intel's approach of "cluster on a chip" will probably gain some steam. Or, to the extent a chip looks more GPU-like, OpenCL defines memory hierarchies and sharing, allowing you to explicitly structure tasks around this.
 

That was my point: we're not in the single digits anymore, as 48 cores is halfway to triple digits. If ARM wants a piece of the large server market, they'll also have to go multi-socket, and cache snoops across sockets are very expensive due to limited bandwidth. You cannot write code that scales well while relying heavily on CC; you have to get rid of as many cache snoops as possible, and that means writing your code in a style that eliminates most need for CC.

You don't need a whole OS for each core in a non-CC environment either, as most of the OS is read-only space and, for performance reasons, each hardware thread needs its own queues and buffers anyway.
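
For anyone wondering what "a style that eliminates most need for CC" looks like, here is a small Python sketch of the pattern: each worker gets its own slice of the input and its own private accumulator, and results are merged once at the end. The function name is hypothetical, and Python threads won't show real cache-line effects, so this only illustrates the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches_partitioned(data, predicate, workers=4):
    """Count items satisfying `predicate`, sharing nothing until the final merge."""
    # Split the input into disjoint contiguous slices, one per worker.
    chunk = max(1, (len(data) + workers - 1) // workers)
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]

    def work(part):
        local = 0  # private to this worker; no other worker writes it
        for item in part:
            if predicate(item):
                local += 1
        return local  # single hand-off when the worker finishes

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, slices))
    return sum(partials)  # the only step that combines workers' results


print(count_matches_partitioned(list(range(100)), lambda x: x % 2 == 0))
```

The contrast is a single shared counter that every worker increments, which on real hardware keeps bouncing one cache line between cores.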
 
Here's what I think we agree on: cache coherency becomes an issue affecting latency, performance, and power-efficiency, by the time you get into double-digit core counts, if not before. Furthermore, it's theoretically possible (and probably a good idea) for the burden of inter-processor communication to be shifted onto software.

In practice, this isn't how the vast majority of software has been written, including mainstream OS and server software. That's the wrinkle. So, while we can agree about what should happen, long term, the reality is that Qualcomm wants to minimize the hurdles customers must navigate to adopt their product. That's why I doubt they ditched cache coherency, unless they went so far as to partition groups of cores into completely separate address spaces.

Intel's Xeon Phi supports full cache coherency for up to 72 cores, but out of recognition of the inefficiencies, they also offer the partitioned model.

Xeon Phi doesn't support multi-socket. I think the industry is actually moving away from it, since you can now pack so many cores on a single chip. I think few applications actually need large core counts, unless we're talking about GPU-like cores. The rest have already been adapted to work as distributed applications, so there's not as much benefit from having lots of cores share the same memory space.
 

Mainstream systems aren't going to have 48+ cores any time soon either. Depending on how aggressively AMD decides to price Zen, we may get a $300-ish 8C16T sort-of-mainstream CPU next year.


Big Data and memory-resident databases still want their 8+ socket systems to stack TBs worth of memory, though this may change if 3D XPoint ends up enabling memory configurations in the TBs per socket at RAM-like speeds.
 
But they have to run most existing software, or they won't catch on.

Point to a single example of a non-CC architecture that has caught on. The only one I can even think of is Cell, and that's long gone.

I'm pretty sure LGA2011 (either v1 or v3) supports only up to quad-CPU configurations. If I'm wrong, feel free to post links.

BTW, I think Big Data typically utilizes distributed databases. It's more economical to scale up with single-CPU systems. The main use-cases I can see for multi-socket (going forward) are high-availability and legacy applications.

I guess dual-socket could be useful for I/O-intensive scenarios, like driving 4+ GPUs or lots of NVMe SSDs and 100 GbE adapters.
 