Intel Reportedly Puts Up 5GHz Core i9-9990XE CPU For Auction

I remember a time when AMD was getting quite desperate and auctioned off specially binned 220W FX-9590 chips that could OC past 5GHz, just for the sake of headlines and maybe to impress investors... Funny how things seem to have completely flipped.
 


I bet we won't see a CPU that really challenges Zen any time soon. Intel will try to squeeze what it can from the current tech, but they have probably been hard at work on a new CPU designed from scratch since Zen 1. Maybe next year it goes public: something completely different, fast, cool and scalable, but it definitely won't be a Core derivative.
 


You are not seeing the full picture. IPC and all that used to be linked heavily to lithography advancement; now it is linked to uarch. Intel dropped the ball there and never innovated. They only had the best fab and sheer volume for binning. The reason they are getting their ass kicked is that AMD introduced a revolution in uarch with chiplets.

Monolithic chips will never be able to compete on performance and price due to lower yields and volumes. If the same chiplet is produced in ten times the volume of a monolithic die, you can get even better performance.

AMD's 7nm chip is delivering the same performance as Intel's best silicon... at half the power... on an engineering sample... of a mid-range CPU...

 

I wouldn't be so optimistic: compare all modern CPU architectures regardless of instruction set and you will notice that the more they evolve, the more they end up looking like Intel's. Basically, that's the entire processor engineering industry agreeing that this is the most effective way of designing high-performance CPUs.

The fundamental principles behind several critical performance optimization tricks in modern CPUs can be traced all the way back to DEC's Alpha from 20+ years ago. Modern CPU architectures are looking more alike than ever, in large part thanks to DEC patents having expired over the last couple of years.

While nothing is stopping Intel from re-designing a CPU from scratch, it would end up looking awfully similar to current CPUs since there hasn't been a fundamental discovery that would revolutionize CPU design in the past 10+ years. The most recent such thing I can think of is Intel's NetBurst trace/uOp cache, which lets the CPU bypass the instruction decoder's latency and power penalties in loops tight enough to fit in there - a huge boon to x86 given how convoluted some parts of the instruction set are.
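To make the loop point concrete, here's a hypothetical sketch (my own illustration, not from the post above): a loop this small compiles to a handful of instructions, so on a CPU with a uOp/trace cache the decoded micro-ops can be replayed from that cache instead of going through the x86 decoders on every iteration.

[code]
/* Hypothetical illustration only: a tight loop small enough that its decoded
 * micro-ops fit in a uOp/trace cache. On CPUs with such a cache, iterations
 * after the first can be issued from already-decoded uOps, largely bypassing
 * the instruction decoders. */
#include <stdio.h>

int main(void)
{
    long data[1024], sum = 0;

    for (int i = 0; i < 1024; i++)
        data[i] = i;

    /* The loop body is only a few instructions - well below typical
     * uOp-cache capacity - so decode cost is paid (roughly) once. */
    for (int i = 0; i < 1024; i++)
        sum += data[i];

    printf("sum = %ld\n", sum);
    return 0;
}
[/code]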
 

You got that backwards.

Clock frequencies were closely tied to process scaling - smaller transistors switch faster using less power, enabling faster clocks with unchanged or slightly WORSE IPC for an otherwise identical design. IPC always has been and always will be primarily dependent on architecture: an 8086 will still require at least four clock cycles per instruction regardless of whether it is made on 3um or 14nm, though you may be able to run the 14nm version at 10+GHz instead of the original's 4.77MHz. If you could make an i3-8300 on 3um, you would still get approximately the same ~3.5 IPC you get at 14nm, albeit at only 1-2MHz due to how stupidly large the die would be.
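To put rough numbers on that (my own back-of-the-envelope sketch, reusing the figures from the post above): throughput is roughly IPC multiplied by clock frequency, so a pure shrink only moves the clock term.

[code]
/* Back-of-the-envelope sketch: performance ~= IPC x clock.
 * The 10 GHz figure is the hypothetical shrink from the post above;
 * everything here is illustrative, not measured data. */
#include <stdio.h>

int main(void)
{
    double ipc_8086   = 1.0 / 4.0;  /* ~4 cycles per instruction at best */
    double clk_orig   = 4.77e6;     /* original 8086 clock, Hz */
    double clk_shrunk = 10e9;       /* hypothetical 10 GHz die shrink */

    /* Same architecture, same IPC: all of the gain comes from clock speed. */
    printf("8086 @ 4.77 MHz: ~%.2f MIPS\n", ipc_8086 * clk_orig / 1e6);
    printf("8086 @ 10 GHz:   ~%.0f MIPS\n", ipc_8086 * clk_shrunk / 1e6);
    return 0;
}
[/code]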
 


I was looking for a reason to bring back NetBurst. Looks like I found it.
 


Depends on how rare. Over 10 grand?? Once you go into 'stupid' territory devoid of all rationality, it becomes nothing more than a trophy for the uber-rich to fight over.

 

No need to bring back NetBurst for the trace/uOp cache: Intel revived the uOp cache with Sandy Bridge and has been using it ever since. AMD also implemented something similar (a uOp queue) in Zen to decouple the instruction decoders from the rest of the instruction scheduling pipeline.
 


Lithography is linked to IPC whether you want it or not. Yes, lithography primarily revolves around frequency and power, but the process can also provide better-quality silicon with each iteration. At the same frequency, the process itself can have an impact on performance.

If you go from a 14nm+ to a 14nm++ process, you might get a small increment in performance.

Basically, IPC stands for Instructions Per Cycle - one cycle of one CPU compared to another. Lithography has an impact... and at less than 5%, it might be bigger than you expect. Intel's 14nm process was really good and they have perfected it over four years.
 


InvalidError is correct. IPC is determined explicitly and exclusively by architectural changes, often by throwing more transistors at the problem. Lithography just gives you more room to cram in those transistors.

If all you did was shrink the die without making any architectural changes, IPC would stay exactly the same. Performance would only scale linearly with frequency.

TL;DR: shrinking the die allows for both IPC improvement via architectural changes AND an increase in clock rate. Those two changes are what provide performance increases.
 

Lithography has ZERO impact on IPC. If you take a 3um chip and shrink it to 14nm, it will have EXACTLY the same IPC because the chip is still getting EXACTLY the same amount of work done on each clock cycle that it did on its original process since it is EXACTLY the same architecture, just smaller.

Architecture is what dictates how much of what work the CPU is able to perform on each clock cycle - how many instruction decodes, how many instructions in the re-order queue, how many issue ports, what instruction mixes are possible to issue on each cycle, etc. Exact same architecture, exact same IPC.
 


Monolithic chips will always be faster due to the huge latency penalties created by multi-chip designs. That's why Threadripper has a game mode that disables half the cores, while Intel CPUs have no need for such hacks. There's no way to engineer around that. It would be like thinking that if you train hard enough, you'll be able to run 50 meters quicker than you can run 25 meters.
 
Not at all. They've been hard at work (with Cannon Lake) much longer than that, which makes me believe their next architectures will be outdated (read: monolithic) at launch.

Since Intel's x86, and later x64, has been the norm, similarities are to be expected.
But then you can't claim that, in the bigger picture, Zen 2 is anything like the current Skylake(+++).

That statement has been thoroughly debunked!
Skylake's ring bus has only 14% lower latency than Zen's Infinity Fabric.
Skylake CPUs use the ring bus to access L3 cache, while Zen CPUs have direct access (with much lower latency) to L3 cache.

[video="https://www.youtube.com/watch?v=3K02zTu_baY"]Watch this![/video]
 

If you look at CPU core architectures, the newest ARM, MIPS, RISC-V, SPARC, POWER, x86, etc. all have the same fundamental architecture layout apart from most RISC architectures not needing the instruction decoder step.

If you compare AMD's execution pipeline diagrams to Intel's, Zen looks more like Intel's designs than ever.
 


It really comes down to UMA vs NUMA. Any time one CPU has to communicate with another CPU to access its bank of RAM, you are always going to incur a latency penalty. That said, modern OSes are NUMA-aware, so task scheduling on the optimal 'node' can be achieved.
 

Chip-to-chip latency depends on the type of interface used and distance. If AMD used a raw parallel interface instead of a PCIe-like derivative, then the latency can be down to little more than wire delay (200mm/ns) plus one cycle of whatever frequency that interface runs at to clock data out/in, adding less than 1ns.
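For a rough sense of scale, here's my own back-of-the-envelope sketch; only the 200mm/ns propagation figure comes from the post above, while the trace length and link clock are assumptions for illustration.

[code]
/* Back-of-the-envelope chip-to-chip latency using the ~200 mm/ns signal
 * propagation figure from the post above. The 30mm trace length and 2GHz
 * link clock are assumptions for illustration only. */
#include <stdio.h>

int main(void)
{
    double distance_mm    = 30.0;   /* assumed die-to-die trace length */
    double wire_speed     = 200.0;  /* mm per ns */
    double link_clock_ghz = 2.0;    /* assumed interface clock */

    double wire_delay_ns = distance_mm / wire_speed;  /* 0.15 ns */
    double clock_out_ns  = 1.0 / link_clock_ghz;      /* one cycle to clock data out/in */

    printf("wire delay %.2f ns + one link cycle %.2f ns = ~%.2f ns added\n",
           wire_delay_ns, clock_out_ns, wire_delay_ns + clock_out_ns);
    return 0;
}
[/code]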

The reason why ThreadRipper has a 'game mode' is not due to a multi-chip 'penalty'; it is due to games not being written with NUMA architectures in mind and Microsoft not providing some sort of application profiler service to optimize software for NUMA by steering threads and memory allocation to minimize traffic between sockets/dies. With ThreadRipper 3 using an IO chip to tie all the chiplets together and provide a single common pool of memory controllers, TR3 won't have that NUMA limitation.
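What that kind of steering boils down to in practice is keeping a thread and its memory on one die. Here's a minimal Linux-only sketch (my own illustration; the core IDs are assumptions, and real code would query the topology via something like libnuma instead of hardcoding it):

[code]
/* Minimal Linux-only sketch: pin the calling thread to cores on one die/node
 * so its memory accesses stay local. Core IDs 0-7 are an assumption for
 * illustration; real code would query the NUMA topology first. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);

    /* Assume cores 0-7 sit on the first die/NUMA node. */
    for (int cpu = 0; cpu < 8; cpu++)
        CPU_SET(cpu, &mask);

    /* pid 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pinned to cores 0-7 (single NUMA node)\n");
    return 0;
}
[/code]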
 
I could reply to over half the posts in this thread... things got heated lol. But instead of slamming you with a text wall I'll keep it short. InvalidError is generally on the money about uarch, transistor scaling, IPC, IPC scaling in relation to transistor scaling (i.e. equivalent or lower IPC at a smaller node)... and the list goes on. Olle P makes some excellent points but falls short on the relation between CPU uarchs. Intel, AMD, x86, ARM, etc. are all getting very similar IMHO.

I would also like to point out that while Olle P was correct about his Cannon Lake comment, there is a rumor floating around that Intel wants to relaunch a slimmed-down version of x86, dropping compatibility with older instruction sets to help turbocharge IPC by replacing the "old silicon" with something "new" in its place - but this is all still pretty salty.
 

Dropping legacy support would simplify the instruction decoders and eliminate circuitry dedicated to maintaining backward compatibility with the bugs-turned-features of the past. I can't imagine this having much of an impact beyond reducing the amount of power consumed by the instruction decoders, thanks to the simplifications an ISA clean-up should allow.
 


Your guess would be far better than mine. I assumed from the rumors it was mostly about heat/efficiency and die space, but beyond that I know your technical understanding exceeds mine. I get the math and physics of die shrinks, and even many of the post-'95 instruction sets/decoders, but for much of the legacy stuff prior to that my knowledge becomes more anecdotal, with a couple of bright spots so to speak. I can see where dropping some of the legacy compatibility would streamline a lot of things, but I can't imagine that would make everyone happy either - some folks are locked into legacy software for one reason or another. I'll be curious to see if anything comes of it or if this one stays in the salty deep.
 

Seeing how Windows no longer supports 16-bit software and modern CPUs aren't officially supported by Windows versions older than 7 or 10, dropping 16-bit legacy modes and related kludges up to the 386 probably wouldn't be missed by many. What sort of savings this would mean for modern 32/64-bit chips would depend on how much influence those legacy kludges had on the 32/64-bit part of the architecture and how much of that could also be ditched.

My bet is that most of the legacy overhead got optimized away and doesn't account for much anymore. The biggest gain for AMD and Intel would be simply removing the engineering effort required to maintain legacy kludges, not silicon savings, performance or power.