News Valve confirms the Steam Deck won't have annual releases — Steam Deck 2 on hold until a generational leap in compute performance takes place

Since when are modern CPUs scalar processors? Without instruction-level parallelism, our computers would be waaaaay slower. I don't think there's a single mainstream consumer CPU that's not a superscalar design. You stuck in 2001 or something? Heck, I think this was a thing even before then. LoL

All general purpose CPUs are scalar; that's what ALUs are. Maybe you are confusing them with vector processors, which is what GPUs are. Scalar means linearly processing instructions, A then B then C then D, though modern CPUs come with vector extensions in the form of MMX / SSE / AVX which allow a limited ability to execute one instruction on multiple elements at once. GPUs, on the other hand, execute one instruction on entire arrays of data elements, which is what rasterization and image processing need. That's why the very early 3D rendering systems were frequently called "vector graphics" systems.

Example: let's say I want to add the value X to every pixel of a 320x240 display. With a scalar CPU that is essentially a for loop over all 76,800 elements, doing ADD X, <Element> one at a time. With vector hardware you can do that ADD against many elements at once, allowing you to render that frame at 10x the rate.
 
Why 1 GHz? All the actual solutions are a refinement process started decades ago; giving a generic limit is nonsense.

I'm thinking you weren't around for the late '90s and early 2000s, when the huge leaps in cache algorithms happened.

It's because read-ahead prefetch caching becomes less and less effective as instruction latency gets lower and lower, and all but useless once you go under 1 ns. 1 ns is where the performance of such simple cache systems falls off a cliff.

And here is why: read-ahead prefetching is the idea that if you request data at memory address X, there is a high probability that you will soon want something else in the nearby 8~16 KB, so the whole block gets read into cache. This kind of brute-force bulk reading works well at lower instruction rates. Its problem is that it's very inefficient: out of that 8~16 KB read in, the program may only really need 1~2 KB, and the rest is just taking up precious cache space. As you crank up the instruction rate (clock speed), you get to a point where you are frequently flushing the cache due to cache misses, because the CPU is crunching so fast the cache can't keep up. The only solutions are to get ridiculously large (which was done in some places) or to get smarter about what you read in. Most everyone did a bit of both: AMD threw massive piles of cache at everything while Intel focused on smarter caching algorithms. Everyone ended up landing on this solution, and this solution doesn't care what language your CPU speaks.

Having said all that, you don't really care about any of this and just want to challenge someone.
 

NinoPino
All general purpose CPU's are scalar, that's what ALU's are. Maybe you are confusing them with vector processors, which is what GPU's do. Scalar is linearly processing instructions, A then B then C then D, though modern CPU's come with vector extensions in the form of MMX / SSE / AVX which allow a limited ability to execute one instruction on multiple elements at once.
On the CPU side, scalar and superscalar refer to instructions, not data. If a CPU is capable of executing more than one instruction per clock cycle, then it is defined as superscalar. Look at the definition of a superscalar CPU. Here you are talking about SIMD instructions, which is another thing.