News Valve confirms the Steam Deck won't have annual releases — Steam Deck 2 on hold until a generational leap in compute performance takes place

Pierce2623

Prominent
Dec 3, 2023
486
368
560
I just want the ARM version of Proton. X86 handhelds still suck because of how bulky they are. I mean, don’t get me wrong, I’ve got a Steam Deck. It just sucks.
 

Notton

Commendable
Dec 29, 2023
873
769
1,260
The Steam Deck (LCD) launched in Feb '22, so two years have passed.
The APU used was Zen 2 + RDNA 2, so there have been substantial improvements there too.

But I can totally understand waiting for a cheaper variant of Zen5 + RDNA3.5, or the Z2 chip, to ship.
 
Nothing off the shelf can compete, but Van Gogh wasn't off the shelf either. I have a hard time believing something based on Zen 4c/5c with RDNA 3.5 wouldn't be a lot higher performance, especially if they added on some cache for the GPU. Power consumption for quad channel might be too high, but that would be another potential lever. Whether there's a new Steam Deck or not, next year ought to be interesting for handhelds, with Lunar Lake and Zen 5 based devices.
 
  • Like
Reactions: usertests
ngl I wish everyone had this mentality.

there doesn't need to be a sequel until it's a noticeable improvement.
While they might not 'need' to frequently release significantly improved versions of the hardware, unlike with locked-down console ecosystems, any company can release their own competing PC handheld. So after a few years, the hardware is inevitably going to start looking a bit dated compared to newer products from competitors. Unlike with the Nintendo Switch, they can't go anywhere close to 8 years before a successor to the Steam Deck is released. Though, technically, I suppose Valve could just discontinue the product line and never release a successor like they have had a tendency of doing with all their other hardware. : P

There are some rumors about a potential Index 2 in the works, though unless they make the cost far more palatable, it's unlikely to do much to revitalize the stagnating PC VR ecosystem.

I just want the ARM version of Proton. X86 handhelds still suck because of how bulky they are. I mean, don’t get me wrong, I’ve got a Steam Deck. It just sucks.
The problem with that idea is that the PC games you would be running on it are designed for x86, and would in many cases see a notable hit to performance with the code being translated to run on ARM architecture. On a higher-powered desktop or laptop you might have performance to spare to allow for that sort of emulation, but on a handheld you are already being limited by the hardware even for native code, so the hardware would need to be overbuilt to counteract the performance loss, and would see a reduction in efficiency as well, probably countering any potential gains that the alternate architecture might have to offer.
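To put rough, purely hypothetical numbers on that point (the 25% translation cost below is an illustrative assumption, not a measurement of Proton-on-ARM or any real translator):

```python
# Illustrative sketch only: if x86 -> ARM binary translation costs some fraction of
# throughput, the ARM chip must be faster (or draw more power) just to break even
# with native x86 performance. Both figures below are assumptions for illustration.

native_fps = 40.0            # assumed frame rate running native x86 code
translation_overhead = 0.25  # assumed 25% throughput loss from translation

translated_fps = native_fps * (1 - translation_overhead)
breakeven_speedup = 1 / (1 - translation_overhead)

print(f"Translated: {translated_fps:.0f} fps vs {native_fps:.0f} fps native")
print(f"The ARM part needs ~{breakeven_speedup:.2f}x more throughput just to break even")
```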
 
  • Like
Reactions: NinoPino
So the thing with these sorts of custom, bespoke solutions is that they require heavy investment in design and testing. This isn't like PC building, where you can put arbitrary parts together and expect it to work; they need to design new boards, firmware, and drivers, then certify it all as not gonna break twelve months after purchase.

What they are saying is that the performance benefit of going to a newer design does not outweigh the upfront design costs. They are waiting for a bigger generational leap in performance-per-watt before paying those costs again.
 

John Kiser

Distinguished
Apr 19, 2014
11
2
18,515
I just want the ARM version of Proton. X86 handhelds still suck because of how bulky they are. I mean, don’t get me wrong, I’ve got a Steam Deck. It just sucks.
All well and good until you realize that they'd probably still need to draw quite a bit of power to compete at nearly the same level.

The problem with saying you want ARM is that the power envelope of ARM isn't that much smaller than what these handhelds actually use. The Snapdragon 8 Gen 4 can draw over 20 watts of power on its own, and you need to consider that the current generation of these things is about a 30 watt power draw on some of the hardware, with a peak of 54 watts. They don't draw as much power as you seem to think.

The Snapdragon in question draws over 20 W and gets exceedingly hot if you let it run, which means building the same bulky solutions that already exist for cooling and putting controls into the devices...
 

usertests

Distinguished
Mar 8, 2013
936
844
19,760
Nothing off the shelf can compete, but Van Gogh wasn't off the shelf either. I have a hard time believing something based on Zen 4c/5c with RDNA 3.5 wouldn't be a lot higher performance, especially if they added on some cache for the GPU. Power consumption for quad channel might be too high, but that would be another potential lever. Whether there's a new Steam Deck or not, next year ought to be interesting for handhelds, with Lunar Lake and Zen 5 based devices.
Exactly this. Nothing off-the-shelf for the Zen 5 generation seems likely for Steam Deck 2. Kraken Point has an interesting 4+4 CPU but the 8 CUs RDNA3.5 isn't enough over 8 CUs RDNA2. Maybe top Strix Point / Z2 Extreme with 16 CUs isn't enough either if Valve wants to triple/quadruple the performance at the same power to try to turn 720p30 (minimum) gaming into 1080p60. 8-core Strix Halo gets the 256-bit memory controller and great iGPU, but it could be too expensive and power-hungry.

I think Valve would want only Zen 5c/6c cores as the low-power follow up to quad-core Zen 2. I'm not sure about the iGPU because there are many levers they could pull. There may need to be an NPU for FSR4, if that ends up being one way FSR4 can be run.
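For reference, the raw pixel-rate arithmetic behind that triple/quadruple target (a back-of-envelope sketch using the resolutions and frame rates mentioned above; nothing here is confirmed by Valve):

```python
# Raw pixel-throughput comparison of the two targets. This ignores everything except
# resolution x refresh rate, so treat it as a floor for the required GPU uplift.

w1, h1, fps1 = 1280, 720, 30     # current Steam Deck "minimum" target (720p30)
w2, h2, fps2 = 1920, 1080, 60    # hypothetical Steam Deck 2 target (1080p60)

rate1 = w1 * h1 * fps1
rate2 = w2 * h2 * fps2

print(f"720p30:  {rate1 / 1e6:6.1f} Mpixels/s")
print(f"1080p60: {rate2 / 1e6:6.1f} Mpixels/s")
print(f"Uplift:  {rate2 / rate1:.1f}x")   # works out to 4.5x
```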
 
Exactly this. Nothing off-the-shelf for the Zen 5 generation seems likely for Steam Deck 2. Kraken Point has an interesting 4+4 CPU but the 8 CUs RDNA3.5 isn't enough over 8 CUs RDNA2. Maybe top Strix Point / Z2 Extreme with 16 CUs isn't enough either if Valve wants to triple/quadruple the performance at the same power to try to turn 720p30 (minimum) gaming into 1080p60. 8-core Strix Halo gets the 256-bit memory controller and great iGPU, but it could be too expensive and power-hungry.

I think Valve would want only Zen 5c/6c cores as the low-power follow up to quad-core Zen 2. I'm not sure about the iGPU because there are many levers they could pull. There may need to be an NPU for FSR4, if that ends up being one way FSR4 can be run.
Yeah I think Valve would have to be looking at a fully custom SoC with what's on the market now to make a viable successor. If the cache doesn't make a big enough difference I figured that the Zen 5c cores would be preferable due to lowering SoC size and being more efficient in their clock range. All of AMD's off the shelf SoCs with good graphics configurations have way too much compute for a power limited handheld device.
 
Last edited:
  • Like
Reactions: usertests

williamcll

Prominent
Jul 28, 2023
94
39
560
Yeah I think Valve would have to be looking at a fully custom SoC with what's on the market now to make a viable successor. If the cache doesn't make a big enough difference I figured that the Zen 5c cores would be preferable due to lowering SoC size and being more efficient in their clock range. All of AMD's off the shelf SoCs with good graphics configurations have way too much compute for a power limited handheld device.
That seems like a massive compatibility issue.
 

NinoPino

Respectable
May 26, 2022
487
303
2,060
Hmm we had this debate already, ISA only really matters at super low power. Well more like sub 500mhz or so. When we approach 1ghz (one billion operations per second) every scalar processor hits the same I/O issues and requires the same solutions. Those solutions are complex and erase all ISA distinctions.
These statements are nonsense; yours seem to be random numbers.
Why sub-500 MHz? Why not sub-400, sub-800, or sub-1150?
What does "at 1 GHz every scalar processor is I/O constrained" mean? I suppose you are referring to the fact that they are memory constrained (since we are talking about CPU processing performance), but in that case, why 1 GHz? It's nonsense, because it depends on the whole memory subsystem. We can have many different memory types, speeds, bus widths, and caches. The ISA definitely does matter; how much depends on the use case, but it matters.
 

DS426

Upstanding
May 15, 2024
262
193
360
ngl I wish everyone had this mentality.

there doesn't need to be a sequel until it's a noticeable improvement.
Agreed. It's just going to be more cost to pass on to customers if they are continuously redesigning, reengineering, revalidating, getting certifications all over again, etc. for small(ish) incremental improvements. Additionally, yearly product releases tend to shorten support cycles, as Valve wouldn't want to be supporting tons of models at any given time.

Heck, even a two year cadence is somewhat brief IMO -- I could understand 3 years with refreshes, e.g. different screens and other components that can be relatively easily changed without changing the entire base platform. I know the other handhelds are blitzing out models, but that really isn't sustainable over the long haul.
 
These statements are nonsense; yours seem to be random numbers.
Why sub-500 MHz? Why not sub-400, sub-800, or sub-1150?
What does "at 1 GHz every scalar processor is I/O constrained" mean? I suppose you are referring to the fact that they are memory constrained (since we are talking about CPU processing performance), but in that case, why 1 GHz? It's nonsense, because it depends on the whole memory subsystem. We can have many different memory types, speeds, bus widths, and caches. The ISA definitely does matter; how much depends on the use case, but it matters.

Just because you do not understand something does not make it nonsense. You need to understand microarchitecture, what the 1 GHz barrier really was, and what everyone had to do to get beyond it.

Scalar processors crunch what is effectively a very long stream of binary instructions. Those instructions will include compares followed by conditional jumps, what we call branching. These instructions and the data they reference have to be stored somewhere and fed into the CPU for processing, what we call cache. As the number of instructions per second goes up, the need for cache scales up exponentially, and more importantly the memory read/write latency really becomes important. Otherwise clock rate becomes useless as your instruction stream stalls out. This became extremely noticeable as we moved from 400~500 MHz CPUs into the 600+ MHz speeds. The old era of the OC'd Celerons being used everywhere is a good example of this. Just adding more cache didn't help much; you needed branch prediction and instruction reordering.

Something most do not understand is that while memory bandwidth has increased significantly, memory access times have not. This is the period, in real time, from when a read/write request is made until it is finished and returned. It's been about 14 ns since DDR memory came out. This means the absolute lowest time your instruction stream stalls out for is 14 ns, but in reality it's longer because you first have to check L1/2/3 cache tables.

200 MHz means your instruction time is about one instruction every 5 ns, so having a cache miss isn't that big an issue. 400 MHz is one instruction every 2.5 ns, 500 MHz is every 2 ns. 1 GHz is now 1 ns per instruction. You can see that as clock rate goes up, having the results of the calculations before they are executed becomes even more important. We really need to correctly predict those jumps and preload everything ahead of time. If we wait until the code resolves the If/Then statement to get the results, it's already too late. This is something every scalar CPU runs into, the ISA does not matter, it's all the same and if anything x86 has a very slight advantage as you can fit more instructions in your L1 instruction cache. The solution is the same for every ISA: complex front-end instruction analysis, decoding, and prediction. All that extra stuff, along with the accompanying cache memory, takes up a ton of silicon; in fact it takes up more silicon than the actual instruction execution units.
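A quick back-of-envelope version of that arithmetic, assuming the ~14 ns DRAM access time mentioned above and an idealized one instruction per clock cycle:

```python
# Back-of-envelope only: assumes ~14 ns DRAM access latency and one instruction
# per cycle, so the numbers show the trend rather than any specific CPU.

DRAM_LATENCY_NS = 14.0

for clock_mhz in (200, 400, 500, 1000, 3000):
    cycle_ns = 1000.0 / clock_mhz               # nanoseconds per cycle
    stall_cycles = DRAM_LATENCY_NS / cycle_ns   # instruction slots lost per miss
    print(f"{clock_mhz:>5} MHz: {cycle_ns:5.2f} ns/cycle, "
          f"a miss that goes all the way to DRAM wastes ~{stall_cycles:4.0f} cycles")
```

The higher the clock, the more instruction slots a single miss burns, which is why prediction and prefetching stop being optional.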

The requirements for that big, complex front-end circuitry completely erase any distinction between x86, ARM, SPARC, Power, and MIPS. As long as we keep the expected performance low, there is no need for all that extra stuff and simpler ISAs have an advantage. The moment we cross past 1 GHz, all that other stuff becomes mandatory; otherwise you just end up wasting power and space for no benefit. Seriously, go look at the uarchs for every CPU design of the last ten years, every phone CPU, every desktop CPU: they all have some form of front-end instruction decoding, scheduling, and prediction system.
 
Feb 7, 2024
9
1
15
Hmm we had this debate already, ISA only really matters at super low power. Well more like sub 500mhz or so. When we approach 1ghz (one billion operations per second) every scalar processor hits the same I/O issues and requires the same solutions. Those solutions are complex and erase all ISA distinctions.
Since when are modern CPUs scalar processors? Without instruction-level parallelism, our computers would be waaaaay slower. I don't think there's a single mainstream consumer CPU that's not a superscalar design. You stuck in 2001 or something? Heck, I think this was a thing even before then. LoL
 

usertests

Distinguished
Mar 8, 2013
936
844
19,760
Zen 5c is smaller, not more efficient. At the same frequency the Zen 5 core is probably more efficient due to the larger cache.
They have different efficiency curves. Zen 4c was more efficient at least at a narrow range of lower clock speeds. Area savings is also helpful for keeping it cheap. For something never going to clock over 4 GHz, the "c" cores are probably better. But maybe Valve will go hybrid instead.

https://www.tomshardware.com/news/amd-phoenix-2-review-evaluates-zen-4-zen-4c-performance

 

NinoPino

Respectable
May 26, 2022
487
303
2,060
Just because you do not understand something does not make it nonsense.
The nonsense refers primarily to the absolute numbers you give.

... As the number of instructions per second goes up, the need for cache scales up exponentially, and more importantly the memory read/write latency really becomes important. Otherwise clock rate becomes useless as your instruction stream stalls out. This became extremely noticeable as we moved from 400~500 MHz CPUs into the 600+ MHz speeds. The old era of the OC'd Celerons being used everywhere is a good example of this. Just adding more cache didn't help much; you needed branch prediction and instruction reordering.
Why 500/600 MHz? It depends on the whole memory subsystem. Giving a single number without specifying all the characteristics of the machine is nonsense.

Something most do not understand is that while memory bandwidth has increased significantly, memory access times have not.
Access times have also decreased, even if not as much as bandwidth has increased. And larger, more performant caches have solved the problem very well.

This is the period, in real time, from when a read/write request is made until it is finished and returned. It's been about 14 ns since DDR memory came out. This means the absolute lowest time your instruction stream stalls out for is 14 ns, but in reality it's longer because you first have to check L1/2/3 cache tables.
200 MHz means your instruction time is about one instruction every 5 ns, so having a cache miss isn't that big an issue. 400 MHz is one instruction every 2.5 ns, 500 MHz is every 2 ns. 1 GHz is now 1 ns per instruction. You can see that as clock rate goes up, having the results of the calculations before they are executed becomes even more important. We really need to correctly predict those jumps and preload everything ahead of time. If we wait until the code resolves the If/Then statement to get the results, it's already too late.
You are talking about corner cases; for the most part, the data/instructions needed are in the cache. That is the working principle of a cache.

This is something every scalar CPU runs into,
Not only scalar ones but every CPU.

the ISA does not matter, it's all the same and if anything x86 has a very slight advantage as you can fit more instructions in your L1 instruction cache.
The general advantage of code density depends on the specific ISA. For example, ARM Thumb and RISC-V have solved the problem for the use cases where it matters. But for the typical workloads of x86 and today's caches, code density is not a concern.

The solution is the same for every ISA: complex front-end instruction analysis, decoding, and prediction. All that extra stuff, along with the accompanying cache memory, takes up a ton of silicon; in fact it takes up more silicon than the actual instruction execution units.
The ISA is not a problem for the cache but for the frontend, which needs to be more complex and consequently uses more energy, and secondarily wastes transistors and makes optimizing the design a more difficult task.

The requirements for that big, complex front-end circuitry completely erase any distinction between x86, ARM, SPARC, Power, and MIPS. As long as we keep the expected performance low, there is no need for all that extra stuff and simpler ISAs have an advantage.
More performance means higher speeds and wider designs. In both cases a simpler frontend wastes less energy and helps optimize the design to achieve better solutions.

The moment we cross past 1 GHz, all that other stuff becomes mandatory; otherwise you just end up wasting power and space for no benefit.
Why 1 GHz? All the current solutions are the result of a refinement process that started decades ago; giving a generic cutoff is nonsense.