stress-ng has many different CPU methods, and they vary pretty significantly in the amount of power they consume. That's why it bugs me when I see reviewers like Les Pounder (author of this article) run it without selecting one of the more intensive options. By default, it just cycles through a bunch of stressors, not all of which are even designed to stress the CPU cores.
I'd encourage you to play with the different stress-ng options if you really want to stress your system to the max. FFT might not be as highly optimized on ARM. Hardkernel has used --cpu-method=matrixprod, but that's not as good for stressing Gracemont as fft.
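A minimal sketch of what I mean (the 60 s runtime is just an example, and an unrecognized method name should make stress-ng print the ones your build actually supports):

# One stressor per CPU core, single heavy method, report bogo-ops at the end
stress-ng --cpu 0 --cpu-method fft --timeout 60s --metrics-brief

# Hardkernel's pick, for comparison
stress-ng --cpu 0 --cpu-method matrixprod --timeout 60s --metrics-brief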
It depends on your aim. These systems aren't designed for HPC, so Les Pounder won't test for that.
And with ARM there are tons of ISA extensions which are optional within a given ISA generation and then differ again across generations: it's really quite complex, even a mess.
I believe on the RK3588 they even have an issue similar to the AVX512 topic on Alder Lake, where the A55 cores lack some of the A76 extensions the RK3588 actually supports in hardware. So you have to build your software without them or control CPU core placement via numactl.
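Something like this, assuming the common RK3588 layout where the A55s are CPUs 0-3 and the A76s are CPUs 4-7 (my_a76_app is just a placeholder; check the actual numbering with lscpu first):

# Confirm which CPU numbers belong to which cluster
lscpu --extended

# Pin a binary built with A76-only extensions to the big cores
numactl --physcpubind=4-7 ./my_a76_app
# or, without libnuma installed:
taskset -c 4-7 ./my_a76_app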
For one of my work projects it was actually crucial to try to measure the energy consumption of individual workloads on different architectures, as well as the effect of selectively enabling CPU cores or moving workloads to more efficient systems.
That never went anywhere, because the instrumentation simply isn't there, neither in the software nor in the hardware. And one of the biggest problems is that with today's CPUs every single instruction can have vastly different energy requirements, and different again depending on tons of context. There is no instrumentation at the hardware level, nor in software, that lets you properly measure how much energy an application consumes. Even getting system-wide data isn't trivial everywhere.
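About the best you can do on Intel is the system-wide RAPL counters, e.g. via turbostat, assuming the intel_rapl driver is loaded and accepting that these are package-level estimates, not per-application numbers:

# Package/core/GPU/DRAM power estimates every 5 seconds (needs root; available columns depend on the CPU)
sudo turbostat --quiet --interval 5 --show PkgWatt,CorWatt,GFXWatt,RAMWatt

# The raw counter behind it, in microjoules
cat /sys/class/powercap/intel-rapl:0/energy_uj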
What's weird is that Alder Lake-N can seemingly use far more package power than whatever you have PL1/PL2 configured for. At stock settings, it does seem like the iGPU begins throttling the CPU cores, but then it goes on to basically use as much power as it wants. Maybe this is just a weakness of the Linux driver.
I wonder if your Odroid actually exposes all the variables for power control: PL1 Watts, PL2 Watts, PL1 duration and PL2 duration are the minimum. There are also additional amperage settings as well as clock limits in some BIOSes. And to make sure these aren't overstepped, you may actually need to reduce Turbo boost levels or disable Turbo entirely; after all, it is basically a license to overstep limits with parameters that Intel details in huge documents not available to the general public.
Where the system exposes all of these, as my Erying G660 (Alder Lake i7-12700H) does, I can make sure the SoC never exceeds e.g. 45 Watts, as reported by HWinfo. That's still much more at the wall, but that's another discussion below.
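On Linux, even where the BIOS hides those knobs, the PL1/PL2 limits and their time windows are usually writable through the RAPL powercap sysfs tree, assuming the firmware hasn't locked them (paths can differ; intel-rapl:0 is the typical package domain):

# Inspect the current limits (microwatts) and their names/time windows
grep . /sys/class/powercap/intel-rapl:0/constraint_*_name
cat /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
cat /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw

# Cap both PL1 and PL2 at 45 W (values in microwatts)
echo 45000000 | sudo tee /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
echo 45000000 | sudo tee /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw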
Weirdly, I tried FurMark on Linux and it's really not as good at loading the Xe iGPU as that glmark2 command line I mentioned. From what I recall, it's not CPU-bottlenecked, either. Maybe it's bottlenecking on Alder Lake-N's 64-bit memory interface? You used to be able to query memory throughput with intel_gpu_top, but I think that broke a while back.
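It's still worth a look; the live view shows per-engine busyness, frequency and GPU power, and older igt-gpu-tools builds printed IMC read/write bandwidth there too:

# Live view of engine busyness, frequency and GPU power (needs root or the right permissions)
sudo intel_gpu_top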
The single-channel memory on Alder Lake-N is definitely a bottleneck for the iGPU and one of the many reasons I never wanted to buy one (another being that almost certainly all of them are 8-core dies with cores intentionally culled).
Jasper Lake has the far weaker predecessor UHD 605, and even that gets nearly twice the GPU performance with dual-channel RAM. Goldmont wasn't nearly as sensitive, but it also had a smaller GPU.
And on the far bigger 96EU Xe iGPU of my Alder Lake i7-12700H, single-channel memory halves the GPU performance, as one would expect. Xe has rather big caches that Jasper Lake's iGPU doesn't have, but in any case all Atom-class iGPUs are cut off at 5 Watts of power, so it may not matter as much.
What's exposed by the perf interface depends a lot on the hardware; that's nothing the software can fix, and Intel changes it as they see fit. If I recall correctly I get distinct power consumption values for E-cores and P-cores on Linux with Alder Lake, but RAM power is no longer separated out on mobile and desktop chips. My Xeon E5-2696 v4 does report it, and with 128GB of RAM it's often quite a bit higher than the CPU figure during Prime95 max-heat.
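A quick way to see which RAPL domains your chip actually exposes to perf, assuming the RAPL PMU is available (the energy-cores and energy-ram events only show up where the hardware supports them):

# Which energy events does this machine expose?
perf list | grep energy-

# Package vs. core energy over a 10 second window
sudo perf stat -a -e power/energy-pkg/,power/energy-cores/ sleep 10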
And for some reason, you can't measure instruction statistics and power consumption at the same time.
Zen reports power consumption at the individual core level, but nobody else seems to go that deep, and stock perf on Linux doesn't seem to support it yet. HWinfo has plenty more telemetry for Zen on Windows, including RAM bandwidth and even Watts per DIMM; too bad there is no Linux equivalent.
On the ARM SoCs there is much less sensor data, and what exists isn't integrated into the Linux tools.
The PSU is a 60 W Seasonic power brick that claims 89% efficiency. They no longer make them, sadly. As for the other stuff, the system idles at around 7.5 W, headless (9W with KDE desktop and 1080p monitor via HDMI). That board is an industrial motherboard (Jetway brand) with a bunch of serial ports, a SATA controller, 2x 2.5G Ethernet, but not much else. The SSD is SK hynix P31 Gold (500 GB) and the M.2 slot is PCIe 3.0 x2. The RAM is a 32 GB DDR5-5600 (SK hynix die), but running at only 4800.
For the Mini-ITX Atoms I used a chassis that came with a 60 Watt 12 or 19V power brick and a Pico-PSU class ATX converter. Pretty sure both are nowhere near "Gold" efficiency.
The Intel NUCs probably aren't all that great either; I don't recall all the measurements I'm sure I took before I put them into production. But generally I got the fully passive Goldmont+ J5005 Atoms and the far more powerful i7 NUCs, with 32GB (Atoms) or 64GB (i7) of RAM and 1/2TB of SSD, below 10 Watts at the wall on Linux desktop idle. For the NUCs I mostly played with TDP limits until the fan speeds under stress stayed low enough not to bother me. I didn't measure peak power consumption, because that wasn't important to me; noise and idle power were.
The next system here was different, not by design but by need, because it was built for far higher peak performance: a mobile-on-desktop Mini-ITX board, a G660 from Erying, offering an i7-12700H with an official 45 Watts of TDP but configured with 120/95 Watts of PL2/PL1 in the BIOS, nowhere near where Intel actually wants those to be per its market segmentation aims.
I had planned to run the Erying on an official and efficient Pico-PSU, and since it was a 45 Watt TDP chip and I was ready to sacrifice peak compute power for low noise and overall efficiency on a 24x7 system, I felt a 90 Watt variant of the Pico should be enough: Pico themselves didn't sell anything stronger.
But the 90 Watt variant would always fold on power peaks, even with what I believe were matching limits. My wall plug measurement device also wasn't very good or quick.
After running it off a regular 500 Watt "Gold" ATX PSU for a year or so, I recently gave that another go, because I needed to reclaim the ATX PSU's space to house a big fat helium HDD in a backward-facing drive bay there. I had also added 4 NVMe drives behind a switch chip, meaning an extra few Watts to manage, while an Aquantia AQC107 10Gbit NIC had been there all along.
I got a 120 Watt Pico-PSU knockoff with an external 120 Watt power brick and a new wall-plug measurement device and started testing. Without power limits imposed, the system would fold even without the hard disk or the NVMe RAID0 inserted.
But after playing with the limits, observing and load testing, I was able to reach an absolute worst case of 110 Watts at the wall, while the SoC reported the very same 45 Watts I had configured for PL1 and PL2 (per HWinfo), no matter which combination of power viruses I threw at it. And that was Prime95, FurMark, HDSentinel, extra USB sticks etc., a combination I'd never expect to run at once.
Yet that's still around 30 Watts at desktop idle on the wall, interestingly enough slightly less with a visible desktop than after locking the screen. Seems Microsoft is getting busy once it's unobserved, even if this is a heavily curated Windows 11 Enterprise IoT LTSC with practically every phone-home activity deactivated...
But it's almost 5 Watts at the wall when shut down. Not hibernating or with Ethernet on stand-by as far as I can tell, but perhaps the management engine inside the Intel chipset with its MINIX OS is still running...
Anyhow, a peak 100% overhead at the wall over the configured (and observed) TDP limits at the SoC no longer astounds me.