Question: What is the relationship between worst-case execution time (WCET) and CPU utilization on a single-core processor?

Jan 9, 2022
I am working on a single-core processor, i.e. a Raspberry Pi Zero. I want to know how the worst-case execution time of tasks executing on the Pi (assuming sequential execution of tasks on the single core) is related to CPU utilization and/or core frequency.
Any possible article or reference would help!

From the 'perf' command in Linux, I obtained the following values for a piece of code (an application); a rough sketch of how these values relate to each other follows the list:
  1. Core frequency at which the code was run
  2. Time elapsed (ms) for the whole code to execute
  3. L1 dcache loads, percentage of L1 dcache loads, and L1 dcache loads in M/sec
  4. L1 dcache load misses, percentage of L1 dcache load misses among all hits
  5. L1 dcache store misses, percentage of L1 dcache store misses, and L1 dcache store misses in M/sec
  6. L1 dcache stores, percentage of L1 dcache stores, and L1 dcache stores in M/sec
  7. L1 icache load misses, percentage of L1 icache load misses
  8. Number of cycles required to execute the code, percentage of cycles required at a particular core frequency (GHz)
  9. Total number of instructions, percentage of total instructions, IPC
  10. CPU clock (ms), fraction of the CPU utilised
  11. Branches, percentage of branches
  12. dTLB load misses, percentage of dTLB load misses
  13. dTLB store misses, percentage of dTLB store misses
  14. iTLB load misses, percentage of iTLB load misses
  15. Stalled cycles, front end; stalled cycles, back end
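Roughly, these numbers relate to each other as in the sketch below (the counter values are made up, and the exact perf event names depend on the kernel and PMU):

```python
# Minimal sketch with made-up counter values of how the numbers above relate.
# Collected with something like:  perf stat -e cycles,instructions,task-clock -- ./app
# (exact event names depend on the kernel and PMU).

cycles        = 8.2e9      # "number of cycles required to execute the code"
instructions  = 5.1e9      # "total number of instructions"
task_clock_ms = 82_000.0   # "CPU clock (ms)" -- time the core was busy on the task
elapsed_ms    = 102_000.0  # "time elapsed" -- wall-clock duration of the run
freq_hz       = 100e6      # core frequency the run was pinned to

ipc         = instructions / cycles        # instructions per cycle
utilization = task_clock_ms / elapsed_ms   # fraction of the single core used
busy_time_s = cycles / freq_hz             # CPU time implied by cycles / frequency

print(f"IPC ~ {ipc:.2f}")
print(f"CPU utilization ~ {utilization:.1%}")
print(f"busy time ~ {busy_time_s:.1f} s of {elapsed_ms / 1000:.0f} s elapsed")
```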
 

kanewolf (Moderator)
I am working on a single-core processor, i.e. a Raspberry Pi Zero. I want to know how the worst-case execution time of tasks executing on the Pi (assuming sequential execution of tasks on the single core) is related to CPU utilization and/or core frequency.
Any possible article or reference would help!
There are a lot of things to consider: the time required for a context switch, how much data has to be loaded into cache, the speed of storage. Even your choice of OS will matter. I don't know if a value can be easily determined.
 
Jan 9, 2022
There are a lot of things to consider: the time required for a context switch, how much data has to be loaded into cache, the speed of storage. Even your choice of OS will matter. I don't know if a value can be easily determined.
Considering all of those factors as variables, how can we then sum them up in equation form? I want an idea of the relationship the two factors share!
 
Jan 9, 2022
Start simple: research how many clock cycles it takes to make a context switch.
I have no clue what that would be.
Your basic summation will be clock cycles.
Let's say I run a program on the processor at an X MHz core frequency and then measure the clock cycles required to make a context switch. What would that say about the relationship between WCET and CPU utilization?
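As a rough way to write that summation down (all the numbers below are hypothetical, and it ignores caches, memory stalls and everything else the OS does):

```python
# Hypothetical back-of-the-envelope sum -- all numbers below are made up.
# CPU time    = (application cycles + context-switch overhead) / core frequency
# Utilization = CPU time / wall-clock time of the run

freq_hz       = 100e6   # "X MHz" core frequency
app_cycles    = 8.0e9   # cycles the application itself needs
switch_cycles = 5_000   # cycles per context switch (would have to be measured)
n_switches    = 2_000   # times the scheduler switched away from the task and back
elapsed_s     = 102.0   # wall-clock duration of the run, measured separately

cpu_time_s  = (app_cycles + n_switches * switch_cycles) / freq_hz
utilization = cpu_time_s / elapsed_s

print(f"CPU time ~ {cpu_time_s:.1f} s, CPU utilization ~ {utilization:.1%}")
```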
 
Jan 9, 2022
The mix of instructions used in the app can vary the number of clock cycles required.
Can you test a sample run and time it?
A single-threaded app should run at 100% CPU utilization unless some sort of I/O is required.
Yes, I have data with me from running a short application that doesn't require any I/O at different core frequencies, along with its relative duration.
 
Jan 9, 2022
Yes, I have data with me from running a short application that doesn't require any I/O at different core frequencies, along with its relative duration.
For example, at 100 MHz the CPU utilization was 80.724% and the total execution time, as recorded in practice, was 102 seconds.
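Working backwards from those two numbers, purely as a sketch (and assuming the cycle count stays roughly constant across frequencies, which real memory stalls will violate):

```python
# Just the arithmetic from the run above: 100 MHz, 80.724% utilization, 102 s
# wall-clock. Assumes the cycle count stays roughly constant across frequencies,
# which ignores memory stalls that do not scale with the core clock.

freq_hz     = 100e6
utilization = 0.80724
elapsed_s   = 102.0

cpu_time_s = utilization * elapsed_s    # time the core actually spent on the task
cycles     = cpu_time_s * freq_hz       # ~cycles consumed at 100 MHz

for f in (100e6, 250e6, 500e6, 1000e6):
    est_s = cycles / f                  # naive estimate of CPU time at frequency f
    print(f"{f / 1e6:6.0f} MHz -> ~{est_s:6.1f} s of CPU time (plus overheads)")
```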
 

kanewolf (Moderator)
Let's say I run a program on the processor at an X MHz core frequency and then measure the clock cycles required to make a context switch. What would that say about the relationship between WCET and CPU utilization?
I assume you read the Wikipedia article on WCET -- https://en.wikipedia.org/wiki/Worst-case_execution_time -- and the references at the bottom.
WCET is more of a real-time QoS type of measurement. It doesn't sound like your
total execution time, as recorded in practice, was 102 seconds
is very "real time". If you were worried about servicing a GPIO pin in less than 1/20 of a second, WCET would be the more appropriate measure.
A 100+ second execution doesn't seem like an appropriate test case.
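As a sketch of that real-time framing (the response-time samples below are invented; only the 50 ms budget comes from the example above):

```python
# Sketch of the real-time framing: compare an observed worst response time to a
# 50 ms deadline. The response-time samples here are invented.

deadline_s = 1 / 20                                  # 50 ms budget for the GPIO pin
samples_s  = [0.012, 0.018, 0.009, 0.041, 0.015]     # measured response times (hypothetical)

observed_worst = max(samples_s)                      # only a lower bound on the true WCET
verdict = "within" if observed_worst <= deadline_s else "misses"
print(f"observed worst case: {observed_worst * 1000:.1f} ms ({verdict} the 50 ms deadline)")
```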
 
Jan 9, 2022
The mix of instructions used in the app can vary the number of clock cycles required.
Can you test a sample run and time it?
A single-threaded app should run at 100% CPU utilization unless some sort of I/O is required.
I have edited the question and provided all the details that I have on this processor!
 
If I understand this question correctly, what you're trying to do is take some real-time value and map it to CPU utilization (%) or clock speed, to come up with a formula that perfectly, or nearly perfectly, determines the execution time of some task.

I believe this is a pointless endeavor, because both the software (the OS) and the hardware (the CPU and its various features) make finding a deterministic value to a near-perfect degree practically impossible. You have to contend with:
  • Software:
    • The OS keeping your task on the CPU
    • Nothing interrupting the task (which is kind of impossible to have a handle on)
    • If you're using software built with a normally interpreted or JIT compiled language, initial runs of it will typically be slower than later runs.
    • If you have an I/O access, then you're basically screwed for any sort of predictability.
  • Hardware:
    • Can the application reside entirely in cache? Will the CPU even keep it there?
    • If you have an out-of-order execution processor, the retire stage of the pipeline may vary depending on how many things were done out of order
    • If there's any sort of branch prediction, then branches in your code may have an unpredictable impact on execution time
Those are off the top of my head. In any case, there are a lot of factors that can add time and that are beyond your control. At best, all you can do is time enough runs to build a model of how the thing behaves that you have high confidence in.
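A minimal sketch of that last step, assuming you already have a list of measured run times (the values below are placeholders):

```python
# Sketch: summarize many timed runs of the same task. The run times below are
# placeholders -- in practice they would come from a measurement harness.
import statistics

run_times_s = [101.8, 102.3, 101.9, 104.7, 102.1, 103.5, 102.0, 102.2]

mean_s  = statistics.mean(run_times_s)
worst_s = max(run_times_s)             # observed worst case; the true WCET may be higher
spread  = statistics.stdev(run_times_s)

print(f"mean {mean_s:.1f} s, stdev {spread:.2f} s, observed worst {worst_s:.1f} s")
```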
 
The main issue here is that even for a single-threaded task, you still have an OS managing everything under the hood. So depending on your current/past workloads, your performance could change by quite a bit (percentage-wise, at least). The biggest factor for a simple workload is probably whether everything is already cached or not, but depending on what you are doing (I/O, data, etc.), other components (HDD/RAM especially) start to become factors.

Honestly, this is something that's hard to really define; there are too many permutations to consider. What we typically do is just run the program a number of times, varying how we do so (e.g. reboots between runs? run other apps first?), and look at the "worst case" results to get an idea of what worst-case execution looks like. It's still a ballpark number, but better than nothing.
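As a sketch, that kind of repeated measurement can be as crude as shelling out to the program in a loop; "./app" below is a placeholder for whatever you are timing, and varying the surrounding conditions between runs is still up to you:

```python
# Crude "run it many times, keep the worst" harness. "./app" is a placeholder;
# cache state, other processes, etc. still vary between runs in ways this
# script does not control.
import subprocess
import time

durations = []
for _ in range(20):
    start = time.perf_counter()
    subprocess.run(["./app"], check=True)     # placeholder for the program under test
    durations.append(time.perf_counter() - start)

print(f"best {min(durations):.3f} s, worst {max(durations):.3f} s over {len(durations)} runs")
```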
 
If you really want to dig down and do some analysis on the code that you're running, you can try to get an assembly dump of what the program is executing and then, using ARM's documentation, map each instruction to the number of clock cycles it takes and add it all up.
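A sketch of that bookkeeping, assuming a disassembly from objdump -d and a hand-made (here invented) cycle table from the core's technical reference manual; it is a static, one-pass count that ignores loops, pipeline effects and caches:

```python
# Sketch: sum rough per-instruction cycle costs over an objdump disassembly.
# CYCLES is a hand-made, incomplete table -- the real numbers come from the
# core's technical reference manual -- and the count is static: it ignores
# loops, pipelines, caches, and branch behaviour.
import re
import subprocess

CYCLES = {"mov": 1, "add": 1, "sub": 1, "cmp": 1, "ldr": 3, "str": 2, "b": 3, "bl": 3}

disasm = subprocess.run(["objdump", "-d", "./app"],          # "./app" is a placeholder
                        capture_output=True, text=True, check=True).stdout

total = 0
for line in disasm.splitlines():
    # objdump lines look like: "   10078:  e59f0010   ldr  r0, [pc, #16]"
    m = re.match(r"\s*[0-9a-f]+:\s+[0-9a-f]+\s+(\w+)", line)
    if m:
        total += CYCLES.get(m.group(1).lower(), 1)           # default to 1 cycle if unknown

print(f"naive static estimate: {total} cycles (one pass over the listing, loops not counted)")
```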

Of course, this only accounts for the program itself running on the CPU. Again, there are going to be a million other things affecting its actual run time.