The difference between CISC and RISC is not academic. The two have very different register structures, and it takes time to fill those registers. A classic CISC core like the 8086 exposes only a handful of general-purpose registers (AX, BX, CX, DX), with AX doubling as the accumulator, which can make it complex to write code for. Assembly on that kind of core looks nothing like a high-level language, which is why I said what I said.
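To picture that register pressure, here's a rough C sketch (the spill behaviour in the comments is hypothetical, not the output of any specific compiler):

    /* Rough illustration of register pressure; the comments describe
     * plausible compiler behaviour, not real toolchain output. */
    int dot4(const int *a, const int *b) {
        /* Live values here: a, b, the loop counter and the running sum.
         * An 8086-style core with only AX, BX, CX, DX as general-purpose
         * registers would have to spill some of them to memory between
         * operations; a RISC with 16-32 registers keeps them all resident. */
        int sum = 0;
        for (int i = 0; i < 4; i++)
            sum += a[i] * b[i];
        return sum;
    }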
On CISC, a single call to the instruction register can carry out a whole string of operations in a row, without a further call on that instruction: the decode stage hands the work through a preset chain of functions, so there is no need to keep re-instructing the execution side of the core the way RISC must, making multiple calls to run the same overall function. That repetition is a source of latency inherent in RISC, because every part of the work has to be decoded as an individual basic instruction. RISC deliberately keeps encode and decode simple, which means simple tasks like basic math routines run better on RISC, whose larger register file suits the job better than the AX/BX loads of an 8086-style core. But a call like a Fourier transform with multiple coefficients (done all the time in general-purpose computing) would run better on CISC.
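To make the decode difference concrete, here's one line of C and, in the comments, the two instruction streams a compiler might plausibly emit for it (the assembly is approximate, for illustration only):

    void bump(int *counter) {
        *counter += 1;
        /* CISC (x86-64): one instruction loads, adds and stores in a row:
         *     add dword ptr [rdi], 1
         * RISC (AArch64-style): three simple instructions, each decoded
         * individually -- the repeated-call pattern described above:
         *     ldr w1, [x0]
         *     add w1, w1, #1
         *     str w1, [x0]
         */
    }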
So a benchmark suite can favour one over the other simply by favouring complex instructions, or by smacking out simple instructions in a predictable way. A graphical suite like 3DMark runs a complex set of instructions, but in a linear way, whereas a game complicates that by constantly switching between different complex tasks.
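Here's a minimal sketch of that effect, assuming a C compiler at low optimisation (at higher levels the compiler may replace the branch with a conditional move and hide the gap): the same work is timed twice, once as a predictable linear pass, once with a data-dependent branch per step, which is roughly the difference between a linear graphics benchmark and a branchy game workload.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 10000000

    int main(void) {
        int *data = malloc(N * sizeof *data);
        for (int i = 0; i < N; i++)
            data[i] = rand() & 0xFF;   /* random values 0..255 */

        long sum = 0;
        clock_t t0 = clock();
        for (int i = 0; i < N; i++)    /* linear, predictable pass */
            sum += data[i];
        clock_t t1 = clock();
        for (int i = 0; i < N; i++)    /* unpredictable branch per element */
            if (data[i] >= 128)
                sum += data[i];
        clock_t t2 = clock();

        printf("linear:  %.3fs\nbranchy: %.3fs\n(sum %ld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        free(data);
        return 0;
    }

The absolute numbers will vary by CPU and compiler; the point is only that two loops doing the same arithmetic can score very differently depending on how predictable the instruction stream is.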
The last problem is simply the OS. A benchmark running on an optimised operating system, with Apple helping to streamline its development, is always going to perform better than on a bloated OS like Windows 10, and that adds a further unfair element to the comparison. In some cases, some Linux distros are reportedly up to 50% faster than Windows 10 in certain games and apps.