This doesn’t make much sense at first glance. Here we have a processor with a higher IPC than Haswell, yet it performs worse in both DDR3 and DDR4 modes. The deficit is relatively minor, usually around 3%, with the odd benchmark (GRID on R7 240) falling as much as 5% behind. Why does this happen at all?
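For reference, the percentage deficits quoted here are relative differences against the Haswell result. The snippet below is a minimal sketch of that arithmetic; the function name and the frame rates are hypothetical, chosen only to illustrate the calculation, and are not measured results from our testing.

```python
def relative_delta_percent(baseline_fps: float, new_fps: float) -> float:
    """Return how far new_fps sits above (+) or below (-) baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Hypothetical frame rates (assumption, not benchmark data).
haswell_fps = 100.0
skylake_fps = 97.0

# A negative value means the newer processor is behind the baseline.
print(f"{relative_delta_percent(haswell_fps, skylake_fps):+.1f}%")
```

A negative result means the newer part trails the baseline, which is the sign convention used for the deficits above.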
So we passed our results on to Intel, as well as a few respected colleagues in the industry, all of whom were quite surprised. During a benchmark, the CPU performs tasks and directs memory transfers to and from the PCIe bus. Technically, the CPU tasks should complete quicker due to the higher IPC and the improved threading topology, which leaves only the PCIe-to-DRAM transfers routed via the CPU.
Our best guess, until we get to IDF to analyze what has been changed or receive a direct explanation from Intel, is that part of the FIFO buffer arrangement between the CPU and PCIe might have changed, adding a small amount of latency. That being said, a minor increase in PCIe overhead (whether higher latency or lower bandwidth) should be masked by the workload, so there might be something more fundamental at play, such as bus requests being accidentally duplicated or resent due to signal breakdown. A third possibility is an internal bus not running at full speed. To be sure, we retested some benchmarks on a different i7-6700K and a different motherboard, but saw the same effect. We’ll see how this plays out on the full-speed tests.