Given that boost behavior seems to be a bit of a crapshoot that can depend on all kinds of things other than silicon quality (including load, and we've already established that Cinebench as used in this article is not what AMD recommends to see max boost), I'm not sure we can draw a direct correlation between max boost clocks and silicon quality/overclockability.
That's exactly right... CB isn't good for it. I think it's because even in ST mode it puts a heavy load on the one thread, and the processor won't boost high under that kind of load except for a brief moment at the very start of the benchmark.
What I've found works well is running a heavily multithreaded game with HWiNFO in the background. A game doesn't really load all its threads heavily, and the lightly loaded threads seem to be quite bursty as the scene changes, so there's ample opportunity for some of them to boost up to 4.4 GHz. I've observed this in Doom, BF1 and Shadow of the Tomb Raider.
I set up HWiNFO as der8auer suggests: turn off monitoring of everything but the essentials to minimize its impact, and set the polling period to 500 ms and no faster. I monitor SVI2 Vcore, regular Vcore and the core multiplier for each of the 8 cores and plot them on screen. Play the game a bit, then alt-tab back to HWiNFO and look at the plots: boosts to 4.4 GHz on three cores, and far more in the 4.3-4.35 GHz range across every core. My gold star core holds 4.25-4.3 GHz almost steadily; that would be the one carrying the heaviest thread. And yet there is plenty of idle time (shown as 3.6 GHz, which is probably actually a deep sleep state) in between the boosts of the boosting cores.
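If you'd rather put numbers on it than eyeball the plots, something like this rough Python sketch could summarize the per-core clocks after a session. To be clear about the assumptions: the samples are made-up values in MHz (one row per 500 ms poll, one column per core), `summarize_boost` is a name I invented for illustration, and getting the readings out of a monitoring tool's log into this shape is left out because I'm not going to guess at the exact log format.

```python
from statistics import mean


def summarize_boost(samples, boost_mhz=4400.0):
    """Summarize per-core clocks.

    samples: list of rows, one row per poll; each row holds one
    clock reading in MHz per core (hypothetical layout).
    boost_mhz: clock at or above which a sample counts as a max boost.
    """
    if not samples:
        return []
    n_cores = len(samples[0])
    summary = []
    for core in range(n_cores):
        clocks = [row[core] for row in samples]
        summary.append({
            "core": core,
            "min_mhz": min(clocks),    # low floor = core idles between bursts
            "max_mhz": max(clocks),    # highest boost seen
            "mean_mhz": mean(clocks),  # how well the clock is held overall
            # share of polls spent at or above the boost threshold
            "boost_share": sum(c >= boost_mhz for c in clocks) / len(clocks),
        })
    return summary


# Made-up example: core 0 holds a steady clock under load,
# cores 1 and 2 alternate between idle and short boosts.
samples = [
    [4275.0, 3600.0, 4400.0],
    [4300.0, 4350.0, 3600.0],
    [4250.0, 3600.0, 4400.0],
    [4300.0, 4400.0, 3600.0],
]

for row in summarize_boost(samples):
    print(row)
```

In this made-up data, core 0 never touches 4400 MHz but never drops below 4250 MHz either, while the bursty cores reach the peak and then fall back to idle. That is the same pattern as in the plots: the steadiest core is not necessarily the one with the highest peak.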
This seems to me a textbook example of what I've read from AMD about how it should work. The Windows scheduler is even cooperating, in that it puts the heaviest thread on the gold star core, the one that can hold the heaviest load with the least thermal impact. That suggests to me that being the gold star core doesn't mean it can boost highest: it means it can carry the heaviest load without adding much to CPU thermals and dragging down achievable clocks.
That's not a benchmark, though, so it's hard to say whether it's actually more beneficial than an all-core overclock. But it does seem to illustrate how the boosting behavior should work.