These statements are nonsense; yours seem to be random numbers.
Why sub-500MHz? Why not sub-400, or sub-800, or sub-1150?
What does "at 1GHz every scalar processor is I/O constrained" even mean? I suppose you mean memory constrained (since we are talking about CPU processing performance), but in that case, why 1GHz? Nonsense, because it depends on the whole memory subsystem. There are many different memory types, speeds, bus widths, and caches. The ISA definitely does matter; how much depends on the use case, but it matters.
Just because you do not understand something does not make it nonsense. You need to understand microarchitecture, what the 1GHz barrier really was, and what everyone had to do to get beyond it.
Scalar processors crunch what is effectively a very long stream of binary instructions. Those instructions include compares followed by conditional jumps, what we call branching. The instructions and the data they reference have to be stored somewhere and fed into the CPU for processing, which is what cache is for. As the number of instructions per second goes up, the demand on cache scales up sharply and, more importantly, memory read/write latency
really starts to matter. Otherwise the clock rate becomes useless because your instruction stream stalls out. This became extremely noticeable as we moved from the 400-500MHz CPUs into the 600+MHz speeds. The old era of the overclocked Celerons being used everywhere is a good example of this. Just adding more cache didn't help much; you needed branch prediction and instruction reordering.
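To make the branching part concrete, here's a trivial C sketch (my own illustration, not from the original post): each loop iteration is a compare followed by a conditional jump, and whether the predictor can guess that jump decides whether the pipeline stays fed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum only the elements above a threshold. Each iteration compiles to a
 * compare followed by a conditional jump -- the branching described above.
 * If the data were sorted the branch would be highly predictable; with
 * random data the predictor misses often and the instruction stream stalls. */
static uint64_t sum_large(const uint32_t *data, size_t n, uint32_t threshold)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)   /* cmp + conditional jump */
            sum += data[i];
    }
    return sum;
}

int main(void)
{
    enum { N = 1 << 20 };
    uint32_t *data = malloc(N * sizeof *data);
    if (!data) return 1;
    for (size_t i = 0; i < N; i++)
        data[i] = (uint32_t)rand();          /* random -> hard to predict */
    printf("%llu\n", (unsigned long long)sum_large(data, N, RAND_MAX / 2));
    free(data);
    return 0;
}
```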
Something most people do not understand is that while memory bandwidth has increased significantly, memory access times have not. That is the period, in real time, from when a read/write request is made until it is finished and returned. It's been about 14ns since DDR memory came out. This means the absolute minimum time your instruction stream stalls out for is 14ns, and in reality it's longer because you first have to check the L1/L2/L3 caches.
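If you want to see that access time on your own machine, here is a rough pointer-chasing sketch (my own, POSIX/Linux-flavoured, and the exact number will depend on your memory subsystem): every load's address comes from the previous load, so nothing can be overlapped and the average time per hop approaches the raw DRAM round-trip time.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    ((size_t)(64 * 1024 * 1024) / sizeof(size_t))  /* ~64 MiB, larger than a typical L3 */
#define HOPS 10000000UL

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: shuffle the indices into one big cycle so the
     * chase below visits the whole buffer in a pseudo-random order. */
    for (size_t i = 0; i < N; i++) next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t r = ((size_t)rand() << 16) ^ (size_t)rand();
        size_t j = r % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    /* Dependent loads: each address depends on the previous result,
     * so every cache miss exposes the full memory access latency. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (unsigned long i = 0; i < HOPS; i++) p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("avg %.1f ns per dependent load (ignore: %zu)\n", ns / HOPS, p);
    free(next);
    return 0;
}
```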
At 200MHz your instruction time is about one instruction every 5ns, so a cache miss isn't that big an issue. 400MHz is one instruction every 2.5ns, 500MHz is every 2ns, and 1GHz is now 1ns per instruction. You can see that as the clock rate goes up, having the instructions and data on hand before they are needed becomes ever more important. We really need to correctly predict those jumps and preload everything ahead of time. If we wait until the code resolves the if/then statement to get the result, it's already too late. This is something every scalar CPU runs into, and the ISA does not matter; it's all the same, and if anything x86 has a very slight advantage because you can fit more instructions in your L1 instruction cache. The solution is the same for every ISA: complex front end instruction analysis, decoding, and prediction. All that extra machinery, along with the accompanying cache memory, takes up a ton of silicon; in fact, it takes up more silicon than the actual instruction execution units.
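For what it's worth, here is that arithmetic as a throwaway C program, using the ~14ns figure from above and assuming one instruction per cycle as a simplification (real CPUs of that era were not exactly 1 IPC, so treat the cycle counts as ballpark):

```c
#include <stdio.h>

/* Cycle time at a given clock, and how many instruction slots a ~14ns
 * memory round trip costs if the instruction stream simply stalls. */
int main(void)
{
    const double miss_ns = 14.0;                   /* DRAM access time from the post */
    const double clocks_mhz[] = { 200, 400, 500, 1000 };

    for (int i = 0; i < 4; i++) {
        double cycle_ns = 1000.0 / clocks_mhz[i];  /* ns per cycle */
        printf("%6.0f MHz: %.1f ns per cycle, a miss costs ~%.0f cycles\n",
               clocks_mhz[i], cycle_ns, miss_ns / cycle_ns);
    }
    return 0;
}
```

The output makes the point: at 200MHz a miss costs roughly 3 lost instruction slots, at 1GHz it costs roughly 14, which is why prediction and prefetching stop being optional as the clock climbs.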
The requirement for that big, complex front end circuitry completely erases any distinction between x86, ARM, SPARC, Power, and MIPS. As long as we keep the expected performance low, there is no need for all that extra stuff and simpler ISAs have an advantage. The moment we cross 1GHz, all that other stuff becomes mandatory; otherwise you just end up wasting power and die area for no benefit. Seriously, go look at the microarchitectures of every CPU design from the last ten years, every phone CPU, every desktop CPU: they all have some form of front end instruction decoding, scheduling, and prediction system.