So, in going from a single-core score of 1,021 to 7,155 for 120 cores, what is the limiting factor that leads to only a 7x improvement for 120 times more cores?
I think Geekbench MT is known not to scale well. I'm looking at Ryzen 9 9950X scores right now, and the ST scores are around 3400, while the MT scores are around 21400. That's a ratio of 6.3x for 16 cores / 32 threads. Obviously, the MT clocks are going to be lower, but you'd expect at least something in the double digits for code that scales reasonably well.
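For anyone who wants to redo the arithmetic, here's a minimal sketch. The function name and the "per-core efficiency" figure (speedup divided by core count) are just my own illustration, not anything Geekbench reports; the scores are the rough ones quoted in this thread.

```python
def mt_scaling(st_score, mt_score, cores):
    """Return (speedup, per-core efficiency) for an ST/MT score pair.

    speedup    = MT score / ST score
    efficiency = speedup / physical core count (1.0 would be perfect scaling)
    """
    speedup = mt_score / st_score
    return speedup, speedup / cores


# Approximate scores quoted in this thread: (ST, MT, physical cores).
systems = {
    "120-core system": (1021, 7155, 120),
    "Ryzen 9 9950X": (3400, 21400, 16),
}

for name, (st, mt, cores) in systems.items():
    speedup, efficiency = mt_scaling(st, mt, cores)
    print(f"{name}: {speedup:.1f}x speedup, {efficiency:.0%} per-core efficiency")
```

Note that this ignores the lower all-core clocks, so even a perfectly scaling workload wouldn't hit 100% by this measure.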
So, going on pure conjecture, I'd say the Geekbench developers decided to make the MT benchmark stress various aspects that can be bottlenecks for MT performance. This would include things like inter-core and cross-NUMA communication. If that's true, then a dual-CPU setup would have to do a lot of communication between sockets, which could not only be a bottleneck but also add a lot of latency to memory accesses.
To check this assumption, I thought I'd look up some other leading dual-CPU configurations. One I checked was the Xeon 8480+, which is a 56-core flagship model of the Sapphire Rapids generation (it uses Golden Cove cores, like the P-cores in Alder Lake). The ST scores were around 1650, while MT were around 16500. So, that's only 10x scaling for 112 cores / 224 threads.
Next, let's see what happens when we restrict it to a single-CPU configuration. Since I didn't find entries for the Xeon 8480 with fewer than 112 cores that didn't have other weirdness, I decided to search for the Xeon W9-3495X, which is essentially the same CPU, but running at a higher clock speed and limited to a single socket. In this case (also using Linux as the OS), I found one with an ST score of 2542 and an MT score of 21364. That's about 8.4x scaling for 56 cores, and a higher absolute MT score than the dual-socket systems got with twice as many cores. I should note that a bunch of Dell 7960 workstations came in with lower scores, often dropping the MT score by more than half. This leads me to believe that many of these Dell systems are shipping with partially-populated memory channels, which is something I've seen them do on machines of this class. That's an interesting datapoint, since it suggests that the MT score might indeed involve a lot of memory-based communication between threads.
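To put the dual-socket and single-socket Xeon results side by side, here's a rough sketch using the scores I quoted above. The labels and the helper function are hypothetical, just for illustration, and these are individual entries I picked out, not averages.

```python
# (label, sockets, total physical cores, ST score, MT score), as quoted above.
results = [
    ("2x Xeon 8480+", 2, 112, 1650, 16500),
    ("1x Xeon W9-3495X", 1, 56, 2542, 21364),
]


def per_core_efficiency(st_score, mt_score, cores):
    """MT/ST speedup divided by core count; 1.0 would be perfect scaling."""
    return (mt_score / st_score) / cores


for label, sockets, cores, st, mt in results:
    speedup = mt / st
    efficiency = per_core_efficiency(st, mt, cores)
    print(f"{label}: {sockets} socket(s), {cores} cores, "
          f"{speedup:.1f}x speedup, {efficiency:.1%} per core")
```

The single-socket part scales noticeably better per core, which is at least consistent with the cross-socket theory, though the clock speed difference and the small sample mean I wouldn't read too much into it.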