Not really that odd. Memory is just another type of IO and, like any other IO, its front-end circuitry does not scale much with process. Also, one major problem with giving each CPU chiplet its own local memory is the heavier performance penalty when cores need to access non-local memory, which is particularly troublesome for consumer software that is generally oblivious to memory layout. Centralizing all memory controllers in the IO die eliminates the non-uniform memory access issues, albeit at the expense of 10-20ns worse memory latency for every core.
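To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. Only the 10-20ns penalty comes from the figures above (15ns is used as a midpoint); the 70ns local latency and the 60ns extra cost of a remote access are made-up illustrative values, and the function names are mine, not anything from a real tool.

```python
def numa_avg_latency_ns(local_ns, remote_extra_ns, remote_fraction):
    """Average latency when each chiplet has its own memory and a given
    fraction of accesses land on another chiplet's memory."""
    return local_ns + remote_fraction * remote_extra_ns


def central_avg_latency_ns(local_ns, central_extra_ns):
    """Average latency when every access goes through the IO die and
    pays the same fixed extra cost."""
    return local_ns + central_extra_ns


def break_even_remote_fraction(central_extra_ns, remote_extra_ns):
    """Remote-access fraction above which the centralized design wins."""
    return central_extra_ns / remote_extra_ns


# Hypothetical numbers: 70 ns local, +60 ns remote, +15 ns via the IO die.
print(break_even_remote_fraction(15, 60))
print(numa_avg_latency_ns(70, 60, 0.25), central_avg_latency_ns(70, 15))
```

With these assumed values, the two designs tie once a quarter of accesses are remote. Software that is oblivious to memory layout can easily exceed that, which is the case the centralized controller is meant to cover.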
Optimizing for one subset of workloads often requires beefing up parts of the architecture in ways that are detrimental to other workloads. For example, larger caches accommodate workloads with a large cache footprint, but they come with increased L2/L3 latency, which hurts workloads with a small cache footprint that benefit more from low latency. That makes it impossible to design a chip that is simultaneously superior in every measurable way.
CPU design has always been a game of compromises.