Chiplet/tile designs have three key benefits:
- by breaking down large chip design into multiple smaller parts, each part can be tested separately to reduce the amount of wafer space lost on fill factor (you can fill a wafer closer to the edge with smaller dies) and the amount of space lost on dies with fatal defects on them
- you can mix-and-match processes to use the best process available for things that generally cannot be done on one single chip without massive compromises, ex.: DRAM requires a low-leakage process that produces transistors too slow for high-speed logic, chiplets lets you bring the two processes together in one design and
- you can create a whole product range out of semi-standard legos, which could possibly include more semi-custom options like what you mentioned.
The downside is that chip-to-chip interconnects do add chip manufacturing complexity, latency and consume more die area than monolithic, especially when you add standardized protocol layers in-between. So as far as CPU designs are concerned, I doubt they will be breaking those down smaller than compute clusters sharing their local L3 cache like AMD has been doing with Zen 2/3.