This is just a taste. Wait until someone actually designs an entire accelerator around this stuff!
The reason I say that is that the PIM modules duplicate some functionality that's already in the core compute die. If you removed that redundancy, it would free up area on the core compute die for more of the compute that the PIM modules don't accelerate. The end result would be an even greater speedup!