Gastec :
As a metrology engineer from a major car manufacturer I tell you that 40 µm deviation/difference in height is unbelievable for such sensitive computer components.
The HBM stacks dissipate 7W at most, more likely closer to 5W. They don't need perfect coupling with the heatsink the way the 300W GPU die next to them does. HBM stacks sticking out by 10 microns would be far worse than sitting 40 microns under, since that would prevent the heatsink from sitting level on the GPU.
While the dies themselves need to be practically perfect, the same doesn't apply to their macroscopic dimensions. Having somewhere around 40 microns of slack provides room for variation between chip (DRAM/CPU/GPU) fabs, between wafer manufacturers and between die stacking processes, plus tolerances on the micro-BGA ball thickness used in the stacks and under the GPU, on the embedded silicon interposer, etc.
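To put rough numbers on how several small tolerances add up, here is a minimal stack-up sketch. The contributor names and micron values are made up purely for illustration (I don't have the real figures), and the root-sum-square line assumes the contributors vary independently.

```python
import math

# Hypothetical per-contributor height tolerances in microns.
# These are illustrative placeholders, not measured or published values.
tolerances_um = {
    "die thinning": 10.0,
    "die stacking": 10.0,
    "micro-bump height": 8.0,
    "interposer": 5.0,
}


def worst_case(tols):
    """Every contributor sits at its limit in the same direction."""
    return sum(tols)


def root_sum_square(tols):
    """Statistical stack-up, assuming independent contributors."""
    return math.sqrt(sum(t * t for t in tols))


values = list(tolerances_um.values())
wc = worst_case(values)
rss = root_sum_square(values)
print(f"worst case: +/-{wc:.1f} um, RSS: +/-{rss:.1f} um")
```

With these placeholder values the worst case comes to 33 microns and the statistical estimate to 17 microns, so a budget in the tens of microns is not hard to reach once a handful of independent processes are stacked on top of each other.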
If you work in the automotive field, then you should be familiar with "non-critical dimensions" where tolerances can exceed 10%. Take the thickness of washers for the bolts that fasten seats to the frame: the acceptable range runs from the minimum thickness needed to handle crash forces up to whatever thickness the spare threads on the bolts can afford, which can be 2-3X that minimum. HBM stack height is non-critical: it doesn't really matter what it is as long as it doesn't stick out above the GPU.
Also, with each HBM die being less than 1/5th the thickness of the GPU for a quad-die stack and ~1/10th for an octo-die one, the top die is that much more susceptible to mechanical damage. I'd say that's one more reason to make sure it sits a safe distance below the GPU height.