As long as the GPU also has some regular DRAM (i.e. HBM, GDDR, DDR, LPDDR, etc.), the HBF can be used solely to hold AI model weights. I think it's a reasonable assumption that the set of models a given machine needs to run inference on changes somewhat infrequently, so this usage would be classed as "write rarely, read mostly". That said, if such GPUs are used in rentable cloud machines with frequent tenant turnover, write endurance could be an issue.
While we're talking about endurance, another thing worth mentioning is the sensitivity of NAND flash to temperature. Stacking a bunch of it next to a big, hot GPU die hurts data retention, meaning the controller would have to refresh (rewrite) stored data more often, which could take a toll on endurance. I assume they've accounted for that while assessing the viability of their solution.
BTW, I suspect they're probably using a lower bit density per cell, maybe even pSLC. That would really help with endurance, as would wear-leveling across the abundant capacity.
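To make the "write rarely" intuition concrete, here's a rough endurance estimate. Every figure below is an assumption for illustration (the pSLC cycle count, HBF capacity, and model size are guesses, not vendor specs), and it assumes ideal wear-leveling:

```python
# Back-of-envelope flash endurance estimate. All numbers are assumed,
# not vendor specs.
PSLC_PE_CYCLES = 50_000   # assumed program/erase cycles for pSLC NAND
HBF_CAPACITY_GB = 512     # assumed HBF capacity per GPU
MODEL_SIZE_GB = 400       # assumed weights written per model swap
LIFETIME_YEARS = 5        # assumed service life

# With ideal wear-leveling, total data writable over the device's life:
total_writable_gb = PSLC_PE_CYCLES * HBF_CAPACITY_GB

# How many full model swaps per day that write budget supports:
swaps_total = total_writable_gb / MODEL_SIZE_GB
swaps_per_day = swaps_total / (LIFETIME_YEARS * 365)

print(f"~{swaps_per_day:.0f} model swaps/day sustainable")  # ~35/day
```

Even under these made-up numbers, the write budget comfortably covers a few dozen full model swaps per day, which suggests endurance only becomes a concern under much churnier workloads than "load a model and serve it".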