Your claim is that you don't use SMT to hide latency, since SMT increases latency and makes it more unpredictable, and therefore harder to hide.
In this 2001 paper, the authors talk about multithreading hiding the latency of the functional units:
"Since decoupling features an excellent memory latency hiding efficiency, the large amount of parallelism exploited by multithreading may be used to hide the latency of functional units and keep them fully utilized."
ieeexplore.ieee.org
It's also described this way in this Advanced Computer Architecture lecture from Imperial College London:
SMT threads exploit memory-system parallelism
- Easy way to get lots of memory accesses in-flight
- “Latency hiding” – overlapping data access with compute
And here's a paper from ACM's International Conference on Parallel Architecture and Compilation Techniques:
"Network processors employ a multithreaded, chip-multiprocessing architecture to effectively hide memory latency and deliver high performance for packet processing applications."
dl.acm.org
I could go on, but you get the point. You're obviously entitled to your opinion, but I'm with the experts on this one.
You and @palladin9479 talk about how the parallel nature of GPU workloads "doesn't care about latency", but SMT is the primary mechanism that enables that decoupling. Otherwise, the shader cores would sit mostly idle and performance would be garbage compared with what we have today.
Yes, SMT isn't the only way to hide latency, but that's the main way GPUs do it.
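
To make the GPU point concrete, here's a minimal CUDA sketch, purely for illustration; the kernel, names, and sizes are my own assumptions, not taken from any of the sources above. It shows the mechanism the quotes describe: each thread stalls on a long-latency global load, and the SM hides that stall by switching to other resident warps, so launching far more threads than there are execution units is what keeps the cores busy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy memory-bound kernel: each thread issues a dependent global load and then
// a little arithmetic. One warp on its own would stall for hundreds of cycles
// waiting on that load. With many warps resident per SM, the warp scheduler
// issues instructions from other warps during the stall, so the load latency
// is overlapped with useful work rather than eliminated.
__global__ void gather_scale(const float* __restrict__ in,
                             const int*   __restrict__ idx,
                             float*       __restrict__ out,
                             int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[idx[i]];  // long-latency, scattered global load
        out[i] = v * scale;    // cheap compute that depends on the load
    }
}

int main()
{
    const int n = 1 << 24;
    float *in, *out;
    int *idx;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    cudaMallocManaged(&idx, n * sizeof(int));
    for (int i = 0; i < n; ++i) {
        in[i]  = 1.0f;
        idx[i] = (int)(((long long)i * 7919) % n);  // scattered access pattern
    }

    // Launch far more threads than the GPU has ALUs. That oversubscription is
    // the whole point: it gives every SM a pool of ready warps to switch to
    // while other warps are waiting on memory.
    const int block = 256;
    const int grid  = (n + block - 1) / block;
    gather_scale<<<grid, block>>>(in, idx, out, n, 2.0f);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);
    cudaFree(in);
    cudaFree(out);
    cudaFree(idx);
    return 0;
}
```

This is what profilers surface as "occupancy": roughly, how many warps each SM has resident and ready to run while others are stalled. Whatever you call the hardware mechanism, the effect is the one the quotes above describe: lots of threads in flight, so memory latency gets overlapped instead of exposed.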