The VU is made up of several 'vector cores', which are comparable to GPU cores from AMD, Intel, and Nvidia
Using a general-purpose ISA in a GPU only makes sense for specialized applications. If your goal is to build a highly scalable accelerator that can effectively compete with purpose-built GPUs, then you will suffer the same fate as Xeon Phi. A general-purpose ISA drags in too much overhead that purebred GPUs don't have to deal with.
Moreover, if they're trying to efficiently tackle AI workloads, then they'll need matrix-multiply hardware. Vector-level acceleration is no longer enough.
only Tenstorrent is developing high-performance RISC-V IP that can be used to build processors and AI accelerators.
Ah, but they didn't get rid of their Tensix cores. Those are the main workhorses of Tenstorrent's accelerators. From the linked article:
"In addition to a variety of RISC-V general-purpose cores, Tenstorrent has its proprietary Tensix cores tailored for neural network inference and training. Each Tensix core comprises of five RISC cores, an array math unit for tensor operations, a SIMD unit for vector operations, 1MB or 2MB of SRAM, and fixed function hardware for accelerating network packet operations and compression/decompression."
In addition to the matrix/tensor unit, the local SRAM is also key. That's something that doesn't fit well with general-purpose CPUs. They can have cache, but a cache adds latency (tag lookups, misses) and costs more energy per access than directly addressed local SRAM.
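To make the SRAM point a bit more concrete, here's a rough Python sketch of a tiled matrix multiply that explicitly stages tiles in local buffers, the way software-managed SRAM gets used. This is my own toy model, not Tenstorrent's actual programming model: the function name `matmul_tiled` and the `words_loaded` counter are made up for illustration, and the lists `a_tile`, `b_tile`, and `acc` merely stand in for local SRAM.

```python
def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) one tile at a time.

    Returns (c, words_loaded), where words_loaded counts the elements
    copied from the "main memory" matrices into the local tile buffers.
    Toy model only: the local buffers stand in for software-managed SRAM.
    """
    assert n % tile == 0, "n must be a multiple of the tile size"
    c = [[0.0] * n for _ in range(n)]
    words_loaded = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # The accumulator tile stays local until it is fully computed.
            acc = [[0.0] * tile for _ in range(tile)]
            for k0 in range(0, n, tile):
                # Explicit copies into local buffers (no cache involved).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                words_loaded += 2 * tile * tile
                # Every loaded element is reused `tile` times below.
                for i in range(tile):
                    for j in range(tile):
                        s = acc[i][j]
                        for k in range(tile):
                            s += a_tile[i][k] * b_tile[k][j]
                        acc[i][j] = s
            # Write the finished tile back to the output matrix once.
            for i in range(tile):
                c[i0 + i][j0:j0 + tile] = acc[i]
    return c, words_loaded
```

In this model the number of words pulled from main memory is 2n³/tile, so for n = 8 a tile size of 4 loads 256 words where a tile size of 1 (no reuse) loads 1024. The bigger the local SRAM, the bigger the tile you can hold and the less traffic you generate, and since the software decides exactly what lives there, nothing is spent on tag checks or evictions.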