I'm not disputing the possibility that it will improve, only questioning the size of the AI model required to produce a diverse range of fully fleshed-out TTS voices, using Microsoft's as a reference for how such models could be 100s of GBs.
Anyway, tech generally improves. So, let's see how it matures. I think it would be silly to make sweeping pronouncements about it so soon. AI is a rapidly evolving field, not least text-to-speech and "deepfake" audio.
Just now, I accidentally clicked an AI dub on YT using the Honest Movie Trailer guy's voice and went: "WTF? He does news now?" for a second until I remembered how good some AI voices are these days. I doubt that level of output quality comes cheap.