Then your one model would need to be trained on all of those alternate traits you'd prompt for, plus variants of those traits for flavor, in both the LLM chat and the text-to-speech generator, and it would still be substantially larger, definitely too large to fit in VRAM on top of graphics.
So, you're an expert on deep learning, now?
Like I said, I think good LLMs are much more similar than they are different. It'd be more efficient just to make one versatile enough to handle the various characters via prompts, than to have distinct ones for each character.
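To make the idea concrete (just my own sketch, not anything from the article, and using a made-up generate() call standing in for whatever local LLM runtime the engine would actually ship): you keep one shared base model and swap in a short persona prompt per NPC, instead of training and storing a distinct network for each character.

```python
# Sketch: one shared LLM, many NPC personas via prompting.
# generate() is a hypothetical stand-in for the game's local inference backend.

PERSONAS = {
    "blacksmith": "You are Torva, a gruff blacksmith. Short sentences, hates small talk.",
    "innkeeper":  "You are Milo, a cheerful innkeeper. Chatty, loves local gossip.",
}

def npc_reply(npc_id: str, history: list[str], player_line: str) -> str:
    """Build a prompt from the NPC's persona plus recent dialogue, then query the one shared model."""
    prompt = (
        PERSONAS[npc_id]
        + "\n"
        + "\n".join(history[-8:])   # keep only recent lines to bound prompt size
        + f"\nPlayer: {player_line}\nNPC:"
    )
    return generate(prompt)          # hypothetical call into the local LLM runtime

def generate(prompt: str) -> str:
    raise NotImplementedError("Placeholder for whatever local LLM inference the engine uses.")
```

The point of the sketch is that the per-character cost is a few hundred bytes of prompt text, not another multi-gigabyte model.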
From what little I read on the topic, high-quality generalized AI TTS like Microsoft's VALL-E uses models ranging from 16 GB to over 500 GB in size, with 1 TB planned. Good luck running the higher-quality variants locally.
Well, guess what? The article didn't say they used Microsoft's VALL-E; it said they use their own Riva SDK. It does both Automatic Speech Recognition and Text-to-Speech. Furthermore, it's an "SDK for building and deploying fully customizable, real-time AI pipelines that deliver world-class accuracy in all clouds, on premises, at the edge, and on embedded devices." The release notes reference issues with fitting certain languages on an 8 GB embedded platform. Since those are unified-memory devices, it's unclear whether they mean the model is larger than 8 GB or just that it won't fit in whatever portion of that memory would be available to the GPU.
A quick web search is all you'd have had to do, if you wanted to actually have some relevant knowledge, instead of just BS'ing.
Similarly, Nvidia Omniverse Audio2Face lists a GPU with 8 GB in its system requirements. That's presumably for film & video production-quality results. Perhaps their model for games could be much smaller.
Imagine needing 16-24 GB of VRAM for graphics and another 200+ GB for AI models spanning high-quality TTS, GPT-like natural-language prompting, personality, and so on.
You don't need the degree of encyclopedic knowledge that ChatGPT has, so I think your estimate is off by at least an order of magnitude.
Keeping what each AI knows or learns partitioned between NPCs, game saves, players in multi-player games, etc. could be a challenge. You either end up with gigantic save files or have to retrain the starting AI on load to match the save state.
You don't have to persist the entire state of the transformer - just the sequence of prompts which brought it to that state. Much, much smaller.
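Roughly what that looks like (again my own illustration, not anything from the article, with generate() as a hypothetical stand-in for the local inference call): the save file stores each NPC's dialogue transcript as plain text/JSON, and on load you re-feed that transcript to the shared model so it reconstructs the same context, rather than serializing gigabytes of weights or attention state.

```python
import json

# Sketch: persist only the prompt/dialogue history per NPC, not the model's internal state.
# A few KB of text per NPC instead of GB-scale transformer internals.

def save_npc_histories(histories: dict[str, list[str]], path: str) -> None:
    with open(path, "w") as f:
        json.dump(histories, f)

def load_npc_histories(path: str) -> dict[str, list[str]]:
    with open(path) as f:
        return json.load(f)

def resume_npc(npc_id: str, histories: dict[str, list[str]], player_line: str) -> str:
    """Replay the saved transcript as the prompt so the shared model 'remembers' this NPC."""
    prompt = "\n".join(histories[npc_id]) + f"\nPlayer: {player_line}\n"
    reply = generate(prompt)   # hypothetical local LLM call, as in the earlier sketch
    histories[npc_id] += [f"Player: {player_line}", reply]
    return reply

def generate(prompt: str) -> str:
    raise NotImplementedError("Placeholder for the local LLM inference backend.")
```

Per-NPC and per-save partitioning then falls out of the data layout: each NPC's history is its own list, keyed per save file or per player.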