View: https://youtu.be/VTi2l_hjVRs
Dialog for the first NPC (Bloom) is pretty on-point; the second one, not so much. My question is how much story-writing work is needed to reach this level of conversation for Bloom, keeping in mind that Bloom is just one NPC out of potentially many.
The text-to-speech can probably run on an on-device NPU, but generating the LLM responses would, at this point, probably require a cloud connection (read: an Internet connection). That would likely mean an upsell to a recurring payment model (read: a subscription).
Personally, I would rather just have the responses conveyed as text and skip the text-to-speech, facial expressions, and lip sync. It's less work, and the believability/immersion wouldn't be any worse than what the demo shows. Even if the demo's response lag and stuttering were fixed, it would still feel like talking to an animated mannequin.