This technology could work well at home or in a private office or lab, but it isn't well suited to cube-world. In light of that, it should probably be enhanced to better handle shared spaces like that.
How well does this technology differentiate between a statement directed to it (e.g., "open 'c:\The Door.docx'") and a statement made in the ambient environment (e.g., "Open the door.")? I get the "hello dragon" and "go to sleep" part, but is there something like Siri, where you address the device by name, or does it simply assume that you're talking to it?
Here's where I think MSFT should take this:
First, use a pair of microphones so that you can determine the direction of the speaker.
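One standard way to get a direction from two microphones is time-difference-of-arrival (TDOA): cross-correlate the two signals to find how many samples later the sound reached one microphone than the other, then convert that delay into an angle. The sketch below is a minimal illustration, not anything Microsoft has described. It assumes a distant speaker, a known microphone spacing, and a speed of sound of about 343 m/s, and the function names are hypothetical. Note that a two-microphone pair only resolves the angle relative to the line joining the microphones, so it cannot tell a speaker in front from one in the mirror-image position behind.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumption)


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means `right` is a delayed copy of `left`, i.e. the
    sound reached the left microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Plain cross-correlation: sum of left[i] * right[i + lag].
        score = 0.0
        for i in range(len(left)):
            j = i + lag
            if 0 <= j < len(right):
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(left, right, sample_rate, mic_spacing):
    """Estimate the speaker's angle in degrees.

    0 means straight ahead (perpendicular to the microphone pair);
    positive angles lean toward the left microphone.
    """
    # The delay can never exceed the time sound needs to cross the spacing.
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate))
    lag = estimate_delay(left, right, max_lag)
    tau = lag / sample_rate
    # Far-field geometry: sin(theta) = c * tau / d, clamped against noise.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(ratio))
```

Plain cross-correlation is sensitive to room reverberation; practical arrays commonly use a weighted variant such as GCC-PHAT, and more than two microphones, for a more robust estimate.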
Second, when you address the device, either with an individual statement such as, "Siri, find me a Chinese restaurant nearby" or with a batch statement such as, "Siri, record the following dictation," the following should occur:
- the device should identify and authenticate you using voiceprint recognition on its identifier; in other words, on how you say "Siri" (or whatever name you choose to give it). If you have set up casual authentication, it automatically switches context to your associated Windows login and incorporates all of the favorites, characteristics, etc. that you have defined in that login. If you have set up strict authentication, the device should require some form of challenge-response authentication before it accepts any command on your behalf.
- the device should give a visual cue that it is in command mode, and is listening to you. Maybe a pair of semi-transparent eyes focused in your direction, positioned on the screen based on config preferences. The visual characteristics of the cue could be used to indicate who you have been authenticated as (to help avoid mistakes in a crowded room).
- audible and visual representations of the understood command should be (configurably) echoed so that you can verify it understood you accurately.
- identifications and commands from other individuals should be ignored for the duration of the command, or until you end the session (e.g., "thank you, Siri" or "goodbye, Siri"), at which point the visual cues should clear from the screen.
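The session rules in the list above could be modeled as a small state machine. This is a hypothetical sketch: `Utterance`, `CommandSession`, and the returned status strings are invented for illustration. The speaker label is assumed to come from a voiceprint recognizer that isn't shown, and the plain passphrase is only a placeholder for a real challenge-response exchange.

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Utterance:
    speaker: str  # hypothetical: label produced by a voiceprint recognizer (not shown)
    text: str     # hypothetical: transcript produced by the speech recognizer


def _words(text: str) -> List[str]:
    """Lower-case the text and strip punctuation, returning its words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


@dataclass
class CommandSession:
    wake_name: str
    # Users who chose strict authentication, mapped to a placeholder secret.
    strict_users: Dict[str, str] = field(default_factory=dict)
    active_speaker: Optional[str] = None
    verified: bool = False

    def handle(self, utt: Utterance) -> str:
        words = _words(utt.text)
        name = self.wake_name.lower()

        if self.active_speaker is None:
            # Idle: only react to speech that starts with the device's name.
            if not words or words[0] != name:
                return "ignored: not addressed to device"
            self.active_speaker = utt.speaker
            self.verified = utt.speaker not in self.strict_users
            if not self.verified:
                return f"challenge: {utt.speaker}"
            command = " ".join(words[1:])
            return f"echo: {command}" if command else f"listening: {utt.speaker}"

        if utt.speaker != self.active_speaker:
            # Someone else is talking; ignore them until this session ends.
            return "ignored: other speaker"

        if words in (["thank", "you", name], ["goodbye", name], ["good", "bye", name]):
            # Closing phrase: end the session (the visual cue would clear here).
            self.active_speaker, self.verified = None, False
            return "session ended"

        if not self.verified:
            # Placeholder for a real challenge-response check.
            if utt.text.strip() == self.strict_users[utt.speaker]:
                self.verified = True
                return f"verified: {utt.speaker}"
            return "rejected: challenge failed"

        # Drop a repeated wake name, then echo the understood command back.
        if words and words[0] == name:
            words = words[1:]
        return f"echo: {' '.join(words)}"
```

With this design, an ambient remark such as "Open the door." is ignored unless it begins with the device's name, which is one possible answer to the question of telling directed speech from background conversation.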
Anyway, it's nice that they're trying to make it easier to use a computer. I suppose we'll have to see how well they designed the voice interface.