Microsoft’s Speech Recognition Tech Achieves Human Parity--Sort Of

Status
Not open for further replies.
"... we're living in a time when machines are beginning to truly understand humans and the world around us."
One word too far, truly.
 
“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is an historic achievement.”

A typo in a quote from a scientist talking about word error rates. So meta.
 


That depends on your audio hardware/software more than anything. For example, on a PC it would depend on the type and quality of the microphone / mic array, the sound card, audio drivers, recording software, etc. There are a couple of places where there are opportunities for noise cancellation, depending on the hardware and software used. The result gets handed to this translation software, and it's garbage in, garbage out: you have to feed it good audio for it to do its job. The situation isn't all that different for a smartphone. Unfortunately the iPhone probably wouldn't do as good a job as a smartphone with a HAAC twin membrane quad-mic array.
 
Microsoft's AI-driven voice recognition has left all rivals in the dust. Great work by Dr. Xuedong Huang's speech team.
 
Good job digging into the error rates, Lucian.

This is the question I had. How much compute does it use? It's not a small detail whether this requires a long time on a big GPU, or whether it can run on a smartphone in real time. If too much compute is required, then this won't be deployed in most real-world use cases for years.

BTW, humans are still way more energy efficient.
 
It would be, but where's the error?
 