Microsoft’s Speech Recognition Tech Achieves Human Parity--Sort Of

Status
Not open for further replies.

Icepilot

Distinguished
Nov 12, 2014
18
3
18,515
"... we're living in a time when machines are beginning to truly understand humans and the world around us."
One word too far, truly.
 

stuartturner34

Reputable
Jun 2, 2015
24
0
4,520
“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is an historic achievement.”

A typo in a quote from a scientist talking about word error rates. So meta.
 

alextheblue

Distinguished


That depends on your audio hardware and software more than anything. For example, on a PC it would depend on the type and quality of the microphone or mic array, the sound card, audio drivers, recording software, etc. There are a couple of places where there are opportunities for noise cancellation, depending on the hardware and software used. The result gets handed to this translation software, and it's garbage in, garbage out: you have to feed it good audio for it to do its job. The situation isn't all that different for a smartphone. Unfortunately the iPhone probably wouldn't do the best job compared to a smartphone with a HAAC twin membrane quad-mic array.
 

Kafantaris

Reputable
Mar 2, 2015
1
0
4,510
Microsoft's AI-driven voice recognition has left all rivals in the dust. Great work by Dr. Xuedong Huang's speech team.
 

bit_user

Titan
Ambassador
Good job digging into the error rates, Lucian.

This is the question I had. How much compute does it use? It's not a small detail whether this requires a long time on a big GPU, or whether it can run on a smartphone in real time. If too much compute is required, then this won't be deployed in most real-world use cases for years.

BTW, humans are still way more energy efficient.
 

bit_user

Titan
Ambassador
It would be, but where's the error?
 