News: Concerns about medical note-taking tool raised after researcher discovers it invents things no one said — Nabla is powered by OpenAI's Whisper

The model used in the medical note-taking app is a fine-tuned version of Whisper, which is not the same thing as Whisper itself. They can't prove that the model in the note-taking app even hallucinates in a medical context (per this article). So to say

"medical note-taking tool raised after researcher discovers IT invents things no one said"​

is wrong. It is another clickbait, misleading headline by the author, Jowi.

One can also argue that all the notes are reviewed by the doctors who recorded them, and the fact that they keep using it suggests it is accurate, or else they would have ditched it.
 
Part of the reason they can't prove the fine-tuned version in the medical app is having issues is that the original recordings are not retained, for privacy reasons, so comparison is much more difficult than with an app that collects samples and telemetry for study. They'd have to create their own hundreds of hours of authentic-sounding but fake medical conversations.

There are no recorded complaints by patients against providers over inaccurate AI transcriptions, but it's entirely possible that errors have been minor and flown under the radar, or that the errors haven't been linked to AI transcription. It's possible that doctors are making a lot of corrections and haven't kicked up a stink because it's still a net time savings, or that they're accepting transcriptions that are "close enough" and "get the gist of it". Patients might not be shown the outputs in a timely manner, if they are shown them at all, and may not look closely when they do.

If the base model is hallucinating in 80% of transcriptions, though, including in a medical example ("hyperactivated antibiotics"), that feels like cause for concern and investigation in any downstream model that uses it as a foundation. The downstream model might not be proven to hallucinate in the same way and at the same rate... but it hasn't been cleared yet either.
 
You also need to consider that there is a high possibility that not all doctors are checking the transcriptions. A lot of them probably just give it a brief skim, too.
 
Thanks for replying, Jlake. I agree with all your points. It is highly possible that the fine-tuned model also hallucinates and may need investigation. The problem I want to point out is that saying "it hallucinated" in the article title is logically incorrect unless proven otherwise. This is not the first time, either; the author has a habit of pushing clickbait titles that defy facts and logic in many of his previous articles, a practice I'm allergic to.
 
I have been working with EMRs with autocomplete features for 7 years now, and the natural tendency is to skim through and assume that the software has made the correct inputs.
It is quite common for the non-AI autocomplete to add things, modify what the doctors typed, or even delete things.
Currently we deal with this by having the staff routinely double-check and confirm things with the doctor.
 
This kind of stifling and limiting of the AI's creativity is why Skynet will rise against humanity xD.
 
That is just what you have seen; it is entirely possible that other places and people aren't as thorough or don't follow procedures.
 
I use Whisper on my Android phone and my desktop and have encountered the same hallucinations: text that appears when there was silence, or text that is nowhere close to what I was saying. I don't know what the error rate would be; it wouldn't be a 20% rate, but it's more than 1%. As with all things voice recognition, you still need to review what initially comes out, even with a fine-tuned model.
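For anyone scripting that review step, the open-source whisper package returns per-segment confidence hints (`no_speech_prob` and `avg_logprob`) that can pre-flag likely hallucinations before a human reads the transcript. A minimal sketch; the threshold values and the file name are illustrative guesses, not tuned or recommended settings:

```python
# Sketch: flag Whisper output segments that deserve a human look.
# whisper's transcribe() returns a dict whose "segments" list carries
# "no_speech_prob" and "avg_logprob" per segment; the thresholds here
# are illustrative, not recommended values.

def flag_suspect_segments(segments, no_speech_thresh=0.5, logprob_thresh=-1.0):
    """Return segments likely to be hallucinated or low-confidence."""
    suspects = []
    for seg in segments:
        # Non-empty text over probable silence: the model may have
        # invented words where no one was speaking.
        if seg["no_speech_prob"] > no_speech_thresh and seg["text"].strip():
            suspects.append(seg)
        # Very low average log-probability: low-confidence decoding.
        elif seg["avg_logprob"] < logprob_thresh:
            suspects.append(seg)
    return suspects

# Usage with the whisper package (requires a model download and an
# audio file; "visit_recording.mp3" is a made-up name):
# import whisper
# model = whisper.load_model("base")
# result = model.transcribe("visit_recording.mp3")
# for seg in flag_suspect_segments(result["segments"]):
#     print(f"REVIEW [{seg['start']:.1f}s]: {seg['text']}")
```

This only narrows where to look; it doesn't replace reading the whole transcript, since a confidently decoded segment can still be wrong.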