Researchers Say an AI-Powered Transcription Tool Used in Hospitals Has Major Flaws
Researchers have identified a significant flaw in OpenAI’s AI-powered transcription tool, Whisper, which is used across a range of industries worldwide. Despite OpenAI’s claims of near “human level robustness and accuracy,” Whisper is prone to fabricating text that was never spoken, a problem known as hallucination; the invented passages can include racial commentary, violent rhetoric, and even imagined medical treatments. The issue is particularly concerning because Whisper is being integrated into critical applications such as transcribing patients’ consultations with doctors, creating closed captioning for the Deaf and hard of hearing, and generating text in popular consumer technologies.
Experts have expressed concern over the widespread adoption of Whisper-based tools in medical centers, despite OpenAI’s warnings that the tool should not be used in “high-risk domains.” People who are deaf or hard of hearing are particularly vulnerable to faulty transcriptions because they have no way to identify fabrications hidden among otherwise accurate text. Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program, emphasized that risk for this population.
The prevalence of hallucinations has led experts, advocates, and former OpenAI employees to call for federal regulations on AI. William Saunders, a San Francisco-based research engineer who quit OpenAI in February over concerns about the company’s direction, stated, “This seems solvable if the company is willing to prioritize it. It’s problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems.”
OpenAI has acknowledged the issue, saying it continually studies ways to reduce hallucinations, appreciates the researchers’ findings, and incorporates their feedback into model updates. Even so, many developers and researchers say Whisper’s hallucinations are more frequent and more concerning than those of other AI-powered transcription tools.
Whisper is integrated into some versions of OpenAI’s flagship chatbot, ChatGPT, and is a built-in offering in Oracle’s and Microsoft’s cloud computing platforms, which serve thousands of companies worldwide. It is also used to transcribe and translate audio in multiple languages. In the last month alone, one recent version of Whisper was downloaded over 4.2 million times from the open-source AI platform HuggingFace.
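To give a sense of how easily the open-source model can be dropped into a product, the brief sketch below shows one common way developers run a Whisper checkpoint downloaded from HuggingFace; the checkpoint name and audio file path are illustrative placeholders, not details drawn from the reporting.

    # Minimal sketch of running a Whisper checkpoint via the Hugging Face
    # transformers library. The model name and audio path are illustrative.
    from transformers import pipeline

    # Load an openly distributed Whisper model for speech recognition.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

    # Transcribe a local recording. The output text is generated by the model,
    # which is why hallucinated passages can appear alongside accurate ones.
    result = asr("consultation_recording.wav")
    print(result["text"])

Because the transcript is generated rather than matched word-for-word against the audio, nothing in this workflow flags which sentences, if any, were invented by the model.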
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short audio snippets from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that nearly 40% of the hallucinations Whisper produced in transcribing those snippets were harmful or concerning because the speaker could be misinterpreted or misrepresented.
Over 30,000 clinicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the U.S. The tool was fine-tuned on medical language to transcribe and summarize patients’ interactions. The full extent of the problem is difficult to gauge, but experts urge OpenAI to address the flaw and are calling for stricter oversight to ensure the accuracy and reliability of AI-powered transcription tools.