![28_JYsjukWcn5.png](/proxy.php?image=https%3A%2F%2Fhackster.imgix.net%2Fuploads%2Fattachments%2F1741545%2F28_JYsjukWcn5.png%3Fauto%3Dcompress%252Cformat%26w%3D740%26h%3D555%26fit%3Dmax&hash=5dab76ac065f66fb23efa531bec4dee6)
I tested OpenAI Whisper audio transcription models on a Raspberry Pi 5. The main goal was to understand whether a Raspberry Pi can transcribe audio from a microphone in real time.
Whisper supports only CPU and Nvidia GPU inference, so on a Raspberry Pi it runs on the CPU only.
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-7j9kmm3cyghd1.png%3Fwidth%3D1280%26format%3Dpng%26auto%3Dwebp%26s%3D6c75eed3625a7c5152bbe6a4ab4e9498c5698edd&hash=cce240fd241c6d4d415bbc71368fe9e8)
I tested on a Raspberry Pi with only 4GB of memory, so the `medium` and `large` models were out of scope.
Also, you can watch the transcription process in action:
Whisper setup
The setup is trivial. A few commands and it's ready to use:
Code:
#!/bin/bash
sudo apt update
sudo apt-get install -y ffmpeg sqlite3 portaudio19-dev python3-pyaudio
#
pip install numpy==1.26.4 --break-system-packages
pip install -U openai-whisper --break-system-packages
pip install pyaudio --break-system-packages
pip install pydub --break-system-packages
https://github.com/openai/whisper
Audio Transcription Process
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-16e4zu4dzghd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3D3fc72ee973389aaecd8c064dcddfb58be53aa381&hash=ec2b61b77ddad14a33471169247eb085)
- Audio is recorded with a USB Microphone.
- Audio Stream is written to a WAV file.
- Every 10 seconds, I start a new file and add the current WAV to a transcription Queue.
- AI process constantly grabs an item from the Queue and transcribes it.
- AI process writes text to a file/database.
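The producer/consumer hand-off described above can be sketched with Python's standard library. This is a minimal, hypothetical skeleton: the real recorder uses PyAudio, and the stand-in `transcriber` body would call Whisper instead of returning a placeholder string.

```python
import queue
import threading

CHUNK_SECONDS = 10  # each WAV file holds 10 seconds of audio
wav_queue = queue.Queue()

def recorder(n_chunks: int) -> None:
    """Producer: every CHUNK_SECONDS, close the current WAV and enqueue its path."""
    for i in range(n_chunks):
        path = f"chunk_{i:04d}.wav"   # in reality, written by the audio stream
        wav_queue.put(path)
    wav_queue.put(None)               # sentinel: recording finished

def transcriber(results: list) -> None:
    """Consumer: grab WAV paths from the queue and transcribe them one by one."""
    while True:
        path = wav_queue.get()
        if path is None:
            break
        results.append(f"text for {path}")  # stand-in for the Whisper call

results = []
worker = threading.Thread(target=transcriber, args=(results,))
worker.start()
recorder(3)
worker.join()
print(results)
```

Because the queue decouples recording from transcription, a slow model only grows the backlog; it never blocks the microphone stream.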
Whisper usage
The code is straightforward. I send a WAV file path to a library and I receive the transcribed text as a result. I added time tracking for a better understanding of the performance of the library.
Python:
import whisper

from time_util import TimeUtil


class AiWhisper:
    _models = ["tiny.en", "base.en", "small.en", "medium.en"]
    _model = None

    def __init__(self, model_index: int = 0):
        TimeUtil.start("AiWhisper init")
        if model_index >= len(self._models):
            raise KeyError(f"Max model index is {len(self._models) - 1}")
        print(f"AiWhisper init. Using {self._models[model_index]}")
        self._model = whisper.load_model(self._models[model_index])
        TimeUtil.end("AiWhisper init")

    def transcode(self, file_path: str):
        TimeUtil.start("AiWhisper transcode")
        result = self._model.transcribe(file_path, fp16=False, language='English')
        TimeUtil.end("AiWhisper transcode")
        return result["text"]
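The `TimeUtil` helper imported above isn't shown in the post. A minimal stand-in (my own hypothetical sketch, not the author's implementation) could record a start timestamp per label and print the elapsed time on `end()`:

```python
import time

class TimeUtil:
    """Label-based stopwatch: start() stores a timestamp, end() prints the delta."""
    _starts: dict = {}

    @classmethod
    def start(cls, label: str) -> None:
        cls._starts[label] = time.monotonic()

    @classmethod
    def end(cls, label: str) -> float:
        elapsed = time.monotonic() - cls._starts.pop(label)
        print(f"{label}: {elapsed:.2f}s")
        return elapsed

TimeUtil.start("demo")
elapsed = TimeUtil.end("demo")
```

`time.monotonic()` is used rather than `time.time()` so the measurement cannot jump if the system clock is adjusted mid-run.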
Small.EN Model
This model plus the OS consumed 2GB of memory, leaving about 2GB free. Let's see how fast it worked:![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-97xg2dr31hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3D0d06bde4ee404b7055da1394f8ff6b3821b12e36&hash=92d44a801999878842d55f083aa4df26)
The small model took roughly 3× real time to process audio: for 10-second chunks, transcription took ~30 seconds. Within a few minutes, ten items were waiting in the queue. With these timings, live transcription is impossible.
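The backlog behaviour follows directly from the real-time factor (RTF = processing time ÷ audio duration): whenever RTF exceeds 1, unprocessed audio accumulates without bound. A quick calculation, using the approximate RTF values observed in this test:

```python
def backlog_after(rtf: float, minutes: float) -> float:
    """Seconds of audio waiting in the queue after `minutes` of recording.

    rtf: real-time factor (processing time / audio duration).
    If rtf <= 1, transcription keeps up and the backlog stays at zero.
    """
    recorded = minutes * 60.0
    processed = recorded / rtf if rtf > 1 else recorded
    return max(0.0, recorded - processed)

print(backlog_after(3.0, 5))   # RTF 3 (small.en here): backlog keeps growing
print(backlog_after(1.0, 5))   # RTF 1: roughly keeps up
print(backlog_after(0.5, 5))   # RTF 0.5: queue stays empty
```

After 5 minutes at RTF 3, only a third of the audio has been processed, so 200 seconds of audio sit untranscribed and the gap widens the longer the recording runs.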
Base.EN Model
This model plus the OS consumed 850MB of memory, leaving 3.1GB free. Let's see how fast it worked:![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-tt265yki2hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3D50b26455ce6e2aeb2002dd1fa4da9dc2b95c86fd&hash=50089072bab7afe82e7a68d2e455ca03)
The transcription process took around ~10 seconds per chunk, sometimes less, sometimes more. Overall, it was slightly too slow for real time. I could likely have gained some time by reading and writing the WAV files in memory rather than on the SD card, but I didn't try to tune the performance, to keep the experiment clean.
Tiny.EN Model
It's no surprise that the smallest model is the fastest. The OS + Whisper consumed ~700MB of memory, leaving 3.3GB free. As a result, there was significantly more transcribed text for the same video:
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-9v3pdazd3hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3Df8d7866f5153a9b3618792c5a162f50246efae14&hash=f81e4da46c9cfcda93f9243a861cd4fc)
And the performance was pretty decent:
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-l84gvrbh3hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3D6d9877003bf0bde55022c6d3727385f443ef1dc7&hash=fc2d8f9abb6472aa739a797adfeec7c8)
The transcription process took about half the recording time: for a 10-second WAV file, transcription took ~5 seconds, leaving the queue empty.
The quality of the output text was also good:
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-0tkbp6ur3hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3D75e700f38bed1480c0aa6f8b13fb83ea96662889&hash=de694b5384612b938fe40ffc0a5f740e)
In conclusion, real-time transcription is feasible on a Raspberry Pi 5 with OpenAI Whisper, using the tiny.en model.
![r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5 r/rasberrypi - Testing OpenAI Whisper on a Raspberry PI 5](/proxy.php?image=https%3A%2F%2Fpreview.redd.it%2Ftesting-openai-whisper-on-a-raspberry-pi-5-v0-7azdwm2y3hhd1.png%3Fwidth%3D1920%26format%3Dpng%26auto%3Dwebp%26s%3Ddf0f3cd95df7f49faf10e852a22101be4f1096f2&hash=cc631ad7ded7620c5c64e3457438aba0)
The source code:
https://github.com/Nerdy-Things/openai-whisper-raspberry-pi/tree/master/python