How to transcribe an audio stream from a Raspberry Pi microphone in real time

Aug 11, 2024


I tested OpenAI Whisper audio transcription models on a Raspberry Pi 5. The main goal was to understand whether a Raspberry Pi can transcribe audio from a microphone in real time.

Whisper runs either on a CPU or on Nvidia GPUs, so on a Raspberry Pi the CPU is the only option.


I tested on a Raspberry Pi with only 4GB of memory, so the `medium` and `large` models were out of scope.

Also, you can watch the transcription process in action:


Whisper setup

The setup is trivial: just a few commands and it's ready to use:

Code:
#!/bin/bash

sudo apt update
sudo apt-get install -y ffmpeg sqlite3 portaudio19-dev python3-pyaudio

# Python dependencies for https://github.com/openai/whisper
pip install numpy==1.26.4 --break-system-packages
pip install -U openai-whisper --break-system-packages
pip install pyaudio --break-system-packages
pip install pydub --break-system-packages
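
To verify the installation, a quick sanity check (my own sketch, not from the repo) can load the smallest model and confirm it runs on the CPU:

Python:
import whisper

# Downloads the tiny.en checkpoint to ~/.cache/whisper on first use.
model = whisper.load_model("tiny.en")
print(model.device)  # expect "cpu" on a Raspberry Pi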

Audio Transcription Process


  1. Audio is recorded with a USB microphone.
  2. The audio stream is written to a WAV file.
  3. Every 10 seconds, I start a new file and add the current WAV to a transcription Queue.
  4. The AI process constantly grabs an item from the Queue and transcribes it.
  5. The AI process writes the text to a file/database.
So, if the AI transcription process takes longer than a chunk (10 seconds), a Raspberry Pi will never finish the Queue. I tested different chunk lengths: 5, 10, 15, and 30 seconds. There weren't many differences, so we can assume that 10 seconds is fine.
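
Here is a minimal sketch of that chunk-and-queue loop, assuming PyAudio for capture and the AiWhisper class shown in the next section; the file names, sample rate, and buffer sizes are illustrative, not the repo's exact code:

Python:
import queue
import threading
import wave

import pyaudio

CHUNK_SECONDS = 10       # length of each WAV chunk
RATE = 16000             # Whisper works with 16 kHz mono audio
FRAMES_PER_BUFFER = 1024

wav_queue: "queue.Queue[str]" = queue.Queue()

def record_chunks() -> None:
    """Read from the default microphone and emit a WAV file every CHUNK_SECONDS."""
    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                        input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    chunk_index = 0
    while True:
        frames = []
        for _ in range(int(RATE / FRAMES_PER_BUFFER * CHUNK_SECONDS)):
            frames.append(stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False))
        path = f"chunk_{chunk_index}.wav"  # illustrative file name
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(audio.get_sample_size(pyaudio.paInt16))
            wav.setframerate(RATE)
            wav.writeframes(b"".join(frames))
        wav_queue.put(path)  # hand the finished chunk to the transcriber
        chunk_index += 1

def transcribe_worker() -> None:
    """Pull WAV paths off the Queue and transcribe them one by one."""
    ai = AiWhisper()  # the class shown in the next section
    while True:
        path = wav_queue.get()
        print(ai.transcode(path))
        wav_queue.task_done()

threading.Thread(target=transcribe_worker, daemon=True).start()
record_chunks()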

Whisper usage

The code is straightforward: I pass a WAV file path to the library and receive the transcribed text as a result. I added time tracking to better understand the library's performance.
Python:
import whisper
from time_util import TimeUtil

class AiWhisper:

    _models = ["tiny.en", "base.en", "small.en", "medium.en"]

    _model = None

    def __init__(self, model_index: int = 0):
        TimeUtil.start("AiWhisper init")
        if model_index >= len(self._models):
            raise KeyError(f"Max model index is {len(self._models) - 1}")
        print(f"AiWhisper init. Using {self._models[model_index]}")
        self._model = whisper.load_model(self._models[model_index])
        TimeUtil.end("AiWhisper init")

    def transcode(self, file_path: str):
        TimeUtil.start("AiWhisper transcode")
        # fp16=False: the Raspberry Pi CPU has no fp16 support
        result = self._model.transcribe(file_path, fp16=False, language='English')
        TimeUtil.end("AiWhisper transcode")
        return result["text"]
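
For example, instantiating the class with the default index loads tiny.en and transcribes a file (the file name here is illustrative):

Python:
ai = AiWhisper(model_index=0)  # 0 -> "tiny.en"
print(ai.transcode("chunk_0.wav"))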

Small.EN Model

This model plus the OS consumed 2GB of memory, leaving another 2GB free. Processing took about 3x the audio duration: for 10-second chunks, each transcription took ~30 seconds. In a few minutes, I had ten items waiting in the Queue. Running a live transcription with these timings is impossible.

Base.EN Model

This model plus the OS consumed 850MB of memory, leaving about 3.1GB free. Transcribing a 10-second chunk took around 10 seconds, sometimes less, sometimes more, so overall it was slightly slower than real time. Some time could likely have been won by reading and writing the WAV chunks in memory rather than on the SD card. However, I didn't try to tune the performance, keeping the experiment clean.
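
As an untested sketch of that idea: on Linux (including Raspberry Pi OS), /dev/shm is a RAM-backed tmpfs, so pointing the chunk writer there keeps WAV files off the SD card. The directory name is hypothetical:

Python:
import os

# /dev/shm is a RAM-backed tmpfs on Linux; files here never touch the SD card.
CHUNK_DIR = "/dev/shm/whisper_chunks"  # hypothetical location, not from the repo

def chunk_path(index: int) -> str:
    """Return a RAM-backed path for the next WAV chunk."""
    os.makedirs(CHUNK_DIR, exist_ok=True)
    return os.path.join(CHUNK_DIR, f"chunk_{index}.wav")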

Tiny.EN Model

Unsurprisingly, the smallest model is the fastest. The OS plus Whisper consumed ~700MB of memory, leaving 3.3GB free.

For the same video, the result contained significantly more transcribed text, and the performance was pretty decent: the transcription process took about half the recording time. For a 10-second WAV file, transcription took ~5 seconds, leaving the Queue empty.

The quality of the output text was also good.


In conclusion, real-time transcription is possible on a Raspberry Pi 5 using OpenAI Whisper.


The source code:

https://github.com/Nerdy-Things/openai-whisper-raspberry-pi/tree/master/python