Whisper Sweet Nothings

I'm documenting a software platform from the developer's point of view, but first I needed to review it from the end-user's perspective. There is a large collection of instructional videos.

Rather than just watching them all the way through, I'd like to get a transcript of each one first.

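OpenAI's Whisper does speech-to-text with very high accuracy, and its command-line tool can be run over every video in the directory tree, writing a .txt transcript beside each one: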

$ find "$(pwd)" -type f -name "*.mp4" -exec sh -c 'echo "$1" && cd "$(dirname "$1")" && whisper "$1" --output_format txt' sh {} \;
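If the run is interrupted, rerunning that command transcribes every video again. A minimal sketch of an alternative that skips videos which already have a transcript, assuming the openai-whisper command-line tool with its --output_dir option, and assuming it names each transcript after its video (intro.mp4 gives intro.txt). The function name transcribe_all is hypothetical, and paths containing newlines are not handled:

```shell
#!/bin/sh
# Sketch (hypothetical helper name transcribe_all): transcribe every .mp4
# under a directory tree, skipping videos that already have a .txt transcript.
# Assumes the openai-whisper CLI is on PATH and names its output after the
# input file (intro.mp4 -> intro.txt). Paths containing newlines are not handled.
transcribe_all() {
    find "${1:-.}" -type f -name '*.mp4' | while IFS= read -r video; do
        txt="${video%.mp4}.txt"
        if [ -e "$txt" ]; then
            printf '%s\n' "skip: $video"
            continue
        fi
        printf '%s\n' "transcribe: $video"
        # --output_dir writes the transcript beside the video, so no cd is needed;
        # stdin comes from /dev/null so nothing whisper runs can eat the file list
        whisper "$video" --output_format txt --output_dir "$(dirname "$video")" < /dev/null
    done
}
```

Source the file and call transcribe_all with the top directory of the videos, or with no argument to start from the current directory.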

However, Whisper is tuned for producing subtitles, so the resulting files had a line break after every short segment of speech.

The lines can be stitched back together with a short script that uses tr.

unsplit

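#!/bin/sh
# Join a Whisper transcript into one line: translate spaces, tabs and
# newlines to spaces and squeeze each run into a single space.
set -e
tr -s ' \t\n' ' ' < "$1" > "$1.new"
mv "$1.new" "$1"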
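The tr version leaves the file with a trailing space and no final newline, because the last newline is also turned into a space. A variant sketch of unsplit that swaps in awk instead of tr, under a hypothetical name, unsplit_nl: it drops blank lines, collapses whitespace inside each line, joins the lines with single spaces and ends the file with one newline. It also accepts several files per call, so it suits find's -exec ... {} + form.

```shell
#!/bin/sh
# Variant sketch of unsplit using awk instead of tr (hypothetical name
# unsplit_nl). Each file is joined into one line with single spaces, leading
# and trailing whitespace dropped, and a final newline kept.
unsplit_nl() {
    for f in "$@"; do
        # NF skips blank lines; $1 = $1 rebuilds each line with single spaces
        awk 'NF { $1 = $1; printf "%s%s", sep, $0; sep = " " } END { print "" }' \
            "$f" > "$f.new" && mv "$f.new" "$f"
    done
}

unsplit_nl "$@"
```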

With unsplit made executable and placed somewhere on the PATH, it is invoked with:

$ find . -type f -name "*.txt" -exec unsplit {} \;

Beware: this rewrites every .txt file in the current directory and all subdirectories, not only the Whisper transcripts.
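One way to narrow that down, sketched under the assumption that every transcript sits beside a video with the same base name (as the Whisper command above leaves them): walk the .mp4 files and run unsplit only on the matching .txt. The helper name unsplit_transcripts is hypothetical, and unsplit is assumed to be executable and on the PATH.

```shell
#!/bin/sh
# Sketch (hypothetical helper name unsplit_transcripts): run unsplit only on
# .txt files that sit next to an .mp4 of the same name, i.e. the Whisper
# transcripts, leaving other text files alone. Assumes unsplit is on PATH.
unsplit_transcripts() {
    find "${1:-.}" -type f -name '*.mp4' | while IFS= read -r video; do
        txt="${video%.mp4}.txt"
        if [ -f "$txt" ]; then
            unsplit "$txt"
        fi
    done
}
```

Call it as unsplit_transcripts, optionally with a starting directory, after sourcing the file.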

This file was updated at 2025-03-01 19:15:47