I'm documenting a software platform from the developer's point of view, but first I needed to review it from the end-user perspective. There is a large collection of instructional videos.
Rather than just watching them all through, I'd like to get a transcription first.
Whisper can do speech-to-text with very high accuracy.
$ find "$(pwd)" -type f -name "*.mp4" -exec sh -c 'echo "$1" && cd "$(dirname "$1")" && whisper "$1" --output_format txt' sh {} \;
However, Whisper is tuned for producing subtitles, so the resulting files had line breaks all the way through them.
These can be stitched back together with a short script that uses tr.
#!/bin/sh
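# Squeeze every run of spaces, tabs and newlines into a single space.
# Only replace the original if tr succeeded.
tr -s ' \t\n' ' ' < "$1" > "$1.new" &&
mv "$1.new" "$1"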
Saved as an executable file named unsplit somewhere on the PATH, this is invoked with:
$ find . -type f -name "*.txt" -exec unsplit {} \;
Beware: this acts on every text file in the current directory and all sub-directories.
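To avoid touching unrelated text files, one option is to rejoin only the transcripts that sit next to a video of the same name. The following is a minimal sketch, not something from the original workflow: the function name unsplit_transcripts is made up for illustration, and it assumes Whisper wrote video.txt alongside video.mp4 (the naming may differ between Whisper versions, so check the output first). It also assumes file names contain no newline characters.

```shell
#!/bin/sh
# Hypothetical helper: rejoin only transcripts that have a matching .mp4.
# Assumption: the transcript for "name.mp4" is "name.txt" in the same directory.
unsplit_transcripts() {
    root="${1:-.}"
    find "$root" -type f -name "*.mp4" | while IFS= read -r video; do
        txt="${video%.mp4}.txt"
        # Skip videos that have no transcript yet.
        [ -f "$txt" ] || continue
        # Same tr call as the unsplit script; replace the file only on success.
        tr -s ' \t\n' ' ' < "$txt" > "$txt.new" && mv "$txt.new" "$txt"
    done
}
```

Calling unsplit_transcripts with no argument works on the current directory; passing a path, for example unsplit_transcripts ~/videos, limits it to that tree. Any .txt file without a sibling .mp4 is left alone.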
This file was updated at 2025-03-01 19:15:47