Transcribe Audio
Convert speech to text with OpenAI Whisper, running entirely on your device. Pick a model: tiny (~80 MB, fastest), base (~250 MB, recommended), or small (~600 MB, most accurate). The chosen model downloads on first use, then works offline.
About Transcribe Audio
Transcribe Audio converts speech to text using OpenAI's Whisper running entirely on your device — your recording is never uploaded. Pick a model that fits your needs: tiny (~80 MB, fast), base (~250 MB, recommended), or small (~600 MB, best accuracy). The model downloads once on first use and then works offline, so private interviews and confidential calls stay on your machine.
- Category: export
- Input: audio/wav, audio/mpeg, audio/mp4, audio/x-m4a, audio/aac, audio/ogg, audio/webm, or audio/flac
- Output: text/plain
- Cost: free, runs in your browser
- Memory: high
- Install group: speech
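The accepted input types listed above can be checked before a file is handed to the tool. The following is a minimal sketch, not the tool's actual code: the helper name `isAcceptedAudioType` and the constant `ACCEPTED_AUDIO_TYPES` are hypothetical, and only the MIME types themselves come from this page.

```typescript
// MIME types copied from the input entry on this page.
const ACCEPTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  "audio/wav",
  "audio/mpeg",
  "audio/mp4",
  "audio/x-m4a",
  "audio/aac",
  "audio/ogg",
  "audio/webm",
  "audio/flac",
]);

// Hypothetical helper: returns true when a MIME type is in the accepted list.
// Parameters such as "; codecs=opus" are dropped and case is ignored.
function isAcceptedAudioType(mimeType: string): boolean {
  const [base = ""] = mimeType.split(";");
  return ACCEPTED_AUDIO_TYPES.has(base.trim().toLowerCase());
}
```

In a browser, a check like this could be applied to the `type` property of a selected `File`, so that unsupported files are rejected before the model is loaded.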
Common uses
- Turn a recorded interview into a text transcript without sending the audio to a cloud service
- Caption a voice memo or lecture so you can search and quote it later
- Draft subtitles from a video's extracted audio track
- Transcribe a confidential client call where uploading the recording isn't allowed
- Get a rough text version of a podcast episode for show notes
- Convert a meeting recording into notes you can edit and summarize
Frequently asked questions
Which audio formats are supported?
WAV, MP3, MP4/M4A, AAC, OGG, WebM, and FLAC. The output is plain text.
Does my audio get uploaded?
No. Whisper runs in your browser on your device. After the model downloads once, transcription happens locally and works offline.
Which model should I choose?
Base (~250 MB) is the recommended balance. Tiny is fastest for quick drafts; small is the most accurate but the largest download.
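The trade-off between the three models can be written down as a small lookup. This is only an illustrative sketch: the names `chooseModel`, `largestModelWithin`, and `Priority` are hypothetical, and the sizes are the approximate figures quoted on this page, not measured values.

```typescript
type WhisperModel = "tiny" | "base" | "small";
type Priority = "speed" | "balanced" | "accuracy";

// Approximate download sizes in MB, as quoted on this page.
const MODEL_SIZE_MB: Record<WhisperModel, number> = {
  tiny: 80,
  base: 250,
  small: 600,
};

// Models ordered from smallest to largest download.
const MODELS_BY_SIZE: readonly WhisperModel[] = ["tiny", "base", "small"];

// Hypothetical helper: maps a stated priority to a model,
// defaulting to the recommended "base".
function chooseModel(priority: Priority = "balanced"): WhisperModel {
  switch (priority) {
    case "speed":
      return "tiny";
    case "accuracy":
      return "small";
    default:
      return "base";
  }
}

// Hypothetical helper: the largest model whose download fits the budget,
// or undefined when even the smallest model is too big.
function largestModelWithin(budgetMb: number): WhisperModel | undefined {
  let best: WhisperModel | undefined;
  for (const model of MODELS_BY_SIZE) {
    if (MODEL_SIZE_MB[model] <= budgetMb) best = model;
  }
  return best;
}
```

For example, `chooseModel("speed")` picks `tiny` for a quick draft, while `largestModelWithin(300)` settles on `base` when the download should stay under about 300 MB.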
Why is there a download the first time?
The chosen Whisper model (roughly 80 to 600 MB, depending on which one you pick) is fetched once and cached, so later runs skip the download and work offline.
How accurate is it on noisy or accented speech?
It works well on clear speech. For noisy, accented, or otherwise difficult audio, the hosted Pro version uses the larger Whisper-large-v3 model, which gives better results.
Keywords
- transcribe
- speech
- stt
- whisper
- audio
- subtitles
- voice