Faster on WebGPU

Transcribe Audio

Convert speech to text using OpenAI Whisper — runs entirely on your device. Pick a model: tiny (~80 MB, fast), base (~250 MB, recommended), or small (~600 MB, best). The chosen model downloads on first use, then works offline.

First run downloads ~238 MB. The model is cached after the first use, then runs offline. Manage downloads on the settings page.

About Transcribe Audio

Transcribe Audio converts speech to text using OpenAI's Whisper running entirely on your device — your recording is never uploaded. Pick a model that fits your needs: tiny (~80 MB, fast), base (~250 MB, recommended), or small (~600 MB, best accuracy). The model downloads once on first use and then works offline, so private interviews and confidential calls stay on your machine.

Category: export
Input: audio/wav, audio/mpeg, audio/mp4, audio/x-m4a, audio/aac, audio/ogg, audio/webm, or audio/flac.
Output: text/plain.
Cost: Free, runs in your browser
Memory: high
Install group: speech
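
For readers who want to pre-check a file before handing it to the tool, the accepted input types can be expressed as a simple lookup. This is a minimal sketch, not the tool's actual code: the helper name `isAcceptedAudioType` is hypothetical, and stripping MIME parameters such as `; codecs=opus` is an assumption about how a caller might want to normalize the value.

```typescript
// MIME types copied from the "Input" field on this page.
const ACCEPTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  "audio/wav",
  "audio/mpeg",
  "audio/mp4",
  "audio/x-m4a",
  "audio/aac",
  "audio/ogg",
  "audio/webm",
  "audio/flac",
]);

// Hypothetical helper: true when the MIME type is on the accepted list.
// Assumption: parameters after ";" are ignored and matching is case-insensitive.
function isAcceptedAudioType(mimeType: string): boolean {
  const [base = ""] = mimeType.split(";");
  return ACCEPTED_AUDIO_TYPES.has(base.trim().toLowerCase());
}
```

In a browser, the value passed in would typically be a `File` object's `type` property, which can be empty when the browser cannot determine the type, so a caller may also want to fall back to the file extension.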

Privacy: Transcribe Audio runs entirely on your device. Files you provide never leave your browser — no uploads, no server, no tracking. The page works offline once loaded.

Common uses

Turn a recorded interview into a text transcript without sending the audio to a cloud service
Caption a voice memo or lecture so you can search and quote it later
Draft subtitles from a video's extracted audio track
Transcribe a confidential client call where uploading the recording isn't allowed
Get a rough text version of a podcast episode for show notes
Convert a meeting recording into notes you can edit and summarize

Frequently asked questions

Which audio formats are supported?

WAV, MP3, MP4/M4A, AAC, OGG, WebM, and FLAC. The output is plain text.

Does my audio get uploaded?

No. Whisper runs in your browser on your device. After the model downloads once, transcription happens locally and works offline.

Which model should I choose?

Base (~250 MB) is the recommended balance. Tiny is fastest for quick drafts; small is the most accurate but the largest download.
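
The trade-off can also be written down as a small lookup. The sketch below is illustrative only: `pickModel` is a hypothetical helper, the sizes are the approximate figures quoted on this page, and the "largest model that fits the download budget" rule is an assumption rather than how the tool itself chooses.

```typescript
type WhisperModel = "tiny" | "base" | "small";

// Approximate download sizes in MB, as quoted on this page.
const MODEL_SIZE_MB: Record<WhisperModel, number> = {
  tiny: 80,
  base: 250,
  small: 600,
};

// Hypothetical helper: returns the largest model whose download fits the
// given budget in MB, or null when even the tiny model is too large.
function pickModel(budgetMb: number): WhisperModel | null {
  const largestFirst: WhisperModel[] = ["small", "base", "tiny"];
  for (const model of largestFirst) {
    if (MODEL_SIZE_MB[model] <= budgetMb) {
      return model;
    }
  }
  return null;
}
```

For example, a 300 MB budget selects base under this rule, since small does not fit and base is the next largest.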

Why is there a download the first time?

The chosen Whisper model (80–600 MB depending on size) is fetched once and cached, so later runs are fast and offline.

How accurate is it on noisy or accented speech?

It works well on clear speech. For noisy, accented, or otherwise difficult audio, the hosted Pro version uses the larger Whisper-large-v3 model for better results.

Keywords

transcribe
speech
stt
whisper
audio
subtitles
voice
