Faster on WebGPU

Transcribe Audio

Convert speech to text with OpenAI Whisper, running entirely on your device. Pick a model: tiny (~80 MB, fastest), base (~250 MB, recommended), or small (~600 MB, most accurate). The chosen model downloads on first use, then works offline.

First run downloads ~238 MB. The model is cached after the first use, then runs offline. Manage downloads on the settings page.

About Transcribe Audio

Transcribe Audio converts speech to text using OpenAI's Whisper running entirely on your device — your recording is never uploaded. Pick a model that fits your needs: tiny (~80 MB, fast), base (~250 MB, recommended), or small (~600 MB, best accuracy). The model downloads once on first use and then works offline, so private interviews and confidential calls stay on your machine.

Category: export
Input: audio/wav, audio/mpeg, audio/mp4, audio/x-m4a, audio/aac, audio/ogg, audio/webm, or audio/flac.
Output: text/plain.
Cost: Free, runs in your browser
Memory: high
Install group: speech
Privacy: Transcribe Audio runs entirely on your device. Files you provide never leave your browser — no uploads, no server, no tracking. The page works offline once loaded.

Common uses

  • Turn a recorded interview into a text transcript without sending the audio to a cloud service
  • Caption a voice memo or lecture so you can search and quote it later
  • Draft subtitles from a video's extracted audio track
  • Transcribe a confidential client call where uploading the recording isn't allowed
  • Get a rough text version of a podcast episode for show notes
  • Convert a meeting recording into notes you can edit and summarize

Frequently asked questions

Which audio formats are supported?

WAV, MP3, MP4/M4A, AAC, OGG, WebM, and FLAC. The output is plain text.
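
For anyone wiring a file picker to a tool like this, the format list above can be checked before any audio is decoded. The following is a minimal sketch, not the page's actual implementation: the MIME types come from the Input list on this page, while `ACCEPTED_AUDIO_TYPES` and `isAcceptedAudioType` are hypothetical names introduced for illustration.

```typescript
// MIME types taken from the "Input" list on this page.
const ACCEPTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  "audio/wav",
  "audio/mpeg",
  "audio/mp4",
  "audio/x-m4a",
  "audio/aac",
  "audio/ogg",
  "audio/webm",
  "audio/flac",
]);

// Hypothetical helper: true when the MIME type is in the accepted list.
// Parameters such as "; codecs=opus" are dropped before the comparison.
function isAcceptedAudioType(mimeType: string): boolean {
  const [base = ""] = mimeType.split(";");
  return ACCEPTED_AUDIO_TYPES.has(base.trim().toLowerCase());
}
```

In a browser, the string passed in would typically be the `type` property of the selected `File`; a video container such as `video/mp4` would be rejected by this check.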

Does my audio get uploaded?

No. Whisper runs in your browser on your device. After the model downloads once, transcription happens locally and works offline.

Which model should I choose?

Base (~250 MB) is the recommended balance. Tiny is fastest for quick drafts; small is the most accurate but the largest download.
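
The trade-off above can be expressed as a small selection rule. This is only a sketch under stated assumptions: the sizes are the approximate figures quoted on this page, and `MODELS` and `pickModel` are hypothetical names, not part of the tool. It picks the largest (and, per the answer above, most accurate) model that fits a download budget.

```typescript
type WhisperModel = { name: "tiny" | "base" | "small"; approxMB: number };

// Approximate download sizes quoted on this page.
const MODELS: readonly WhisperModel[] = [
  { name: "tiny", approxMB: 80 },
  { name: "base", approxMB: 250 },
  { name: "small", approxMB: 600 },
];

// Hypothetical helper: return the largest model whose download fits the
// budget, or undefined when even the tiny model is too large.
function pickModel(maxDownloadMB: number): WhisperModel | undefined {
  let best: WhisperModel | undefined;
  for (const model of MODELS) {
    const fits = model.approxMB <= maxDownloadMB;
    if (fits && (best === undefined || model.approxMB > best.approxMB)) {
      best = model;
    }
  }
  return best;
}
```

With a 300 MB budget this rule lands on base, which matches the recommendation above; a real chooser might also weigh device memory and speed.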

Why is there a download the first time?

The chosen Whisper model (80–600 MB depending on size) is fetched once and cached, so later runs are fast and offline.
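
The download-once behaviour described above follows a common cache-first pattern. The sketch below illustrates that pattern only; it is not the tool's code. `ModelStore`, `loadModel`, and `MemoryStore` are hypothetical names, and in a browser the store would presumably be backed by persistent storage such as the Cache API or IndexedDB (an assumption, since the page does not say which).

```typescript
// Minimal storage interface (hypothetical). A browser implementation might
// wrap the Cache API or IndexedDB; that choice is an assumption.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
}

// Cache-first loader: return stored bytes when present, otherwise
// download once, store the result, and return it.
async function loadModel(
  key: string,
  store: ModelStore,
  download: (key: string) => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const cached = await store.get(key);
  if (cached !== undefined) {
    return cached; // later runs: no network access needed
  }
  const data = await download(key); // first run: fetch the model weights
  await store.put(key, data);
  return data;
}

// In-memory store used only to demonstrate the pattern.
class MemoryStore implements ModelStore {
  private items = new Map<string, Uint8Array>();

  async get(key: string): Promise<Uint8Array | undefined> {
    return this.items.get(key);
  }

  async put(key: string, data: Uint8Array): Promise<void> {
    this.items.set(key, data);
  }
}
```

Calling `loadModel` twice with the same key and store invokes the `download` callback only on the first call, which is why later runs can work without a network connection.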

How accurate is it on noisy or accented speech?

It's good for clear speech; for noisy, accented, or difficult audio the hosted Pro version uses the larger Whisper-large-v3 model for better results.

Keywords

  • transcribe
  • speech
  • stt
  • whisper
  • audio
  • subtitles
  • voice
