Whisper transcription & dictation Web UI / Web GUI

What is this Whisper UI?

Whisper Web UI is a tool that helps you transcribe voice recordings into text using the OpenAI Whisper transcription API. OpenAI’s Whisper API is one of several APIs for transcribing audio, alongside the Google Cloud Speech-to-Text API, Rev.ai’s voice transcription APIs, Amazon Transcribe, and Microsoft Azure Speech-to-Text. According to OpenAI’s own benchmarks, it’s among the best for high-quality English transcription of standard conversational audio.

How much does this cost?

The Whisper Web UI is powered by OpenAI’s Whisper API, which costs $0.006 per minute of transcription. You will need your own OpenAI API account and API key, so OpenAI bills you directly for what you transcribe.
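As a rough illustration of that per-minute price, here is a minimal Python sketch for estimating what a recording might cost. The function name is made up for this example, and it assumes the cost scales linearly with audio length; how OpenAI rounds durations on your actual bill may differ, so treat the result as an estimate.

```python
from decimal import Decimal

# Price quoted on this page; check OpenAI's pricing page for the current rate.
PRICE_PER_MINUTE_USD = Decimal("0.006")


def estimate_cost_usd(duration_seconds: float) -> Decimal:
    """Estimate the transcription cost of a recording (hypothetical helper).

    Assumes cost is strictly proportional to audio length.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to a hundredth of a cent so short clips do not show as zero.
    return (minutes * PRICE_PER_MINUTE_USD).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    one_hour = 60 * 60
    print(f"One hour of audio: about ${estimate_cost_usd(one_hour)}")
```

Under these assumptions, an hour of audio comes to roughly 36 cents.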

If cost is an issue, there are other tools that let you run Whisper locally on your own computer, such as Aiko on Mac and Whishper, another web UI that runs on Linux and is aimed at advanced users. However, if you’re on a mobile phone or a Windows computer, this Web UI is often the easiest and fastest way to transcribe audio using the OpenAI Whisper API.

This web UI is provided completely free by Peter, one of the co-founders of Wanderlog, a travel and vacation planner app. Wanderlog also has tons of useful info, including the best restaurants in San Francisco.

What is the OpenAI Whisper model?

Whisper, an advanced speech recognition system, was trained on a vast dataset of 680,000 hours of diverse speech data from the internet. It’s great at handling different accents, background noise, and technical language.

The system splits incoming audio into 30-second chunks and converts each chunk into text. It can transcribe speech in multiple languages and can also translate that speech into English. Unlike specialized models, Whisper’s strength lies in its adaptability across varied speech patterns. Because a significant share of its training data is non-English, it is particularly strong at translating speech into English text, where OpenAI reports that it outperforms other top models. It aims to be an easy-to-use, highly accurate tool for developers to add voice features to a wide range of applications. You can read more about Whisper on OpenAI’s website: https://openai.com/research/whisper
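To picture the 30-second chunking, here is a small Python sketch that lists the time windows a recording would be divided into. It is only an illustration: the function name is hypothetical, and when you use the API this happens on OpenAI’s side, so you never need to do it yourself.

```python
from typing import List, Tuple

WINDOW_SECONDS = 30.0  # chunk length described above


def split_into_windows(
    duration_seconds: float, window: float = WINDOW_SECONDS
) -> List[Tuple[float, float]]:
    """Return (start, end) times covering a recording in fixed-size windows.

    Hypothetical helper for illustration; the final window may be shorter
    than the others if the recording is not a multiple of the window size.
    """
    if duration_seconds < 0 or window <= 0:
        raise ValueError("duration must be >= 0 and window must be > 0")
    total = float(duration_seconds)
    windows = []
    start = 0.0
    while start < total:
        end = min(start + window, total)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    for start, end in split_into_windows(75):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

A 75-second recording, for example, would be covered by two full windows and one 15-second remainder.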

What can this be used for?

Dictating emails or lecture notes: Whisper often does a better job than the built-in dictation on your computer or phone.
Podcast transcription: Helps create accurate text versions of episodes or interviews.
Video subtitles: Generates precise captions, improving accessibility.
Meeting notes: Converts discussions into searchable written records.

Is this safe? What happens to my OpenAI API key?

All audio and transcriptions are sent securely between your device and OpenAI’s servers. Your OpenAI API key will also be safe: it’s stored in your browser and only sent (encrypted, using HTTPS) to OpenAI’s servers.
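For the curious, here is a rough Python sketch of what a direct request to OpenAI’s transcription endpoint looks like. It is not this Web UI’s actual code (which runs in your browser); the function name and the default filename are made up for illustration. The endpoint URL and the `whisper-1` model name come from OpenAI’s public API documentation. The sketch only builds the request and never sends it, so you can see that the key travels in an `Authorization` header to an `https://` address at OpenAI and nowhere else.

```python
import urllib.request
import uuid

# OpenAI's public transcription endpoint (HTTPS, so traffic is encrypted).
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, audio_bytes: bytes, filename: str = "recording.webm"
) -> urllib.request.Request:
    """Build (but do not send) a multipart upload request.

    Hypothetical helper for illustration only.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n'
        "\r\n"
        "whisper-1\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            # The API key is sent only in this header, only to API_URL.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    request = build_transcription_request("sk-your-key-here", b"fake audio")
    print(request.get_method(), request.full_url)
```

Actually sending the request, for example with `urllib.request.urlopen(request)`, would upload the audio and be billed to the account that owns the key.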

What are future plans for this project?

Some future plans are:

Allowing file uploads, so you can transcribe existing audio and video files.
Letting you try transcribing your audio using multiple APIs so that you can figure out the best API for your use case and audio.
Ranking APIs based on perceived quality.

Code on GitHub