YouTube video captions are timed text tracks you add to a video so viewers can read the spoken audio. You can rely on YouTube's auto-captions, type captions in Studio, or — for the cleanest result — upload your own SRT/VTT file. PlainScribe transcribes a video at up to 99% accuracy for $0.067/min (~$4/hour) and exports both formats for direct upload.
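A caption is a synchronized text track tied to your video's timeline. Viewers switch it on with the CC button, and YouTube can host several tracks per video (e.g. English, Spanish, French). You can create those tracks three ways:

1. Turn on YouTube's automatic captions and edit them.
2. Type captions manually in YouTube Studio.
3. Upload a timed caption file (SRT or VTT).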
This guide focuses on the third method, because uploading a clean caption file is the fastest route to professional, accurate captions — and it's reusable across platforms.
You need a timestamped transcript saved as SRT or VTT, and PlainScribe exports both formats.
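For reference, an SRT file is plain text: each cue has a sequence number, a `start --> end` timing line in `HH:MM:SS,mmm` form (note the comma before milliseconds), the caption text, and a blank line. The Python sketch below renders that layout from a list of timed segments. The `Segment` structure and function names are hypothetical stand-ins for whatever timestamped transcript you have; they are not a PlainScribe data type or API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed line of transcript (hypothetical structure for illustration)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: cue number, timing line, text, blank line between cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.5, "Welcome to the channel."),
        Segment(2.5, 6.04, "Today we're adding captions."),
    ]
    print(to_srt(demo))
```

Saving that output with a `.srt` extension gives you a file in the shape YouTube's "Upload file" dialog expects; a transcription tool's export does the same thing for you.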
At $0.067 per minute, a 60-minute video costs about $4 to caption — and there's no subscription, so you only pay for the minutes you actually process.
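The per-minute arithmetic is easy to check yourself. This sketch uses the $0.067/min rate quoted in this guide and assumes billing is a plain rate times minutes calculation; how partial minutes are rounded isn't stated here, and the function names are hypothetical.

```python
import math

# Per-minute price quoted in this guide (USD); assumed flat, no volume tiers.
RATE_PER_MINUTE = 0.067


def caption_cost(video_minutes: float, rate: float = RATE_PER_MINUTE) -> float:
    """Estimated transcription cost in USD, rounded to cents."""
    return round(video_minutes * rate, 2)


def minutes_for_budget(dollars: float, rate: float = RATE_PER_MINUTE) -> int:
    """Whole minutes of credit a purchase covers at the given rate."""
    return math.floor(dollars / rate)


if __name__ == "__main__":
    print(caption_cost(60))        # one hour of video
    print(minutes_for_budget(10))  # the $10 minimum purchase
```

`caption_cost(60)` returns 4.02, the "about $4" per hour above, and `minutes_for_budget(10)` returns 149, which is where the "roughly 150 minutes" for a $10 purchase comes from.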
Repeat the Add language and Upload file steps for each additional language to host multiple caption tracks on one video.
| Method | Cost | Accuracy | Effort | Best for |
|---|---|---|---|---|
| YouTube auto-captions | Free | ~70-90% | Low (but edit-heavy) | Quick drafts |
| Type in Studio | Free | Depends on you | Very high | Short videos |
| Upload SRT/VTT (PlainScribe) | $0.067/min (~$4/hr) | Up to 99% | Low | Public/branded video |
Verdict: Auto-captions are a fine draft, and manual typing works for a 90-second clip, but for anything longer or public-facing, uploading an SRT/VTT file transcribed at up to 99% accuracy is the clear winner. The file is reusable on Vimeo, your website, and social cuts, so the roughly $4 spent transcribing an hour-long video captions it everywhere, not just on YouTube.
Both work on YouTube. SRT (SubRip) is the most widely supported plain-text format and a safe default. VTT (WebVTT) is the web standard and supports extra styling and positioning, which matters more for HTML5 web players than for YouTube itself. PlainScribe exports both, so you can match whatever platform you're targeting. For a deeper breakdown, see our SRT vs VTT guide.
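If you already have an SRT file from another source and need VTT for an HTML5 player, the basic conversion is small: WebVTT requires a `WEBVTT` header line followed by a blank line, and it uses a period instead of a comma before the milliseconds. The Python sketch below handles only those core differences; it is not a full parser, it does not translate SRT styling tags or positioning into WebVTT cue settings, and `srt_to_vtt` is a hypothetical helper name.

```python
import re

# An SRT timestamp such as "00:00:02,500" (hours may exceed two digits).
SRT_TIMESTAMP = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT text to WebVTT.

    Adds the required "WEBVTT" header plus blank line, and swaps the comma
    for a period in timestamps on timing lines (lines containing "-->").
    Numeric SRT cue numbers are valid WebVTT cue identifiers, so they stay.
    """
    lines = srt_text.lstrip("\ufeff").splitlines()  # drop a leading BOM if present
    out = ["WEBVTT", ""]
    for line in lines:
        if "-->" in line:
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nWelcome to the channel.\n"
    print(srt_to_vtt(sample))
```

For YouTube itself this step is unnecessary, since Studio accepts SRT directly and PlainScribe exports both formats anyway.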
How do I add my own captions to a YouTube video? In YouTube Studio, go to Subtitles, select the video, click Add language, then Add under Subtitles, and choose Upload file with timing. Select your SRT or VTT file and publish. Create that file by transcribing the video in a tool like PlainScribe and exporting to SRT/VTT.
What is the best caption file format for YouTube? SRT (SubRip) is the most universally supported and a safe default. VTT (WebVTT) is the web standard with more styling options. YouTube accepts both, plus SBV. PlainScribe exports SRT and VTT, so you can pick either.
Can I add captions in multiple languages? Yes. YouTube hosts multiple caption tracks per video. Transcribe once with PlainScribe, translate the transcript into any of 47 languages, export an SRT/VTT per language, and upload each as a separate track in Studio.
How much does it cost to caption a YouTube video? With PlainScribe it's $0.067 per minute, so a 60-minute video costs about $4 to transcribe and export. There's no subscription — a $10 minimum purchase buys roughly 150 minutes of credit, and paid credits last a year.
Are uploaded captions better than YouTube's automatic ones? For quality, yes. Auto-captions run around 70-90% accuracy and need editing. An uploaded SRT/VTT transcribed at up to 99% accuracy gives you control over wording, punctuation, and timing, and the file is reusable across platforms.
Start with 30 free minutes, no credit card required. Upload your video, get a transcript at up to 99% accuracy, export SRT or VTT, and drop it into YouTube Studio. See the pricing, read the full overview in our YouTube closed captions guide, master the auto-caption route in our YouTube automatic closed captioning guide, or compare PlainScribe against Rev, Otter, and Sonix.