How to Add Captions to a Video (Step-by-Step, 2026)

To add captions to a video, transcribe the audio into a timed caption file (SRT or VTT), then upload that file to your video player or burn it into the video. With PlainScribe you upload a file up to 200MB, get a transcript at up to 99% accuracy for $0.067/min ($4/hour), and export ready-to-use SRT/VTT in minutes — no subscription, no credit card for the first 30 minutes.

TL;DR

  • Fastest path: auto-transcribe, then export SRT or VTT — PlainScribe does this at $0.067/min ($4/hour) with up to 99% accuracy.
  • Two delivery options: closed captions (a separate SRT/VTT file viewers toggle on) or open captions (text burned permanently into the video).
  • Pay only for minutes you process — no subscription, no per-seat fee, $10 minimum (~150 minutes of credit). First 30 minutes are free, no card.
  • Uploads support MP4, MOV, WebM, MKV and more, so you can caption straight from the raw video. Files auto-delete after 7 days.
  • Always review the draft for names, jargon, and numbers before you publish: automation gets you 95%+ of the way, and your edit covers the rest.

What you need before you start

You need three things: the video file (or its extracted audio), a transcription tool, and a target player or editor. PlainScribe accepts MP4, MOV, WebM, MKV, AAC, M4A, MP3, WAV, FLAC and OGG up to 200MB on the web, so you usually upload the video itself — no separate audio extraction step.
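
As a rough pre-upload check, the sketch below tests a local file against the formats and the 200MB limit listed above. The function name `check_upload` is hypothetical, and reading "200MB" as decimal megabytes is an assumption. The service may count the limit differently.

```python
from pathlib import Path

# Formats and size limit as listed in this guide. Treating "200MB" as
# 200,000,000 bytes is an assumption, not a documented rule.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mov", ".webm", ".mkv", ".aac",
    ".m4a", ".mp3", ".wav", ".flac", ".ogg",
}
MAX_UPLOAD_BYTES = 200 * 1000 * 1000


def check_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    file = Path(path)
    if not file.is_file():
        return [f"{file} does not exist or is not a file"]
    problems = []
    if file.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file.suffix or '(none)'}")
    size = file.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size / 1_000_000:.1f} MB, over the 200 MB limit")
    return problems
```

If the list comes back with a size problem, extracting the audio track first is one way to get under the limit.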

Decide your caption type first, because it changes the last step:

  • Closed captions ship as a sidecar file (.srt or .vtt). Viewers turn them on/off. Best for YouTube, Vimeo, web players, and accessibility compliance.
  • Open captions are baked into the pixels and can't be turned off. Best for muted social-feed autoplay (Instagram, TikTok, LinkedIn). See closed vs open captions to choose.

Step-by-step: add closed captions with an SRT/VTT file

  1. Upload the video. Go to PlainScribe and drop in your file (up to 200MB). The spoken language is auto-detected from 47 supported languages, so you don't pick one manually.
  2. Let it transcribe. The AI returns a timestamped transcript at up to 99% accuracy. A one-hour video costs $4 ($0.067/min). You're emailed when it's done.
  3. Review and edit. Fix proper nouns, technical terms, and numbers — the parts automated speech recognition misses most. Keep lines short (≈32–42 characters) for on-screen readability.
  4. Export SRT or VTT. Download the caption file. SRT works almost everywhere; VTT is the web/HTML5 standard. Not sure which? See SRT vs VTT.
  5. Attach it to your video. On YouTube: Subtitles → Add → Upload file. On Vimeo: video settings → Advanced → upload the caption track. On a website: add a <track> element pointing to your .vtt.
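
The line-length advice in step 3 can be checked mechanically. The sketch below uses Python's standard `textwrap` module to re-break one caption's text at 42 characters. The helper name `wrap_caption` is hypothetical and this is not part of PlainScribe's export.

```python
import textwrap

MAX_LINE_CHARS = 42  # upper end of the 32-42 character range suggested in step 3


def wrap_caption(text: str, width: int = MAX_LINE_CHARS) -> list[str]:
    """Break one caption's text into lines no longer than `width` characters."""
    # Collapse existing line breaks and runs of spaces before re-wrapping.
    normalized = " ".join(text.split())
    return textwrap.wrap(normalized, width=width)


lines = wrap_caption(
    "Captions keep the message intact when a video autoplays without sound."
)
for line in lines:
    print(len(line), line)
```

If a caption wraps to more than two lines, it is usually better to split it into two timed cues than to stack more text on screen.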

Step-by-step: add open (burned-in) captions

  1. Transcribe and export an SRT from PlainScribe as above.
  2. Open your video editor (CapCut, Premiere, DaVinci Resolve, etc.).
  3. Import the SRT as a subtitle/caption track — the timings come in automatically.
  4. Style the text (font, size, outline, position) so it stays readable on any background.
  5. Render/export the video with captions burned in. The text is now permanent.
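
If you would rather script the burn-in than use an editor, ffmpeg's `subtitles` filter is a common alternative to the editor workflow above. The sketch below only builds and runs the command. It assumes ffmpeg is installed with libass support and that the file names contain no characters that need escaping inside a filter string (colons, quotes, commas, brackets). The function names are hypothetical.

```python
import subprocess


def build_burn_in_command(video: str, srt: str, output: str) -> list[str]:
    """Build an ffmpeg command that renders the SRT text into the video frames."""
    return [
        "ffmpeg",
        "-i", video,                # source video
        "-vf", f"subtitles={srt}",  # draw the captions onto every frame
        "-c:a", "copy",             # keep the original audio stream untouched
        output,
    ]


def burn_in(video: str, srt: str, output: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_burn_in_command(video, srt, output), check=True)
```

Calling `burn_in("talk.mp4", "talk.srt", "talk_captioned.mp4")` re-encodes the video, so expect it to take longer than a plain copy. Styling is more limited here than in an editor.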

For a subtitle-specific walkthrough, see the sibling guide on how to make subtitles.

Why captions are worth the effort

  • Accessibility: captions make video usable for deaf and hard-of-hearing viewers, and are legally expected in many contexts.
  • Silent autoplay: most social video plays muted by default; captions keep the message intact.
  • SEO: search engines can't watch video but can read a caption track, so captions surface your content.
  • Comprehension: they help non-native speakers and clarify accents, jargon, and noisy audio.

Captioning options compared

| Method | Speed | Cost | Accuracy | Best for |
|--------|-------|------|----------|----------|
| Manual typing | Slowest | Your time | Highest (with effort) | Short clips, exact scripts |
| YouTube auto-captions | Fast | Free | Mixed | Rough draft you'll edit |
| PlainScribe (AI + your edit) | Fast | $0.067/min ($4/hr) | Up to 99% | Most creators, any platform |
| Human service (e.g. Rev) | Slow | $1.50/min | Highest | Legal/medical, verbatim |

Verdict: for the vast majority of videos, AI transcription you lightly edit is the sweet spot — Rev's human service is ~22x the per-minute cost of PlainScribe, and free auto-captions usually need so much cleanup that you save little. Compare the full field on the pricing page and the comparison hub.
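
The cost figures above can be reproduced with a few lines of arithmetic. This sketch uses only the per-minute rates quoted in the table and ignores the free minutes and the $10 minimum purchase. The `cost` helper is hypothetical.

```python
# Per-minute rates quoted in the comparison table.
PLAINSCRIBE_PER_MIN = 0.067
HUMAN_SERVICE_PER_MIN = 1.50


def cost(minutes: float, rate_per_min: float) -> float:
    """Cost in dollars for a given runtime, rounded to cents."""
    return round(minutes * rate_per_min, 2)


one_hour = 60
print(cost(one_hour, PLAINSCRIBE_PER_MIN))    # 4.02
print(cost(one_hour, HUMAN_SERVICE_PER_MIN))  # 90.0
print(round(HUMAN_SERVICE_PER_MIN / PLAINSCRIBE_PER_MIN, 1))  # 22.4
```

The "$4/hour" figure is the $4.02 result rounded down, and the ratio of about 22.4 is where the "~22x" comparison comes from.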

FAQs

What file format do I need to add captions to a video? Use SRT or VTT. SRT (.srt) is supported by almost every platform and editor; VTT (.vtt) is the standard for HTML5 web players. PlainScribe exports both, plus TXT and CSV.
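
The two formats are close enough that converting between them is mostly mechanical. As a minimal sketch (not PlainScribe's exporter), the hypothetical function below turns well-formed SRT text into WebVTT by adding the required `WEBVTT` header and swapping the comma in each timestamp for a period. It does not handle styling tags or malformed files.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT."""
    # Normalise line endings and drop a byte-order mark if present.
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # WebVTT uses a period, not a comma, before the milliseconds.
    body = TIMING.sub(r"\1.\2 --> \3.\4", body)
    # A WebVTT file starts with the "WEBVTT" header followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"
```

The numeric cue counters from the SRT are left in place, since WebVTT accepts them as optional cue identifiers.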

How long does it take to caption a video? Transcription itself takes a few minutes for a typical video; you're emailed when it's ready. Your manual review (fixing names and terms) usually takes a fraction of the video's runtime.

Can I add captions to a video for free? You can start free: PlainScribe gives 30 minutes with no credit card. YouTube's auto-captions are also free but usually need heavy editing. After the free minutes, PlainScribe is pure pay-as-you-go at $0.067/min.

How accurate are AI-generated captions? Up to 99% on clean audio. Background noise, heavy accents, and specialized vocabulary lower accuracy, which is why a quick human review before publishing is recommended.

Do captions help SEO? Yes. A caption track gives search engines readable text tied to your video, improving discoverability for the words spoken in it.

Start captioning in the next 10 minutes

Upload your video, get a timestamped transcript at up to 99% accuracy, and export SRT/VTT — all pay-as-you-go with no subscription. Try PlainScribe free with 30 minutes, no credit card, and explore more in our tools and use cases.
