The WebVTT format, explained
WebVTT (Web Video Text Tracks) is the W3C standard for captions on the web. It is the only subtitle format the HTML5 <track> element accepts, and it can do far more than SubRip.
Unlike SRT, WebVTT has a real specification, published and maintained by the W3C. It was designed for HTML5
video, so it is the format you must use with the <track> element, and it underpins captioning in
Video.js, Plyr, hls.js and most browser-based players. It keeps SRT's simple cue model but adds positioning, styling,
comments, named cues, voices, chapters and metadata.
The header
Every WebVTT file begins with the signature line:
WEBVTT
WEBVTT - Optional title or description after a hyphen
The word WEBVTT must be the first thing in the file (optionally after a UTF-8 BOM). A file that doesn't
start with it is not valid WebVTT — which is exactly why renaming a .srt to .vtt doesn't
work, and why the SRT to VTT converter adds this line.
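As a rough illustration of the signature rule, the check below looks at the first bytes of a file. It is a minimal sketch, not a full validator: `has_webvtt_signature` is a hypothetical helper name, and the assumption that WEBVTT may only be followed by a space, a tab, a line break or the end of the file reflects how the specification's parser reads the first line.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def has_webvtt_signature(data: bytes) -> bool:
    """Return True if the raw file bytes start with a WebVTT signature line."""
    # An optional UTF-8 byte order mark may precede the signature.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    if not data.startswith(b"WEBVTT"):
        return False
    # Assumption: the signature must be followed by nothing, a space,
    # a tab, or a line break. "WEBVTTX" is therefore rejected.
    following = data[6:7]
    return following in (b"", b" ", b"\t", b"\n", b"\r")
```

A renamed .srt file starts with a cue number such as 1, so this check returns False for it.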
The cue
intro ← optional cue identifier
00:01:14.800 --> 00:01:17.200 line:90% align:center ← time + settings
<v Alex>Are you there?</v> ← payload, with a voice tag
The differences from SRT are deliberate and important:
- Period decimal separator: 00:01:14.800, not a comma. This is mandated by the spec.
- Optional hours: the hour field may be omitted for short content, so 01:14.800 is valid.
- Cue identifier: an optional label on the line before the timestamp. It can be a name (a hook for styling) rather than a number; numbers are optional in VTT.
- Cue settings: everything after the end timestamp positions the cue (see below).
- UTF-8 is mandatory: WebVTT is always UTF-8, full stop.
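The timestamp rules in this list (period separator, optional hours) can be sketched as a small parser. This is an illustrative simplification: `parse_vtt_timestamp` is a hypothetical helper name, and the regular expression assumes two or more hour digits, two-digit minutes and seconds below 60, and exactly three millisecond digits.

```python
import re

# Optional hours, then mm:ss.ttt with a period before the milliseconds.
_VTT_TIMESTAMP = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def parse_vtt_timestamp(text: str) -> int:
    """Convert a WebVTT timestamp to whole milliseconds."""
    match = _VTT_TIMESTAMP.fullmatch(text)
    if match is None:
        # SRT-style timestamps with a comma separator end up here.
        raise ValueError(f"not a WebVTT timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

Both 00:01:14.800 and 01:14.800 parse to 74800 milliseconds, while the SRT form 00:01:14,800 raises ValueError.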
Cue settings: positioning
The space-separated tokens after the timestamp control where and how a cue is drawn:
- line: vertical position (a line number or percentage; line:0 is the top).
- position: horizontal position as a percentage.
- size: the width of the cue box as a percentage.
- align: text alignment, one of start, center, end, left, right.
- vertical: vertical writing mode (rl or lr) for languages like Japanese.
- region: attaches the cue to a named REGION block for scrolling roll-up captions.
None of this has an SRT equivalent, so converting VTT to SRT necessarily drops cue settings — our converter tells you which cues it affected rather than discarding them silently.
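To make the token structure concrete, here is a minimal sketch that pulls the settings off a timing line into a dictionary. `parse_cue_settings` is a hypothetical helper name, and the code assumes the timestamps and the arrow are separated by whitespace; it does not check that setting names or values are ones WebVTT defines.

```python
def parse_cue_settings(timing_line: str) -> dict[str, str]:
    """Return the name:value settings that follow the end timestamp."""
    tokens = timing_line.split()
    if len(tokens) < 3 or tokens[1] != "-->":
        raise ValueError(f"not a cue timing line: {timing_line!r}")
    settings: dict[str, str] = {}
    # tokens[0] is the start time, tokens[2] the end time; the rest are settings.
    for token in tokens[3:]:
        name, colon, value = token.partition(":")
        if not colon or not name or not value:
            continue  # skip malformed tokens instead of failing the whole cue
        settings[name] = value
    return settings
```

For the example cue above, this returns {"line": "90%", "align": "center"}. A VTT to SRT conversion could use a non-empty result as the signal that a cue is about to lose positioning.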
STYLE, NOTE and REGION blocks
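Alongside cues, WebVTT allows three kinds of block. NOTE blocks can appear anywhere between cues; STYLE and REGION blocks must come before the first cue: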
STYLE
::cue { color: #fff; background: rgba(0,0,0,.6); }
::cue(.warn) { color: #f2c100; }
NOTE This is a comment. It never appears on screen.
REGION
id:speaker width:40% lines:3 scroll:up
- STYLE: embeds CSS using the ::cue pseudo-element, so you can style captions directly in the file.
- NOTE: a comment, ignored by the player; useful for translator notes and timing markers.
- REGION: defines a scrolling area for roll-up captions, common in live and broadcast-style output.
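The block types can be told apart by their first line, which the sketch below uses to label each block in a file. It is a simplified illustration, not a conforming parser: `classify_blocks` is a hypothetical helper name, and the code assumes blocks are separated by one or more blank lines and that a cue is any block with an arrow in its first two lines.

```python
import re


def classify_blocks(vtt_text: str) -> list[str]:
    """Label each blank-line-separated block of a WebVTT file."""
    # Normalise line endings so that blank lines are easy to find.
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b.strip()]
    kinds = []
    for index, block in enumerate(blocks):
        lines = block.split("\n")
        first = lines[0].rstrip(" \t")
        if index == 0 and first.startswith("WEBVTT"):
            kinds.append("header")
        elif first == "NOTE" or lines[0].startswith(("NOTE ", "NOTE\t")):
            kinds.append("note")
        elif first == "STYLE":
            kinds.append("style")
        elif first == "REGION":
            kinds.append("region")
        elif any("-->" in line for line in lines[:2]):
            kinds.append("cue")  # timing line, optionally after an identifier
        else:
            kinds.append("unknown")
    return kinds


SAMPLE = """WEBVTT

STYLE
::cue { color: #fff; background: rgba(0,0,0,.6); }

NOTE This is a comment. It never appears on screen.

REGION
id:speaker width:40% lines:3 scroll:up

intro
00:01:14.800 --> 00:01:17.200
<v Alex>Are you there?</v>
"""
```

Calling classify_blocks(SAMPLE) yields header, style, note, region and cue, in that order.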
Inline payload tags
Inside a cue, WebVTT supports a richer tag set than SRT:
- <b> <i> <u>: the familiar bold/italic/underline.
- <v Speaker>: a voice tag identifying who is speaking (also used for styling and accessibility).
- <c.classname>: a class span, targetable from a STYLE block or external CSS.
- <ruby> / <rt>: ruby annotations for East Asian typography.
- Timestamp tags: inline <00:01:15.500> markers that reveal text word-by-word, the basis of karaoke-style captions.
Because cue text is parsed as markup, literal & and < characters must be escaped as
&amp; and &lt;, another thing the SRT→VTT converter handles for you.
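A minimal sketch of that escaping step follows. `escape_cue_text` is a hypothetical helper name, not the converter's actual code. Escaping > as well goes beyond what the paragraph above requires; it is added here as an assumption so the payload can never contain the arrow sequence used on timing lines.

```python
def escape_cue_text(text: str) -> str:
    """Escape plain text so it can be placed in a WebVTT cue payload."""
    # Ampersands go first, otherwise the "&" of "&lt;" would be escaped again.
    escaped = text.replace("&", "&amp;")
    escaped = escaped.replace("<", "&lt;")
    # Assumption: also escape ">" so that "-->" cannot appear in the payload.
    escaped = escaped.replace(">", "&gt;")
    return escaped
```

This is meant for plain text only. Text that already contains intentional tags such as <i> would have them escaped too, so a real converter has to separate markup from literal characters first.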
Chapters and metadata
WebVTT isn't only for captions. A <track kind="chapters"> uses the same cue syntax to define
chapter markers, and kind="metadata" carries arbitrary timed data (cue points, ad markers, lyrics) that
JavaScript reads but the browser never renders. The single, simple cue model is reused across all of these roles.
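Since a chapters track is just a WebVTT file whose cue payloads are chapter titles, generating one takes only a few lines. The sketch below is illustrative: `format_timestamp` and `build_chapters` are hypothetical helper names, times are given in milliseconds, and titles are assumed to contain no & or < characters that would need escaping.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as hh:mm:ss.ttt with a period separator."""
    hours, remainder = divmod(ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_chapters(chapters: list[tuple[int, int, str]]) -> str:
    """Build a chapters track from (start_ms, end_ms, title) tuples."""
    lines = ["WEBVTT", ""]  # signature line, then a blank line
    for start_ms, end_ms, title in chapters:
        lines.append(f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}")
        lines.append(title)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

The resulting text would be saved as a .vtt file and referenced from a <track kind="chapters"> element.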
Serving WebVTT correctly
A valid file can still fail to load for two server-side reasons worth knowing:
- MIME type: the file must be served as text/vtt. A file sent as text/plain by a misconfigured server will be ignored by some browsers.
- Same-origin / CORS: the <track> element applies cross-origin restrictions to subtitle files; the file must come from the same origin or be sent with the right CORS headers.
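For local testing, both conditions can be met with Python's built-in http.server. This is a sketch for development use only, not a production configuration: `VTTHandler` and `serve` are hypothetical names, and the wildcard Access-Control-Allow-Origin value is a deliberately permissive assumption that a real deployment would narrow to specific origins.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class VTTHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .vtt as text/vtt with a CORS header."""

    # Copy the default mapping and make sure .vtt maps to the right MIME type.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".vtt": "text/vtt"}

    def end_headers(self) -> None:
        # Assumption: allow any origin, which is only acceptable for local testing.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), VTTHandler).serve_forever()
```

Calling serve() from the directory that holds the captions starts the server. When the page is on a different origin, the media element typically also needs a crossorigin attribute before the browser will load the track.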
If your converted file is valid but the player ignores it, the problem is almost always one of these — not the subtitles themselves. For everything else, the SRT vs VTT comparison lays out when to use which format, and the editor opens and exports both.