What this conversion actually changes
SRT and WebVTT share the same data model — numbered cues with a start time, an end time and text — so this is a re-serialisation, not a transformation. Nothing about your timing or text is reinterpreted.
input.srt:

    1
    00:01:04,200 --> 00:01:06,840
    <i>Who's there?</i>

output.vtt:

    WEBVTT

    1
    00:01:04.200 --> 00:01:06.840
    <i>Who's there?</i>
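Because both formats share one cue model, a converter can read SRT into a neutral structure and write it back out. Below is a minimal sketch of the reading half. The `Cue` shape and the `parseSrt` name are hypothetical, not the tool's own code. The sketch assumes every cue has a number line and a timing line and is separated from the next cue by a blank line.

```typescript
// Hypothetical shared model: what both formats store per cue,
// plus the SRT cue number.
interface Cue {
  id: string;    // SRT cue number, kept as text
  start: string; // "HH:MM:SS,mmm", exactly as written in the source
  end: string;
  text: string;  // payload; may span several lines
}

// Two or more hour digits are accepted; anything after the end time
// (e.g. legacy position coordinates) is ignored.
const TIMING = /^(\d{2,}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2},\d{3})/;

// Minimal SRT reader: strips a leading BOM, accepts CRLF/LF/CR endings,
// and treats each blank-line-separated block as one cue.
// Blocks without a valid timing line are skipped, not reported.
function parseSrt(input: string): Cue[] {
  const normalised = input.replace(/^\uFEFF/, "").replace(/\r\n?/g, "\n");
  const cues: Cue[] = [];
  for (const block of normalised.split(/\n\s*\n/)) {
    const trimmed = block.replace(/^\n+|\n+$/g, "");
    if (trimmed === "") continue;
    const [id = "", timing = "", ...textLines] = trimmed.split("\n");
    const match = TIMING.exec(timing);
    if (!match) continue;
    const [, start = "", end = ""] = match;
    cues.push({ id, start, end, text: textLines.join("\n") });
  }
  return cues;
}
```

Timestamps stay as strings on purpose. Nothing is converted to numbers and back, so the digits the writer emits are the digits that were read.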
Four things change. A WEBVTT header line is added; it is required, and its absence is the most common reason a renamed .srt fails in a web player. Millisecond separators switch from comma to period. The cue number is kept as a WebVTT cue identifier (optional in VTT, but valid and useful as a styling hook). Output is encoded as UTF-8, which the WebVTT specification mandates. If your source uses a legacy encoding, run it through the encoding fixer first, or the converted text carries the same mojibake.
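The four changes map onto a small serialiser. The sketch below is not the tool's implementation, and `toVtt`, `srtTimeToVtt` and `encodeVtt` are hypothetical names. It assumes cues were already read into a shape like the one shown.

```typescript
interface Cue {
  id: string;    // SRT cue number; becomes the optional WebVTT cue identifier
  start: string; // "HH:MM:SS,mmm" as read from the SRT file
  end: string;
  text: string;
}

// Only the millisecond separator changes. Hours with more than two digits
// pass through untouched, which WebVTT permits.
function srtTimeToVtt(time: string): string {
  return time.replace(",", ".");
}

// Serialise cues as WebVTT: the mandatory "WEBVTT" signature, a blank line,
// then one block per cue, blocks separated by blank lines.
function toVtt(cues: Cue[]): string {
  const blocks = cues.map((c) => {
    const identifier = c.id ? `${c.id}\n` : ""; // identifier is optional in VTT
    return `${identifier}${srtTimeToVtt(c.start)} --> ${srtTimeToVtt(c.end)}\n${c.text}`;
  });
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}

// TextEncoder only ever produces UTF-8, the encoding WebVTT requires.
function encodeVtt(vtt: string): Uint8Array {
  return new TextEncoder().encode(vtt);
}
```

In a browser, either the string or the encoded bytes can be passed to a `Blob` with type `text/vtt`. Blob also encodes strings as UTF-8.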
One change is invisible but matters: WebVTT parses cue text as markup, so bare & and < characters are escaped to &amp; and &lt;. A player that would otherwise swallow everything after a stray < then renders the line correctly.
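A character counts as "bare" when it does not begin something the parser should treat as markup. Below is a minimal sketch of that escaping. The tag whitelist (`b`, `i`, `u` and class spans like `<c.yellow>`) is an assumption, and `escapeCueText` is a hypothetical name. Because `<font>` is not on the list, this step would have to run after the `<font color>` conversion described in the list below.

```typescript
// Tags this sketch treats as real markup (an assumption): the SRT styling
// tags that survive conversion, plus WebVTT class spans such as <c.yellow>.
const KNOWN_TAG = /^<\/?(?:b|i|u|c)(?:\.[A-Za-z0-9_-]+)*>/;
// A character reference that is already escaped: &amp;, &#38;, &#x26;, ...
const ENTITY = /^&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/;

// Escape "&" and "<" that do not begin a recognised reference or tag,
// so a WebVTT parser reads them as literal text instead of markup.
function escapeCueText(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    const rest = text.slice(i);
    if (ch === "&" && !ENTITY.test(rest)) {
      out += "&amp;";
    } else if (ch === "<" && !KNOWN_TAG.test(rest)) {
      out += "&lt;";
    } else {
      out += ch;
    }
  }
  return out;
}
```

For example, `escapeCueText("Tom & Jerry <3 <i>hi</i>")` keeps the italic tags and escapes the other two characters. Slicing at every position is quadratic in cue length, which is harmless for a line of subtitle text.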
When you need VTT
The HTML5 <track> element accepts only WebVTT; no browser loads an SRT file there. The same goes for the players built on it: Video.js, Plyr, hls.js, most course and LMS platforms, and self-hosted web video generally. SRT remains the default everywhere else, including desktop players, smart TVs and media servers. Going the other way instead? Use VTT to SRT.
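A WebVTT parser decides from the first few characters whether a file is WebVTT at all. A minimal sketch of that signature check shows why a renamed .srt is rejected before any cue is read. It follows the file-signature rule as we read it in the WebVTT specification, and `looksLikeWebVtt` is a hypothetical name.

```typescript
// Signature check a WebVTT parser performs before reading any cues:
// an optional byte-order mark, the literal "WEBVTT", then a space, tab,
// line break or end of input. "WEBVTTX" or an SRT cue number both fail.
function looksLikeWebVtt(text: string): boolean {
  return /^\uFEFF?WEBVTT(?:[ \t\r\n]|$)/.test(text);
}
```

A renamed SRT file opens with its first cue number (usually `1`), so it fails here no matter how valid its cues are.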
What’s preserved, what’s adjusted
- Timing: copied digit for digit; only the separator changes. Hours above 99 are kept as written (VTT allows more than two hour digits).
- <b>, <i>, <u>: valid in both formats, preserved as-is.
- <font color>: an SRT-only convention; converted to a WebVTT class where the colour is recognised, otherwise removed with a notice, never silently.
- Line breaks inside a cue: preserved. A blank line inside a cue (invalid in both formats) would end the cue in VTT, so it is collapsed to a single break, with a notice.
- Broken numbering: out-of-order or duplicate cue numbers are renumbered sequentially, with a notice. Timing is never reordered.
- Encoding artefacts: a UTF-8 BOM is handled; CRLF, LF and CR line endings are all accepted; output uses LF.
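The two cue-body adjustments above, colour conversion and blank-line collapsing, can be sketched together. This is not the tool's code. The colour table is an assumption, limited to the eight colour class names (white, lime, cyan, red, yellow, magenta, blue, black) that, as we understand the WebVTT specification, come with default styles. `normaliseCueBody` and `COLOUR_CLASSES` are hypothetical names. The sketch also assumes an upstream parser has already kept a stray blank line as part of the cue text.

```typescript
// Hypothetical lookup of recognised <font color> values, mapped to the
// colour class names assumed to have default WebVTT styles.
const COLOUR_CLASSES: Record<string, string> = {
  white: "white", "#ffffff": "white",
  lime: "lime", "#00ff00": "lime",
  cyan: "cyan", "#00ffff": "cyan",
  red: "red", "#ff0000": "red",
  yellow: "yellow", "#ffff00": "yellow",
  magenta: "magenta", "#ff00ff": "magenta",
  blue: "blue", "#0000ff": "blue",
  black: "black", "#000000": "black",
};

// Matches one non-nested <font color=...>...</font> span.
const FONT_TAG = /<font\s+color\s*=\s*["']?([^"'>\s]+)["']?\s*>([\s\S]*?)<\/font>/gi;

interface Normalised {
  text: string;
  notices: string[];
}

// Rewrite <font color> spans and collapse blank lines inside one cue's text,
// recording a notice for every change that is not a clean mapping.
function normaliseCueBody(text: string): Normalised {
  const notices: string[] = [];
  const withClasses = text.replace(FONT_TAG, (_tag: string, colour: string, inner: string) => {
    const cls = COLOUR_CLASSES[colour.toLowerCase()];
    if (cls) return `<c.${cls}>${inner}</c>`;
    notices.push(`Removed <font color="${colour}">: colour not recognised`);
    return inner;
  });
  // A blank line would end the cue in WebVTT, so fold it into one line break.
  const collapsed = withClasses.replace(/\n(?:[ \t]*\n)+/g, "\n");
  if (collapsed !== withClasses) notices.push("Collapsed a blank line inside a cue");
  return { text: collapsed, notices };
}
```

The lazy match handles adjacent `<font>` spans but not nested ones. A production converter would need a small tag parser instead of a regular expression.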
Limits
None meaningful. Conversion runs entirely in your browser, so there is no file-size cap beyond your device’s memory — a three-hour film with 5,000 cues converts in milliseconds. No queue, no account, and the page keeps working offline once it has loaded.