Question 1

How is this different from “Extract Video Subtitles”?

Accepted Answer

This tool uses OCR (optical character recognition) to “look” at video frames and recognize text burned into the picture, such as hardcoded subtitles, titles, danmaku (bullet comments), watermark text, and words on slides or presentation screens. The “Extract Video Subtitles” tool instead uses automatic speech recognition (ASR) to transcribe what is said. In short: use this tool for text on the screen, and the subtitle tool for spoken audio.

Question 2

How does it recognize on-screen text?

Accepted Answer

The tool captures still frames from the video at the sampling interval you set, runs a local in-browser OCR engine on each captured frame, then deduplicates and merges the results into timestamped text segments. The whole process runs in your browser, and the video is never uploaded.
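
To make the sample, recognize, merge flow concrete, here is a minimal sketch of the sampling and merging steps in TypeScript. It is not the tool's actual code: the names `FrameText`, `Segment`, `sampleTimestamps` and `mergeFrames` are hypothetical, the OCR step itself is left out, and the rule "consecutive frames with identical text become one segment" is an assumed way to deduplicate.

```typescript
// Hypothetical types for illustration; the tool's real internals may differ.
interface FrameText { time: number; text: string; } // OCR output for one sampled frame
interface Segment { start: number; end: number; text: string; }

// Timestamps (in seconds) at which frames would be captured.
function sampleTimestamps(durationSec: number, intervalSec: number): number[] {
  if (intervalSec <= 0) throw new Error("interval must be positive");
  const times: number[] = [];
  for (let i = 0; i * intervalSec < durationSec; i++) {
    times.push(i * intervalSec);
  }
  return times;
}

// Collapse whitespace so trivially different OCR outputs compare equal.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

// Merge consecutive frames that show the same text into one timed segment.
// Frames with no recognized text end the current segment.
function mergeFrames(frames: FrameText[], intervalSec: number): Segment[] {
  const segments: Segment[] = [];
  let current: Segment | null = null;
  for (const frame of frames) {
    const text = normalize(frame.text);
    if (text === "") {
      current = null;
      continue;
    }
    if (current !== null && current.text === text) {
      current.end = frame.time + intervalSec; // same text: extend the segment
    } else {
      current = { start: frame.time, end: frame.time + intervalSec, text };
      segments.push(current);
    }
  }
  return segments;
}
```

In this sketch, a subtitle that stays on screen across several sampled frames appears once in the output, with a start and end time covering all of those frames.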

Question 3

Which text languages are supported?

Accepted Answer

Supported languages include Chinese (Simplified/Traditional), English, Japanese, Korean, French, German, Spanish, Portuguese, Italian, Russian, Arabic, Hindi, Vietnamese, Turkish, Indonesian and more. Choose the language matching the on-screen text before recognition; for mixed Chinese and English, pick the “Chinese + English” option for better results.

Question 4

How do I choose the sampling interval and recognition area?

Accepted Answer

A smaller interval captures more frames, so less text is missed, but every extra frame has to go through OCR and processing takes longer; for long videos, try a 2–5 second interval first. If the text is concentrated at the bottom of the frame (typical hardcoded subtitles), setting the recognition area to “Bottom subtitle area only” filters out other distractions, speeds things up and improves accuracy; otherwise use “Entire frame”.
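
The trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: `frameCount` and `bottomRegion` are hypothetical helpers, and the 25% height used for the bottom area is an assumed value, not the tool's documented setting.

```typescript
// Hypothetical helpers for illustration; not the tool's actual API.
interface Rect { x: number; y: number; width: number; height: number; }

// Number of frames that would be sent to OCR for a given video length and interval.
function frameCount(durationSec: number, intervalSec: number): number {
  if (intervalSec <= 0) throw new Error("interval must be positive");
  return Math.ceil(durationSec / intervalSec);
}

// Crop rectangle covering the bottom part of the frame.
// The default fraction of 0.25 is an assumption made for this example.
function bottomRegion(frameWidth: number, frameHeight: number, fraction = 0.25): Rect {
  const height = Math.round(frameHeight * fraction);
  return { x: 0, y: frameHeight - height, width: frameWidth, height };
}
```

For a 10-minute video, `frameCount(600, 0.5)` gives 1200 frames while `frameCount(600, 3)` gives 200, which is why a longer interval finishes much sooner. Under the assumed 25% fraction, `bottomRegion(1920, 1080)` covers a 1920 by 270 pixel strip, so OCR reads a quarter of the pixels in each frame.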

Question 5

Will my video files be uploaded to a server?

Accepted Answer

No. Video decoding, frame capture and OCR text recognition all run locally in your browser; the video file is never uploaded to any server. The recognition engine is downloaded from a CDN on first use and cached in your browser, so it can then be reused offline.

Question 6

What if the results aren’t accurate?

Accepted Answer

OCR accuracy depends on the clarity, size and contrast of the on-screen text. If results aren’t ideal, try: confirming the right language, using a smaller sampling interval, using “Bottom subtitle area only” for bottom subtitles, or first sharpening the video with our other tools. It’s a good idea to proofread the exported results.

Extract On-Screen Text from Video Online

OCR on-screen text recognition

Selectable area + custom interval

Local processing protects your privacy

Use cases for extracting on-screen video text

Content organization & learning

Creation & office

How to Use

About the on-screen video text extractor

Frequently Asked Questions