UCSD Computer Vision

YouTube Video Text

Related Projects: GrOCR


The YouTube Video Text (YVT) dataset contains 30 videos collected from YouTube. Each video is 15 seconds long, at 30 frames per second and HD 720p quality. The text content in the dataset falls into two categories: overlay text (e.g., captions, song titles, logos) and scene text (e.g., street signs, business signs, words on shirts).
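
For a rough sense of scale, the figures above imply a nominal frame count for the dataset. The short Python sketch below works through that arithmetic. It assumes every video is exactly 15 seconds long at exactly 30 frames per second, which the actual files may not match precisely, and the constant and function names are illustrative only, not part of the dataset.

```python
# Nominal frame counts implied by the dataset description.
# Assumption: every clip is exactly 15 s at exactly 30 fps.
NUM_VIDEOS = 30
DURATION_SECONDS = 15
FRAMES_PER_SECOND = 30


def frames_per_video(duration_s=DURATION_SECONDS, fps=FRAMES_PER_SECOND):
    """Return the nominal number of frames in one video."""
    return duration_s * fps


def total_frames(num_videos=NUM_VIDEOS):
    """Return the nominal number of frames across the whole dataset."""
    return num_videos * frames_per_video()


print(frames_per_video())  # 450
print(total_frames())      # 13500
```

Under these assumptions there are about 450 frames per video and about 13,500 frames in total.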


Download link (434.8MB)
Related dataset: SVT