The goal of this project was to build a natural language video search engine that can search large quantities of video data without relying on metadata such as titles, descriptions, or audio transcriptions. Users should be able to search for specific actions or scenes, such as closing and opening a door, and compare those scenes across different videos.
The dataset for this tool is roughly 30,000 scenes from IMDb's top 250 movies.