Video Question Answering model for Tumblr GIFs
PyTorch, Hugging Face, Kaggle
I led the development of a Visual Question Answering (VQA) model for GIFs, using the Tumblr GIFs dataset, for an intra-university competition. After reviewing the state-of-the-art literature, my team based its main approach on Q-Former, which uses an LLM head for open-vocabulary question answering on visual data. Our alternative approach was a lightweight, less fine-tuned version of the main approach that performed surprisingly well on one-word question answering.