Video Question Answering model for Tumblr GIFs

PyTorch, Hugging Face, Kaggle

I led the development of a Visual Question Answering (VQA) model for GIFs, using the Tumblr GIFs dataset, for an intra-university competition. After a review of state-of-the-art literature, my team designed its main approach around ideas from Q-Former, which uses an LLM head for open-vocabulary question answering on visual data. Our alternative approach was a lightweight, less fine-tuned version of the main approach that performed surprisingly well on one-word question answering.

Weather Forecasting using Neural ODEs

Julia, DiffEqFlux.jl, Lux.jl

This study, conducted under TA Anantha Padmanabhan, integrated differential equations into neural networks, specifically employing Neural Ordinary Differential Equations (Neural ODEs) for weather forecasting on the Delhi dataset. Our approach achieved accuracy competitive with LSTM models while using significantly less data and training time, which highlights the potential of Neural ODEs for efficient weather forecasting.

Visual Question Answering Model for Biomedical Images

PyTorch, Hugging Face, ScispaCy

I played a key role in developing a Visual Question Answering (VQA) model on the VQA-RAD dataset. After a review of state-of-the-art literature, our team fine-tuned both BiomedGPT and ViLT models to improve their performance on this task. My own contributions were fine-tuning the ViLT model and implementing text data augmentation with ScispaCy to make the training data more robust, then refining the augmentation strategy to improve the model's overall effectiveness.

Comparative Study of Image Denoisers

Python, TensorFlow

As part of my Exploratory Project under Prof. L. P. Singh, I conducted a comparative analysis of three autoencoder-based image denoising models for document image refinement. The study focused on the impact of the decoder's upsampling layers: two models used Conv2DTranspose layers, while the third used upsampling layers. In the evaluation, the models with Conv2DTranspose layers significantly outperformed the upsampling-based model at denoising, which offers guidance on layer choices for document image refinement.

Looking for Opportunities

Seeking opportunities in Deep Learning research, internships, or collaborations, with a particular interest in 3D Computer Vision, Neural Graphics, and Graph Machine Learning. Open to partnerships with individuals, institutions, or companies, without geographical restrictions.

Please feel free to contact me to exchange ideas about Deep Learning.

Phone

(India) 84XXXXXX21

Address

Mumbai,
Maharashtra,
India