ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models

PyTorch, Hugging Face

ViBe has been accepted as a NAACL'25 Workshop paper at TrustNLP'25!

ViBe offers a unique resource for evaluating the reliability of T2V models and provides a foundation for improving hallucination detection and mitigation in video generation. We establish classification as a baseline and present various ensemble classifier configurations, with the TimeSFormer + CNN combination yielding the best performance, achieving 0.345 accuracy and 0.342 F1 score. This benchmark aims to drive the development of robust T2V models that produce videos more accurately aligned with input prompts.
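For readers unfamiliar with the two reported metrics, here is a minimal sketch of how accuracy and F1 can be computed for a multi-class classifier. It assumes macro-averaged F1, which the summary above does not specify, and the labels are placeholders rather than ViBe's hallucination categories.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels; "a", "b", "c" are placeholders, not ViBe's taxonomy.
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "b", "a", "a", "c", "c"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, so a rare class that is classified poorly pulls the score down as much as a common one.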

GSL: Graph Structure Learning for Heterophilic Graphs

PyTorch, PyTorch Geometric

Official PyTorch implementation of my research paper 'Exploring Adaptive Structure Learning for Heterophilic Graphs'.

FHE-Compatible Invisible Image Watermarking System

ConcreteML, PyTorch, PyWavelets

This system uses Quantization Aware Training (QAT) to train a PyTorch model that is compatible with Fully Homomorphic Encryption (FHE) and can watermark images invisibly while preserving the user's privacy. The PyTorch model was built from scratch by implementing a state-of-the-art research paper.
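The core operation behind QAT can be sketched in a few lines: values are rounded to low-bit integers and mapped back to floats during training, so the network learns to tolerate the rounding it will face once it runs on integers. The snippet below is a plain-Python illustration with a symmetric per-tensor scale and a 4-bit width, both of which are assumptions; it is not the project's code and does not use the ConcreteML API.

```python
def fake_quantize(values, n_bits=4):
    """Round values to signed n-bit integers, then map them back to floats.

    Returns the dequantized floats, the integer codes, and the scale.
    The symmetric, per-tensor scheme is an illustrative assumption.
    """
    q_max = 2 ** (n_bits - 1) - 1  # largest integer code, e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs else 1.0
    # Round to the nearest integer code and clamp to the representable range.
    ints = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return [i * scale for i in ints], ints, scale


# Hypothetical weights, chosen only to show the rounding effect.
weights = [0.70, -0.32, 0.12, 0.0]
dequantized, ints, scale = fake_quantize(weights, n_bits=4)
```

Each dequantized value differs from the original by at most half a quantization step, which is the error the model is trained to absorb.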

Video Question Answering model for Tumblr GIFs

PyTorch, Hugging Face, Kaggle

I led the development of a Visual Question Answering (VQA) model for GIFs, using the Tumblr GIFs dataset, for an intra-university competition. After a comprehensive review of state-of-the-art literature, my team based its main approach on Q-Former, which uses an LLM head for open-vocabulary question answering on visual data. Our alternative approach was a lightweight, less fine-tuned version of the main approach, which performed surprisingly well in one-word question answering.

Weather Forecasting using Neural ODEs

Julia, DiffEqFlux.jl, Lux.jl

This study, conducted under TA Anantha Padmanabhan, integrated differential equations into neural networks, specifically using Neural Ordinary Differential Equations (Neural ODEs) for weather forecasting on the Delhi dataset. Our approach achieved competitive accuracy with significantly less data and reduced training time compared to LSTM models. This highlights the potential of Neural ODEs for efficient and effective modeling in weather forecasting applications.
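The idea behind a Neural ODE is that a small network defines the derivative dh/dt = f(h, t; θ) of a hidden state, and a numerical solver integrates it forward in time to produce the forecast. The project itself was written in Julia with DiffEqFlux.jl; the sketch below is a plain-Python illustration only, with a fixed-step Runge-Kutta solver and untrained, hypothetical weights.

```python
import math


def rk4_solve(f, h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with classic 4th-order Runge-Kutta."""
    h, t = list(h0), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f([hi + 0.5 * dt * k for hi, k in zip(h, k1)], t + 0.5 * dt)
        k3 = f([hi + 0.5 * dt * k for hi, k in zip(h, k2)], t + 0.5 * dt)
        k4 = f([hi + dt * k for hi, k in zip(h, k3)], t + dt)
        h = [hi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for hi, a, b, c, d in zip(h, k1, k2, k3, k4)]
        t += dt
    return h


def make_dynamics(weights, biases):
    """A one-layer tanh network standing in for the learned dynamics f(h, t; theta)."""
    def f(h, t):
        return [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                for row, b in zip(weights, biases)]
    return f


# Hypothetical, untrained parameters for a 2-dimensional state.
dynamics = make_dynamics(weights=[[0.0, 1.0], [-1.0, 0.0]], biases=[0.0, 0.0])
forecast = rk4_solve(dynamics, h0=[0.5, 0.0], t0=0.0, t1=1.0, steps=20)

# Solver sanity check on dh/dt = -h, whose exact solution is h0 * exp(-t).
decay = rk4_solve(lambda h, t: [-x for x in h], [1.0], 0.0, 1.0, 20)
```

In a real training loop the weights would be fitted by differentiating through the solver, which is the part DiffEqFlux.jl handles.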

Visual Question Answering Model for Biomedical Images

PyTorch, Hugging Face, ScispaCy

I played a key role in developing a Visual Question Answering (VQA) model using the VQA-RAD dataset. After a comprehensive review of state-of-the-art literature, our team fine-tuned both BiomedGPT and ViLT, improving their performance. To improve dataset robustness, I implemented text data augmentation techniques using ScispaCy. I also fine-tuned the ViLT model and further refined the augmentation strategies, which together improved the model's overall effectiveness in answering visual questions.
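As one possible illustration of text data augmentation for question text, the sketch below swaps words for alternatives from a lookup table. Synonym replacement is an assumption here, since the description above does not say which techniques were used, and the hard-coded table is hypothetical; it stands in for whatever a biomedical NLP pipeline such as ScispaCy would supply.

```python
import random


def augment_question(question, synonyms, rng, p=1.0):
    """Replace each word found in `synonyms` with a random alternative, with probability p."""
    out = []
    for word in question.split():
        key = word.lower()
        if key in synonyms and rng.random() < p:
            out.append(rng.choice(synonyms[key]))
        else:
            out.append(word)  # keep words without an entry unchanged
    return " ".join(out)


# Hypothetical synonym table, for illustration only.
synonyms = {"image": ["scan"], "show": ["display"]}
rng = random.Random(0)  # seeded so the example is reproducible
augmented = augment_question("Does this image show a fracture", synonyms, rng)
```

Each augmented question keeps its original answer, so the training set grows without new annotation.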

Contrastive Study of Image Denoisers

Python, TensorFlow

As part of my Exploratory Project under Prof. L. P. Singh, I conducted a comparative analysis of three autoencoder-based image denoising models for document image refinement. The study assessed the impact of encoder layer architectures: two models used Conv2DTranspose layers, while the third used Upsampling layers. The models with Conv2DTranspose layers significantly outperformed the Upsampling model in denoising efficacy, providing useful insight into layer choices for document image refinement.
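The difference between the two layer types can be seen in a one-dimensional sketch: upsampling simply repeats samples and has nothing to learn, while a transposed convolution spreads each sample through a kernel whose weights are trainable. This is a plain-Python illustration with hypothetical kernel values, not the TensorFlow models from the project.

```python
def upsample_nearest(x, factor=2):
    """Parameter-free upsampling: repeat every sample `factor` times."""
    return [v for v in x for _ in range(factor)]


def conv_transpose_1d(x, kernel, stride=2):
    """Transposed convolution: each input sample adds a scaled copy of the kernel to the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out


signal = [1.0, 2.0, 3.0]
repeated = upsample_nearest(signal)
# With a kernel of ones, the transposed convolution reproduces plain repetition.
same = conv_transpose_1d(signal, [1.0, 1.0])
# With other (here hypothetical) kernel weights, it produces a different, learnable mapping.
learned = conv_transpose_1d(signal, [0.5, 1.5])
```

In this sketch nearest-neighbor upsampling is one special case of the transposed convolution, which is one way to read the result that the learnable variant performed better.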

Looking for Opportunities

Seeking opportunities in Deep Learning research, internships, or collaborations, with a particular interest in 3D Computer Vision, Neural Graphics, and Graph Machine Learning. Open to partnerships with individuals, institutions, or companies, without geographical restrictions.

Please feel free to contact me to exchange interesting ideas in Deep Learning.

Phone

(India) 84XXXXXX21

Address

Mumbai,
Maharashtra,
India