Arpit Bansal

I am a Research Scientist at Google, where I work on text+image-to-image tasks and vision encoders. Prior to joining Google, I earned my PhD in Computer Science from the University of Maryland, College Park, advised by Professor Tom Goldstein. I am particularly interested in multi-modal vision tasks, including text-to-image generation, image editing, and image reasoning, and my work in these areas uses both diffusion and autoregressive methods. Additionally, with the aim of building neural networks that can perform logical extrapolation and learn problem-solving skills, I have built neural networks that can learn algorithms and thus extrapolate from easy to hard tasks using recurrence. A complementary aspect of my research investigates the inherent behaviors and characteristics of neural networks in various scenarios. Prior to UMD, I received my Bachelor's and Master's (Dual Degree) in Electrical Engineering from IIT Kharagpur. My CV is available here.

Along with research, I enjoy playing the keyboard and hiking.

Recent News!

Jan 31, 2025	I am now Dr. Bansal. Thank you Professor Tom Goldstein for your guidance!
Oct 1, 2024	Transformers Can Do Arithmetic is accepted to NeurIPS’24!
Jun 1, 2024	Intern @ Llama (Meta) - Building image reasoning + image editing systems
Apr 10, 2024	Passed my preliminary exam. I am a PhD candidate now!
Jan 18, 2024	Universal Guidance is accepted to ICLR’24!
Sep 21, 2023	Cold Diffusion is accepted to NeurIPS’23!
Sep 1, 2023	Interned with AWS AI Labs building Text-to-Image Synthesis Models!
Jan 26, 2023	3 papers accepted to ICLR’23!
Sep 15, 2022	Scalable Algorithm Synthesis is accepted to NeurIPS’22!
Sep 6, 2022	Cold Diffusion hits 400+ stars on GitHub!