I am a Research Scientist at Facebook AI Research (FAIR), where I work on Computer Vision and Machine Learning. My research interest is in reducing the need for supervision in visual learning. I finished my PhD at the Robotics Institute at Carnegie Mellon University, where I worked with Martial Hebert and Abhinav Gupta. My PhD thesis, titled “Visual Learning with Minimal Human Supervision”, received the SCS Distinguished Dissertation Award (Runner Up) in 2018. For my work in self-supervised learning, I was featured in the MIT Tech Review’s 35 Innovators Under 35 list (compiled globally across technological disciplines). You can hear me on Lex Fridman’s podcast for a fun overview of my work.
News
- [2023] Giving 2 talks at ICCV - BigMAC Workshop and Learning from Noisy and Unlabeled Data
- [2023] Mark Zuckerberg announced our recent project ImageBind
- [2022] Giving 3 workshop talks at ECCV - SSL What’s Next, Learning from Limited & Imperfect Data, and CV in the Wild. Lots of exciting work!
- [2022] Keynote talk at the Ghost Day ML Conference, 2022
- [2022] Omnivore: a single model for image, video and 3D classification that performs better than modality-specific models
- [2021] Guest on the Lex Fridman Podcast
- [2021] Our CVPR 2021 paper (AVID) on Audio-Visual Self-supervised learning is a Best Paper Candidate.
- [2021] Co-wrote a blog post on self-supervised learning with Yann LeCun.
- [2021] SEER scales self-supervised learning to billions of images.
- [2020] Our self-supervised technique called SwAV outperforms supervised pre-training on ALL considered transfer tasks and is the first method to do so.
Collaborators and Interns
- Saketh Rambhatla. Postdoctoral researcher at Meta. PhD from University of Maryland, College Park.
- Xudong Wang (University of California, Berkeley). Hosted at FAIR with Rohit Girdhar.
- Yue Zhao (University of Texas, Austin). Hosted at FAIR with Rohit Girdhar.
- Xingyi Zhou (University of Texas, Austin). Hosted at FAIR with Rohit Girdhar and Armand Joulin.
- Bowen Cheng (University of Illinois, Urbana Champaign). Hosted at FAIR with Rohit Girdhar and Alex Kirillov.
- Karan Desai (University of Michigan, Ann Arbor). Hosted at FAIR with Laurens van der Maaten.
- Zaiwei Zhang (University of Texas, Austin). Hosted at FAIR with Rohit Girdhar and Armand Joulin.
- Zhongzheng (Jason) Ren (University of Illinois, Urbana Champaign). Hosted at FAIR with Rohit Girdhar.
- Yuki Asano (University of Oxford). Hosted at FAIR with Armand Joulin, Piotr Bojanowski, and Andrea Vedaldi.
- Pedro Morgado (University of California, San Diego).
- Huaizu Jiang (University of Massachusetts, Amherst). Hosted at FAIR with Xinlei Chen and Marcus Rohrbach.
- Jyh-Jing Hwang (University of California, Berkeley). Hosted at FAIR with Laurens van der Maaten.
- Yan Wang (Cornell University). Hosted at FAIR with Laurens van der Maaten.
- Terrance de Vries (University of Guelph). Hosted at FAIR with Laurens van der Maaten.
Publications
MOST: Multiple Object localization with Self-supervised Transformers for object discovery
ICCV 2023 (Oral)
MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
ICML 2023
ImageBind: One Embedding Space To Bind Them All
CVPR 2023 (Highlighted paper)
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
CVPR 2016
Visual Storytelling
NAACL 2016
Applying artificial vision models to human scene understanding
Journal of Frontiers in Computational Neuroscience, 2015
Patents
Optimizing multi-class multimedia data classification using negative data
Optimizing multi-class image classification using patch features