Ishan Misra
Research Scientist @ GenAI (Meta)
I am a Research Scientist at Meta where I work on Computer Vision and Machine Learning.
My research interest is in training multimodal machine learning models at scale.
Education
Ph.D. in Robotics
Carnegie Mellon University, Pittsburgh, USA
Advisor: Martial Hebert, Co-advisor: Abhinav Gupta
Committee: Martial Hebert, Abhinav Gupta, Deva Ramanan, Alyosha Efros, Andrew Zisserman
CMU School of Computer Science Distinguished Dissertation Award (Runner Up)
M.S. in Robotics
Carnegie Mellon University, Pittsburgh, USA
Advisor: Martial Hebert
Siebel Scholarship
IIIT, India
Advisor: P J Narayanan
Rank 1/150 in graduating class
Gold Medalist for Computer Science (Summa Cum Laude)
Selected Honors and Awards
2024
Contributions to self-supervised learning in computer vision. One of 11 honorees across all global alumni
2022
Contributions to self-supervised learning in computer vision. Compiled globally across all technology areas.
2021
Paper: "Audio Visual Instance Discrimination"
2018
PhD Thesis: "Visual Learning with Minimal Human Supervision"
2014
Selected from all graduate students at the School of Computer Science, CMU
2014
Paper: Data Driven Exemplar Model Selection
Press
November 2023
Emu Video: Text-to-Video Generation
April 2023
DINOv2: State-of-the-art computer vision models with self-supervised learning
May 2023
ImageBind: a new way to link AI across senses
March 2021
Self-supervised Learning: The dark matter of intelligence
March 2021
SEER: The start of a more powerful, flexible, and accessible era for computer vision
Industry Research Experience
Feb 2023 - Present
Meta (Facebook), Seattle, WA, USA
Research Scientist, Generative AI
Lead the research team for video generation and understanding
Sep 2018 - Feb 2023
Meta (Facebook), New York, NY, USA
Research Scientist, FAIR
Fundamental research on self-supervised and multimodal representation learning
Summer 2017
Meta (Facebook), New York, NY, USA
Research Intern, FAIR
Mentors: Rob Fergus, Ross Girshick, Laurens van der Maaten
Designed a visual question answering system that acquires knowledge by identifying gaps in what it knows and asking humans questions to fill them (published at CVPR'18)
Summer 2015
Microsoft Research, Redmond, WA, USA
Research Intern, Computer Vision
Mentors: Ross Girshick, Larry Zitnick, Margaret Mitchell, Jacob Devlin
Worked on vision-language systems for generating questions (published at ACL), storytelling datasets (published at NAACL), and weakly supervised learning (published at CVPR)
Summer 2014
Microsoft Research, Redmond, WA, USA
Research Intern, Computer Vision
Mentor: Xian-Sheng Hua
Worked on improving image search systems using patch-based deep learning features (2 US patents granted)
Academic Research Experience
Summer 2012
INRIA, Paris, France
Research Intern, Ecole Centrale
Mentor: Iasonas Kokkinos
Worked on shape from shading
Summer 2011
Yale, New Haven, CT, USA
Research Intern, Computer Science
Mentor: Bryan Ford
Worked on deterministic distributed operating systems: writing bootloaders in assembly and parallel threading libraries.
Talks
What world priors do generative visual models learn?
October 2024
What makes Generative video models tick faster?
October 2024
What makes Generative video models tick?
October 2024
October 2024
Beyond pretty pictures: What's needed to make Generative Visual Models useful?
July 2024
Generative models for Computer Vision
July 2024
Improving generative models for vision: high quality videos and precise image control
March 2024
Generative Models for Multimodal Learning
February 2024
Emu Video: State of the Art Video Generation
January 2024
Using unlabeled data to scale representations across modalities
October 2023
SSL scales multimodal pretraining to more modalities and data
October 2023
June 2023
Machine Learning without Human Supervision
January 2023
General-purpose Visual Recognition Systems: Beyond a Single Modality and a Task
October 2022
General-purpose Visual Recognition Across Modalities with Limited Supervision
October 2022
Representation Learning Beyond a Single Dataset and Modality
October 2022
Self-supervised Learning
August 2022
Visual Recognition and Self-supervised Learning
August 2022
General purpose Vision Models
August 2022
Self-supervised Visual Learning
April 2022
Object Discovery using Transformers
2022
Can Machines Learn to See without Human Supervision?
2022
3D Recognition using Transformers
2021
Redundancy Reduction for Self-supervised Learning
2021
Learning Vision Models with Minimal Supervision
2021
Guest Lectures
Video Diffusion Models
2023
Guest Lecture: Deep Learning course by Abhinav Gupta at Carnegie Mellon University
Multimodal learning
2023
Guest Lecture: Deep Learning course by Abhinav Shrivastava at UMD, College Park
Self-supervised Learning in Vision
2023, 2022, 2021, 2020, 2019
Guest Lecture: Deep Learning course by Rob Fergus at NYU
2021, 2020
Guest Lecture: Deep Learning course by Yann LeCun at NYU
2022, 2021
Guest Lecture: Deep Learning course by Zsolt Kira at Georgia Tech
2022, 2020, 2019
Guest Lecture: Deep Learning course by Dhruv Batra at Georgia Tech
2021, 2020, 2019
Guest Lecture: Deep Learning course by Abhinav Shrivastava at UMD, College Park
Structure from Motion
2019
Guest Lecture: Computer Vision course by Rob Fergus at NYU
Collaborators and Interns
January 2023 - present
PhD from University of Maryland, College Park
Postdoctoral Researcher
Summer 2023
PhD, University of California, Berkeley
Internship on instance conditioned diffusion models. Co-hosted with Rohit Girdhar, Saketh Rambhatla.
Summer 2023
PhD, University of California, Berkeley
Internship on LLMs + diffusion models for visual instruction generation. Co-hosted with Rohit Girdhar.
Summer 2023
PhD, University of Texas, Austin
Internship on video diffusion models for editing. Co-hosted with Bichen Wu.
2022-2023
PhD, New York University
Visiting Researcher working on Home Robotics. Co-hosted with Soumith Chintala.
Apple PhD Fellowship 2023
Summer 2023
PhD, Columbia University
Internship on video diffusion models. Co-hosted with Laurens van der Maaten.
Summer 2022
PhD, University of California, Berkeley
Internship on self-supervised segmentation (published at CVPR'23). Co-hosted with Rohit Girdhar.
Summer 2022
PhD, University of Texas, Austin
Internship on LLMs + Video understanding (published at CVPR'23). Co-hosted with Rohit Girdhar.
NVIDIA Graduate Fellowship 2024-2025
Summer 2021
PhD, University of Texas, Austin
Internship on Open Vocabulary Object Detection (published at ECCV'22). Co-hosted with Armand Joulin, Rohit Girdhar.
Facebook Fellowship 2021
Now: Research Scientist at Google Research
Summer 2021
PhD, University of Illinois at Urbana-Champaign
Internship on transformers for pixel segmentation (published at CVPR'22)
Now: ML Scientist at Tesla
Summer 2021
PhD, University of Michigan at Ann Arbor
Internship on using scribbles for segmenting objects (published at BMVC'22). Co-hosted with Laurens van der Maaten.
Summer 2020
PhD, University of Texas, Austin
Internship on self-supervised learning using depth (published at ICCV'21). Co-hosted with Armand Joulin, Rohit Girdhar.
Now: Researcher at Cruise
Summer 2020
PhD, University of Illinois at Urbana-Champaign
Internship on weakly supervised learning for 3D detection (published at CVPR'21). Co-hosted with Rohit Girdhar.
Now: Research Scientist at Apple
Summer 2020
PhD, University of Oxford
Internship on self-supervised learning for object discovery. Co-hosted with Armand Joulin, Piotr Bojanowski, Andrea Vedaldi.
Now: Assistant Professor at University of Amsterdam
Summer 2019
PhD, University of California, San Diego
Internship on self-supervised learning for audiovisual learning (published at ECCV'20, best-paper finalist).
Now: Assistant Professor at University of Wisconsin-Madison (EECS)
Summer 2019
PhD, University of Massachusetts, Amherst
Internship on visual question answering (published at CVPR'20). Co-hosted with Xinlei Chen.
Now: Assistant Professor at Northeastern University
Summer 2019
PhD, University of California, Berkeley
Internship on anytime video recognition. Co-hosted with Laurens van der Maaten.
Now: Research Scientist at Waymo Research
Summer 2019
PhD, Cornell University
Internship on anytime image recognition. Co-hosted with Laurens van der Maaten.
Now: Research Scientist at Waymo Research
Summer 2019
PhD, Cornell University
Internship on bias in object recognition systems (published at CVPRW'2019). Co-hosted with Laurens van der Maaten.
Now: Research Scientist at Luma AI
2018
Masters in Robotics, Carnegie Mellon University
Worked on image augmentations for object detection (published at ICCV'2019)
Now: PhD Student at University of Michigan, Ann Arbor
2016
Masters in Robotics, Carnegie Mellon University
Worked on image augmentations for object detection (published at ICCV'2017)
Now: Research Scientist at Google Brain
Academic Service
Area Chair
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 2021, 2023, 2024
European Conference on Computer Vision (ECCV): 2024
Neural Information Processing Systems (NeurIPS): 2022, 2023, 2024
International Conference on Learning Representations (ICLR): 2023
Reviewing
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): 2015, 2016, 2017, 2018, 2019, 2020, 2022
IEEE/CVF International Conference on Computer Vision (ICCV): 2015, 2017, 2019, 2021, 2023
European Conference on Computer Vision (ECCV): 2016, 2018, 2020, 2022
Neural Information Processing Systems (NeurIPS): 2018, 2019, 2020, 2021
International Conference on Learning Representations (ICLR): 2019, 2020, 2021
Workshops
Tutorial on Self-Supervised Learning: 2023
Extreme Scale Vision: 2019
Vision & Language StoryTelling Workshop: 2017
Doctoral Thesis Committee
2024
Xudong Wang (University of California Berkeley)
2024
Alexandre Devillers (INRIA)
2023
Nirat Saini (University of Maryland, College Park)
Publications
A complete and updated list of my publications is available on my Google Scholar profile.
Movie Gen: A Cast of Media Foundation Models
The Movie Gen Team (Overall Tech Lead; Core Contributor)
Meta Research (arxiv). 2024.
The Llama 3 Herd of Models
The Llama3 Team (played role of a Core Contributor for video recognition)
arXiv. 2024.
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar^*, Mannat Singh^*, Andrew Brown*, Quentin Duval*, Samaneh Azadi*, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra*
ECCV. 2024.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
CVPR. 2024.
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
CVPR. 2024.
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
CVPR. 2024.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang, Ishan Misra, Ziyun Zheng, Rohit Girdhar, Trevor Darrell
CVPR. 2024.
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh*, Quentin Duval*, Kalyan Vasudev Alwala*, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra
ICCV. 2023.
MOST: Multiple Object localization with Self-supervised Transformers for object discovery
Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava
ICCV. 2023.
MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
Yang Fu, Ishan Misra, Xiaolong Wang
ICML. 2023.
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar*, Alaaeldin El-Nouby*, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra*
CVPR. 2023.
Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
CVPR. 2023.
Learning Video Representations from Large Language Models
Yue Zhao, Ishan Misra, Philipp Krahenbuhl, Rohit Girdhar
CVPR. 2023.
The Hidden Uniform Cluster Prior in Self-Supervised Learning
Mahmoud Assran, Randall Balestriero, Quentin Duval, Florian Bordes, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Nicolas Ballas
ICLR. 2023.
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar*, Alaaeldin El-Nouby*, Mannat Singh*, Kalyan Vasudev Alwala*, Armand Joulin, Ishan Misra*
CVPR. 2023.
Masked Siamese Networks for Label-Efficient Learning
Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas
ECCV. 2022.
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krahenbuhl, Ishan Misra
ECCV. 2022.
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
arXiv. 2022.
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar*, Mannat Singh*, Nikhila Ravi*, Laurens van der Maaten, Armand Joulin, Ishan Misra*
CVPR. 2022.
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
CVPR. 2022.
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra, Rohit Girdhar, Armand Joulin
ICCV. 2021.
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
ICCV. 2021.
Self-Supervised Pretraining of 3D Features on any Point-Cloud
Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra
ICCV. 2021.
MDETR: Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion
ICCV. 2021.
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Pedro Morgado, Nuno Vasconcelos, Ishan Misra
CVPR. 2021.
Robust Audio-Visual Instance Discrimination
Pedro Morgado, Ishan Misra, Nuno Vasconcelos
CVPR. 2021.
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
Jure Zbontar*, Li Jing*, Ishan Misra, Yann LeCun, Stéphane Deny
ICML. 2021.
3D Spatial Recognition without Spatially Labeled 3D
Zhongzheng Ren, Ishan Misra, Alexander G. Schwing, Rohit Girdhar
CVPR. 2021.
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
NeurIPS. 2020.
Self-Supervised Learning of Pretext-Invariant Representations
Ishan Misra, Laurens van der Maaten
CVPR. 2020.
ClusterFit: Improving Generalization of Visual Representations
Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram*, Dhruv Mahajan*
CVPR. 2020.
In Defense of Grid Features for Visual Question Answering
Huaizu Jiang, Ishan Misra, Marcus Rohrbach, Erik Learned-Miller, Xinlei Chen
CVPR. 2020.
3D-RelNet: Joint Object and Relational Network for 3D Prediction
Nilesh Kulkarni, Ishan Misra, Shubham Tulsiani, Abhinav Gupta
ICCV. 2019.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal, Dhruv Mahajan, Abhinav Gupta*, Ishan Misra*
ICCV. 2019.
Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding
Hexiang Hu, Ishan Misra, Laurens van der Maaten
ICCV Workshop on Vision and Language. 2019.
Does Object Recognition Work for Everyone?
Terrance DeVries*, Ishan Misra*, Changhan Wang*, Laurens van der Maaten
CVPR. 2019.
Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing
Angela Jiang, Daniel L.-K. Wong, Christopher Canel, Ishan Misra, Michael Kaminsky, Michael Kozuch, Padmanabhan Pillai, David G. Andersen, Gregory Ganger
USENIX Annual Technical Conference. 2018.
Learning by Asking Questions
Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten
CVPR. 2018.
Cut Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
Debidatta Dwibedi, Ishan Misra, Martial Hebert
ICCV. 2017.
From Red Wine to Red Tomato: Composition with Context
Ishan Misra, Abhinav Gupta, Martial Hebert
CVPR. 2017.
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification
Ishan Misra, C. Lawrence Zitnick, Martial Hebert
ECCV. 2016.
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick
CVPR. 2016.
Cross-stitch Networks for Multi-Task Learning
Ishan Misra*, Abhinav Shrivastava*, Abhinav Gupta, Martial Hebert
CVPR. 2016.
Generating Natural Questions About an Image
Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
ACL. 2016.
Visual Storytelling
Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Aishwarya Agrawal, Ross Girshick, Xiaodong He, Pushmeet Kohli, et al.
NAACL. 2016.
Watch and Learn: Semi-Supervised Learning of Object Detectors from Video
Ishan Misra, Abhinav Shrivastava, Martial Hebert
CVPR. 2015.
Applying artificial vision models to human scene understanding
Elissa Aminoff, M. Toneva, Abhinav Shrivastava, Xinlei Chen, Ishan Misra, et al.
Journal of Frontiers in Computational Neuroscience. 2015.
Data-driven Exemplar Model Selection
Ishan Misra, Abhinav Shrivastava, Martial Hebert
WACV. 2014.