Huiyu Wang


I am a research scientist at FAIR, Meta. Before that, I received my Ph.D. in Computer Science from Johns Hopkins University, advised by Bloomberg Distinguished Professor Alan Yuille. I obtained an M.S. in Electrical Engineering from the University of California, Los Angeles and a B.S. in Information Engineering from Shanghai Jiao Tong University. I also spent two years as a student researcher at Google and had wonderful summers at the Allen Institute for Artificial Intelligence and TuSimple.

My research interest is computer vision, with a focus on understanding images and videos.

News

  • 3 papers to be presented at ECCV 2024.
  • Ego-Exo4D and ROSA accepted to CVPR 2024, both as oral presentations.
  • Ego4D Goal-Step and HT-Step presented at NeurIPS 2023 in New Orleans.
  • Ego-Only, DiffMAE, and SMAUG presented remotely at ICCV 2023, Paris, France.
  • DMAE presented at CVPR 2023.
  • 3 / 3 submissions accepted to ECCV 2022.
  • 3 / 3 submissions accepted to CVPR 2022.
  • iBOT for masked image modeling with an online tokenizer is accepted to ICLR 2022.
  • DeepLab2 has been released, with MaX-DeepLab and Axial-DeepLab officially re-implemented in TensorFlow2.
  • MaX-DeepLab, accepted to CVPR 2021, proposes Mask Xformers for end-to-end panoptic segmentation.
  • Axial-DeepLab, the first architecture with global attention in all layers, is accepted to ECCV 2020.

Selected Publications

Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani
In International Conference on Computer Vision (ICCV), 2023
arXiv | poster | slides | video

Diffusion Models as Masked Autoencoders
Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
In International Conference on Computer Vision (ICCV), 2023
arXiv | project

k-means Mask Transformer
Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In European Conference on Computer Vision (ECCV), 2022
arXiv | code | Google AI blog

iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
In International Conference on Learning Representations (ICLR), 2022
arXiv | code

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In Conference on Computer Vision and Pattern Recognition (CVPR), 2021
arXiv | code | poster | slides | video | Google AI blog

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In European Conference on Computer Vision (ECCV), 2020 (Spotlight)
arXiv | official code | PyTorch code | slides | video | Google AI blog

ELASTIC: Improving CNNs with Dynamic Scaling Policies
Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
arXiv | code | poster | video

Full List of Publications

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Fu-Jen Chu, Kris Kitani, Gedas Bertasius, Xitong Yang
In European Conference on Computer Vision (ECCV), 2024 (Oral)

Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Häne, Alan Yuille
In European Conference on Computer Vision (ECCV), 2024

4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng, Mi Luo, Huiyu Wang, Alex Dimakis, Lorenzo Torresani, Gedas Bertasius, Kristen Grauman
In European Conference on Computer Vision (ECCV), 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray
In Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Oral)
arXiv | data | code | video | Meta AI blog

Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen, Huiyu Wang, Xitong Yang, Matt Feiszli, Ehsan Elhamifar, Lorenzo Torresani, Effrosyni Mavroudi
In Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Oral)
paper

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities
Yale Song, Gene Byrne, Tushar Nagarajan, Huiyu Wang, Miguel Martin, Lorenzo Torresani
In Conference on Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)
paper | data | visualization

HT-Step: Aligning Instructional Articles with How-To Videos
Triantafyllos Afouras, Effrosyni Mavroudi, Tushar Nagarajan, Huiyu Wang, Lorenzo Torresani
In Conference on Neural Information Processing Systems (NeurIPS), 2023
paper | data | project

Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang, Mitesh Kumar Singh, Lorenzo Torresani
In International Conference on Computer Vision (ICCV), 2023
arXiv | poster | slides | video

Diffusion Models as Masked Autoencoders
Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer
In International Conference on Computer Vision (ICCV), 2023
arXiv | project

SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin, Chen Wei, Huiyu Wang, Alan Yuille, Cihang Xie
In International Conference on Computer Vision (ICCV), 2023
arXiv

Masked Autoencoders Enable Efficient Knowledge Distillers
Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
In Conference on Computer Vision and Pattern Recognition (CVPR), 2023
arXiv | code

k-means Mask Transformer
Qihang Yu, Huiyu Wang, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In European Conference on Computer Vision (ECCV), 2022
arXiv | code | Google AI blog

CP2: Copy-Paste Contrastive Pretraining for Semantic Segmentation
Feng Wang, Huiyu Wang, Chen Wei, Alan Yuille, Wei Shen
In European Conference on Computer Vision (ECCV), 2022
arXiv | code

In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie
In European Conference on Computer Vision (ECCV), 2022
arXiv | code

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation
Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
arXiv | video | Google AI blog

TubeFormer-DeepLab: Video Mask Transformer
Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
In Conference on Computer Vision and Pattern Recognition (CVPR), 2022
arXiv | visualization

A Simple Data Mixing Prior for Improving Self-Supervised Learning
Sucheng Ren, Huiyu Wang, Zhengqi Gao, Shengfeng He, Alan Yuille, Yuyin Zhou, Cihang Xie
In Conference on Computer Vision and Pattern Recognition (CVPR), 2022
arXiv | code

On Modeling Long-Range Dependencies for Visual Perception
Huiyu Wang
Ph.D. thesis, Johns Hopkins University, 2022
dissertation

iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong
In International Conference on Learning Representations (ICLR), 2022
arXiv | code

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi, Huiyu Wang, Yingwei Li, Zizhang Li, Alan Yuille
In British Machine Vision Conference (BMVC), 2021
arXiv | code

DeepLab2: A TensorFlow Library for Deep Labeling
Mark Weber*, Huiyu Wang*, Siyuan Qiao*, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
In arXiv preprint, 2021
arXiv | code

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In Conference on Computer Vision and Pattern Recognition (CVPR), 2021
arXiv | code | poster | slides | video | Google AI blog

SpecTr: Spectral Transformer for Hyperspectral Pathology Image Segmentation
Boxiang Yun, Yan Wang, Jieneng Chen, Huiyu Wang, Wei Shen, Qingli Li
In arXiv preprint, 2021
arXiv | code

CO2: Consistent Contrast for Unsupervised Visual Representation Learning
Chen Wei, Huiyu Wang, Wei Shen, Alan Yuille
In International Conference on Learning Representations (ICLR), 2021
arXiv

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
In European Conference on Computer Vision (ECCV), 2020 (Spotlight)
arXiv | official code | PyTorch code | slides | video | Google AI blog

Scaling Wide Residual Networks for Panoptic Segmentation
Liang-Chieh Chen, Huiyu Wang, Siyuan Qiao
In arXiv preprint, 2020
arXiv | code

Rethinking Normalization and Elimination Singularity in Neural Networks
Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
In arXiv preprint, 2019
arXiv | code

Combining Compositional Models and Deep Networks For Robust Object Classification under Occlusion
Adam Kortylewski, Qing Liu, Huiyu Wang, Zhishuai Zhang, Alan Yuille
In Winter Conference on Applications of Computer Vision (WACV), 2020 (Spotlight)
arXiv

Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval
Qing Liu, Lingxi Xie, Huiyu Wang, Alan Yuille
In International Conference on Computer Vision (ICCV), 2019
arXiv | code

Weight Standardization
Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille
In arXiv preprint, 2019
arXiv | code

ELASTIC: Improving CNNs with Dynamic Scaling Policies
Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari
In Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
arXiv | code | poster | video

Semantic Mapping for Safe and Comfortable Navigation of a Brain-Controlled Wheelchair
Zhixuan Wei, Weidong Chen, Jingchuan Wang, Huiyu Wang, Kang Li
In International Conference on Intelligent Robotics and Applications (ICIRA), 2013
paper

Novelty

Novelty (Noah) is my cat. He is a core contributor featured in the Ego How-To research by Meta:

[Image: Noah]

Here are some images reconstructed with DiffMAE (original, masked, generated):

[Images: Noah]