Xingyi Zhou

I am a second-year (2017-) Computer Science graduate student at The University of Texas at Austin, supervised by Prof. Philipp Krähenbühl. I obtained my bachelor's degree from the School of Computer Science at Fudan University, where I was advised by Prof. Wei Zhang and Prof. Xiangyang Xue. I spent 6 months as a research intern at Microsoft Research Asia, working with Dr. Yichen Wei.

Here is my CV.

The profile photo was taken by my lovely girlfriend Jiarui Gao.



My research focuses on computer vision and computer graphics. Specifically, I have been working on several projects on object keypoint estimation. Here is a seminar slide (March 2018) showing the connections between, and motivations for, my works.

StarMap for Category-Agnostic Keypoint and Viewpoint Estimation
Xingyi Zhou, Arjun Karpur, Linjie Luo, Qixing Huang
European Conference on Computer Vision (ECCV), 2018
bibtex  /  code /  model /  supplementary /  poster

We propose a category-agnostic keypoint representation encoded with the keypoints' 3D locations in the canonical object views. The representation consists of a single-channel, multi-peak heatmap (StarMap) for all the keypoints and their corresponding features as 3D locations in the canonical object view (CanViewFeature) defined for each category. Not only is our representation flexible, but we also demonstrate competitive performance in keypoint detection and localization compared to category-specific state-of-the-art methods. Additionally, we show that when augmented with an additional depth channel (DepthMap) to lift the 2D keypoints to 3D, our representation can achieve state-of-the-art results in viewpoint estimation.
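
As a rough illustration of what a single-channel, multi-peak heatmap implies downstream, the sketch below reads keypoint candidates off such a map by finding local maxima. It is not the released StarMap code: the function name `find_peaks`, the 3x3 neighborhood, and the `threshold` value are all assumptions made for this example.

```python
def find_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for each cell that is above `threshold`
    and strictly greater than all of its 3x3 neighbors.

    Hypothetical helper for illustration; the neighborhood size and the
    threshold are assumptions, not values from the paper.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the up-to-8 neighbors, clipping at the map border.
            neighbors = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                peaks.append((r, c, value))
    return peaks


# A toy 5x5 heatmap with two peaks, i.e. two keypoints in one channel.
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.3],
]
peaks = find_peaks(heatmap)
print(peaks)  # [(1, 1, 0.9), (3, 3, 0.8)]
```

Because every keypoint shares one channel, the number of peaks is not fixed in advance, which is what lets the same output layout serve object categories with different keypoint counts.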

Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
Xingyi Zhou, Arjun Karpur, Chuang Gan, Linjie Luo, Qixing Huang
European Conference on Computer Vision (ECCV), 2018
bibtex  /  code /  model /  poster

We introduce an unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan/image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency provides effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization.

Towards 3D Human Pose Estimation in the Wild: A Weakly-supervised Approach
Xingyi Zhou, Qixing Huang, Xiao Sun, Xiangyang Xue, Yichen Wei
International Conference on Computer Vision (ICCV), 2017
bibtex  /  code (torch) /  code (PyTorch) /  model /  supplementary /  poster

We propose a weakly-supervised transfer learning method that learns an end-to-end network using training data with mixed 2D and 3D labels. The network augments a state-of-the-art 2D pose estimation network with a 3D depth regression network. The 3D pose labels in controlled environments are transferred to images in the wild that only possess 2D annotations. Importantly, we introduce a 3D geometric constraint to regularize the predicted 3D poses, which is effective on images that only have 2D annotations.
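
To make the idea of a 3D geometric constraint concrete, here is a minimal sketch of one plausible form: a penalty on how much bone-length ratios vary relative to a reference skeleton. The abstract above does not spell out the constraint, so this particular formulation, the helper names, and the toy skeleton are assumptions for illustration rather than the paper's loss.

```python
import math


def bone_length(joints, bone):
    """Euclidean length of the bone connecting two joint indices."""
    a, b = bone
    return math.dist(joints[a], joints[b])


def ratio_variance_loss(joints, bones, reference_lengths):
    """Variance of predicted-to-reference bone-length ratios.

    The loss is zero when every bone is scaled by the same factor
    relative to the reference skeleton, so it needs no 3D labels.
    Hypothetical formulation, assumed for this example.
    """
    ratios = [
        bone_length(joints, bone) / ref
        for bone, ref in zip(bones, reference_lengths)
    ]
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / len(ratios)


# Toy 3-joint chain with two bones; reference lengths are made up.
bones = [(0, 1), (1, 2)]
reference_lengths = [1.0, 2.0]

consistent_pose = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 2.0, 4.0)]
distorted_pose = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 4.0)]

print(ratio_variance_loss(consistent_pose, bones, reference_lengths))  # 0.0
print(ratio_variance_loss(distorted_pose, bones, reference_lengths))   # 0.25
```

A penalty of this kind depends only on the predicted joints, which is why it can still provide a training signal on in-the-wild images that carry 2D annotations alone.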

Deep Kinematic Pose Regression
Xingyi Zhou, Xiao Sun, Wei Zhang, Shuang Liang, Yichen Wei
ECCV Workshop on Geometry Meets Deep Learning, 2016
bibtex  /  code /  poster

We propose to directly embed a kinematic object model into deep neural network learning for general articulated object pose estimation. The kinematic function is defined on the appropriately parameterized object motion variables. We show convincing experimental results on a toy example, and we achieve state-of-the-art results on the Human3.6M dataset for the 3D human pose estimation problem.
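
The notion of a kinematic function that maps motion variables to joint positions can be sketched with a planar chain. This is a plain-Python illustration under assumed conventions (fixed bone lengths, relative joint angles in radians, root at the origin), not the paper's network layer; in a learning setting the same mapping would be written with differentiable tensor operations.

```python
import math


def forward_kinematics(bone_lengths, joint_angles):
    """Map relative joint angles to 2D joint positions along a chain.

    Each angle rotates the current bone relative to the previous one.
    Bone lengths are fixed, so any set of angles yields a geometrically
    valid pose. Names and conventions are assumptions for this sketch.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]  # root joint at the origin
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle  # accumulate rotation down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Two unit-length bones: the first points up, the second bends right.
positions = forward_kinematics([1.0, 1.0], [math.pi / 2, -math.pi / 2])
for px, py in positions:
    print(round(px, 6), round(py, 6))
```

Regressing the angles instead of the joint coordinates means the output pose always respects the bone lengths, whatever values the network predicts.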

Model-based Deep Hand Pose Estimation
Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei
International Joint Conference on Artificial Intelligence (IJCAI), 2016
bibtex  /  code /  slides /  poster

We propose a model-based deep learning approach that adopts a forward-kinematics-based layer to ensure the geometric validity of estimated poses. For the first time, we show that embedding such a non-linear generative process in deep learning is feasible for hand pose estimation.