Explainable Transfer-Learning and Knowledge Distillation for Fast and Accurate Head-Pose Estimation
Abstract
Head-pose estimation from facial images is an important research topic in
computer vision, with many applications in detecting the focus of attention,
monitoring driver behavior, and human-computer interaction. As with other
computer-vision tasks, recent research on head-pose estimation has focused
on deep convolutional neural networks (CNNs). Although deeper networks
improve prediction accuracy, they depend on expensive hardware such as
GPUs for real-time inference. As a result, CNN model compression has
become an important problem. In this work, we
propose a novel CNN compression method that combines weight pruning and
knowledge distillation. Additionally, we improve a state-of-the-art head-pose
estimation model with image augmentation and transfer learning. We
apply our compression method to a baseline head-pose estimation model and
validate the compressed model's performance under several validation scenarios.
We also test our compression method on other CNN architectures
and classification tasks to demonstrate its broader effectiveness.