Quantitative comparison of metrics for human pose estimation

Date

2013-08

Abstract

Human pose estimation is an important problem in computer vision that has received considerable attention in recent years. Many applications in visual surveillance, human-computer interaction, and activity recognition require the ability to estimate human pose. Quantifying the orientation of humans observed in images is challenging because of variations in illumination, occlusions, and the articulation of the human body. Many approaches to this problem have been investigated, most of which rely on 3D models and estimate the pose through a model-fitting process. Such approaches are limited by their assumption of a static, easily removable background, and by the limited range of occlusions and articulations that can be incorporated in the 3D training dataset. Alternatively, approaches based on local image and shape features have also been proposed. In this thesis we consider pose estimation based on image features and provide a comparative evaluation of three common feature descriptors and three common classifiers. Image feature-based pose estimation is a multi-step process consisting of feature extraction, feature selection, and classification. We investigate histogram-based, GIST-based, and SIFT-based feature extraction and representation algorithms. In the feature selection stage we reduce the dimensionality of the features using Information Gain and Principal Component Analysis. For classification we first discretize the pose into four coarse orientations: right-, left-, front-, and back-facing. Classification is studied both as a hierarchical (two-stage) solution and as a direct estimation. In the hierarchical solution, we group front- and back-facing into one class and right- and left-facing into another. At the first level, a classifier is trained to differentiate between these two groups. At the second level, a further classifier is trained to differentiate the classes within each group. The direct estimation simply trains a single classifier to differentiate the four classes. SVM, Decision Tree, and Random Forest classifiers are used for training and prediction, and results are presented on the publicly available PASCAL VOC 2010 dataset.
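The difference between the hierarchical and the direct scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the class names, the two-dimensional toy features, and in particular the nearest-centroid classifier, which stands in for the SVM, Decision Tree, and Random Forest classifiers only so that the example runs with the Python standard library.

```python
from collections import defaultdict

# Grouping from the abstract: front/back form one group, left/right the other.
GROUP_OF = {"front": "front_back", "back": "front_back",
            "left": "left_right", "right": "left_right"}


class NearestCentroid:
    """Stand-in classifier (hypothetical; NOT one of the thesis's classifiers)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        # One mean feature vector per class.
        self.centroids = {lab: [v / counts[lab] for v in acc]
                          for lab, acc in sums.items()}
        return self

    def predict_one(self, x):
        def sqdist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sqdist(self.centroids[lab]))


class HierarchicalPoseClassifier:
    """Two-stage scheme: pick the group first, then the class within it."""

    def __init__(self, make_clf=NearestCentroid):
        self.make_clf = make_clf

    def fit(self, X, y):
        groups = [GROUP_OF[label] for label in y]
        # Stage 1: front_back versus left_right.
        self.stage1 = self.make_clf().fit(X, groups)
        # Stage 2: one classifier per group, trained only on that group's samples.
        self.stage2 = {}
        for g in set(groups):
            Xg = [x for x, gg in zip(X, groups) if gg == g]
            yg = [lab for lab, gg in zip(y, groups) if gg == g]
            self.stage2[g] = self.make_clf().fit(Xg, yg)
        return self

    def predict_one(self, x):
        group = self.stage1.predict_one(x)
        return self.stage2[group].predict_one(x)


# Toy features (assumed): dimension 0 separates the groups,
# dimension 1 separates the classes within a group.
X = [[1.0, 1.0], [1.2, 0.8], [1.0, -1.0], [0.8, -1.2],
     [-1.0, 1.0], [-1.2, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
y = ["front", "front", "back", "back", "left", "left", "right", "right"]

hier = HierarchicalPoseClassifier().fit(X, y)   # hierarchical (two-stage)
direct = NearestCentroid().fit(X, y)            # direct four-class estimation
```

Calling `hier.predict_one(x)` routes a feature vector through both stages, while `direct.predict_one(x)` chooses among all four orientations at once. In a real experiment the stand-in would be replaced by the classifiers named in the abstract, and the feature vectors would come from the histogram, GIST, or SIFT descriptors after feature selection.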

Keywords

Computer vision, Pose estimation, Human detection, SIFT, GIST, Histogram, SVM, Decision trees, Random forests, PASCAL VOC
