Kriegman Research Group


Tracking Humans using Shape and Appearance

Abstract
Tracking a moving person is challenging because a person's appearance in images changes significantly due to articulation, viewpoint changes, and lighting variation across a scene. Moreover, different people look different because of body shape, clothing, skin color, hair, and many other factors. In this paper, we introduce a multi-cue tracking technique that uses prior information about the 2-D image shape of people in general along with an appearance model that is learned on-line for a specific individual. Assuming a static camera, the background is modeled and updated on-line. Rather than performing thresholding and blob detection during tracking, a foreground probability map (FPM) is computed that indicates the likelihood that a pixel is not the projection of the background. Off-line, a shape model of walking people is estimated from the FPMs computed from training sequences. During tracking, this generic prior model of human shape is used for person detection and to initialize a tracking process. Because this prior model is very generic, a model of an individual's appearance is learned on-line during tracking. As the person is tracked through a sequence using both shape and appearance, the appearance model is refined and multi-cue tracking becomes more robust.
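
The page does not give the exact background or shape models, so the following is only a minimal sketch of the pipeline described in the abstract, under stated assumptions: each pixel's background is a running Gaussian (a common choice, assumed here), the FPM score is 1 - exp(-d^2/2) for the normalized distance d to the background mean (an illustrative mapping, not the paper's formula), and the off-line shape prior is the mean of aligned training FPM windows. All names and parameters (`RunningGaussianBackground`, `learn_shape_prior`, `alpha`, `init_var`, `min_var`) are hypothetical.

```python
import math


class RunningGaussianBackground:
    """Hypothetical per-pixel Gaussian background model, updated on-line.

    Frames are flat lists of grayscale intensities so the sketch needs only
    the standard library.
    """

    def __init__(self, first_frame, alpha=0.05, init_var=25.0, min_var=4.0):
        self.alpha = alpha        # learning rate of the running estimates
        self.min_var = min_var    # variance floor so the model never collapses
        self.mean = [float(x) for x in first_frame]
        self.var = [init_var] * len(self.mean)

    def foreground_probability(self, frame):
        """Return an FPM: scores near 1 where a pixel is unlikely to be background."""
        fpm = []
        for x, m, v in zip(frame, self.mean, self.var):
            d2 = (x - m) ** 2 / v                 # squared normalized distance
            fpm.append(1.0 - math.exp(-0.5 * d2))
        return fpm

    def update(self, frame, fpm):
        """Blend the frame into the model, mostly where it looks like background."""
        for i, (x, p_fg) in enumerate(zip(frame, fpm)):
            rate = self.alpha * (1.0 - p_fg)      # likely-foreground pixels barely adapt
            diff = x - self.mean[i]
            self.mean[i] += rate * diff
            self.var[i] = max(self.min_var,
                              (1.0 - rate) * self.var[i] + rate * diff * diff)


def learn_shape_prior(training_fpm_windows):
    """Average aligned FPM windows of walking people into a generic shape template."""
    n = len(training_fpm_windows)
    return [sum(values) / n for values in zip(*training_fpm_windows)]
```

Weighting each update by 1 - FPM lets the background adapt without a hard foreground/background decision, in the spirit of the abstract's choice to avoid thresholding and blob detection.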

Overview
The goal of this work is to locate, identify, and monitor people in video: we need to keep track of the position, pose, identity, and activity of each person in view. The main difficulty is the lack of reliable cues for tracking individuals. The shape, or silhouette, is irregular and changes with articulation and camera projection. The appearance (pixel values) is even harder to model because of individual differences and illumination changes.
This approach addresses these problems by combining a prior shape model with an appearance model learned on-line. Background modeling is used to identify foreground pixels, which belong to moving objects. The resulting foreground probability map is combined with the shape model to detect when a person enters the scene. Once a person is detected, a tracker is initialized; it tracks the person and learns his or her appearance. When the person is no longer visible, the tracker is destroyed automatically.
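
One way to realize the detection and tracker lifecycle described above is sketched below, assuming the shape prior is a 2-D template of expected foreground probability. Each window is scored by the mean FPM inside the expected silhouette minus the mean FPM outside it; windows above a threshold count as detections; each tracker re-searches a small neighborhood every frame and is destroyed after several consecutive frames without support. The scoring rule, the helper names, and the parameters `threshold`, `radius`, and `max_missed` are hypothetical, and the learned appearance cue is left out for brevity.

```python
from dataclasses import dataclass


def match_score(fpm, template, top, left):
    """Mean FPM inside the expected silhouette minus the mean FPM outside it."""
    inside = outside = w_in = w_out = 0.0
    for r, row in enumerate(template):
        for c, t in enumerate(row):
            f = fpm[top + r][left + c]
            inside += t * f
            outside += (1.0 - t) * f
            w_in += t
            w_out += 1.0 - t
    return inside / max(w_in, 1e-9) - outside / max(w_out, 1e-9)


def detect_people(fpm, template, threshold=0.5):
    """Return (top, left, score) for every window whose score exceeds threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for top in range(len(fpm) - th + 1):
        for left in range(len(fpm[0]) - tw + 1):
            s = match_score(fpm, template, top, left)
            if s > threshold:
                hits.append((top, left, s))
    return hits


@dataclass
class Track:
    top: int
    left: int
    missed: int = 0  # consecutive frames without enough FPM support


def update_tracks(tracks, fpm, template, threshold=0.5, radius=2, max_missed=3):
    """Move each track to its best nearby window; drop tracks that stay unsupported."""
    th, tw = len(template), len(template[0])
    alive = []
    for t in tracks:
        best = None
        for top in range(max(0, t.top - radius), min(len(fpm) - th, t.top + radius) + 1):
            for left in range(max(0, t.left - radius),
                              min(len(fpm[0]) - tw, t.left + radius) + 1):
                s = match_score(fpm, template, top, left)
                if best is None or s > best[0]:
                    best = (s, top, left)
        if best is not None and best[0] > threshold:
            t.top, t.left, t.missed = best[1], best[2], 0
        else:
            t.missed += 1
        if t.missed <= max_missed:   # otherwise the tracker is destroyed
            alive.append(t)
    return alive
```

`detect_people` does not suppress overlapping windows, so several neighboring windows around one person can all pass the threshold; a fuller system would keep only local maxima and skip detections that overlap an existing tracker.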

Results
The shape model

The appearance model (learned from the one-person sequence)

Tracking result: one-person sequence [ avi ]
     
Note that the appearance is learned independently for each shape. In the window at the upper left corner of each frame, the left subwindow shows the estimated pose, the center subwindow shows the appearance model learned for that pose, and the right subwindow shows the weight of the Stable model in the appearance model.
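
The page does not define the Stable model. Assuming it is the stable component of a WSL-style on-line mixture (a slowly adapting per-pixel Gaussian plus an outlier component), the sketch below keeps one such model per pose and moves each pixel's Stable weight toward that component's ownership of the new observation; `stable_weights` would then play the role of the right subwindow. The uniform outlier density, learning rate, clamping bounds, class names, and pose labels are all assumptions.

```python
import math
from dataclasses import dataclass

OUTLIER_DENSITY = 1.0 / 256.0  # assumed uniform outlier density over 8-bit intensities


@dataclass
class PixelAppearance:
    mean: float
    var: float = 100.0
    m_stable: float = 0.5  # mixing weight of the Stable component

    def update(self, x, alpha=0.1, min_var=4.0):
        """Absorb one observation; return the Stable component's ownership of it."""
        # Likelihood of x under the Stable Gaussian
        p_s = (math.exp(-0.5 * (x - self.mean) ** 2 / self.var)
               / math.sqrt(2.0 * math.pi * self.var))
        # Ownership: posterior probability that the Stable component generated x
        num = self.m_stable * p_s
        own = num / (num + (1.0 - self.m_stable) * OUTLIER_DENSITY)
        # Move the mixing weight toward the ownership, clamped away from 0 and 1
        self.m_stable = min(0.99, max(0.01, (1.0 - alpha) * self.m_stable + alpha * own))
        # Adapt the Gaussian only as much as the Stable component explains x
        rate = alpha * own
        diff = x - self.mean
        self.mean += rate * diff
        self.var = max(min_var, (1.0 - rate) * self.var + rate * diff * diff)
        return own


class PoseAppearanceModels:
    """One per-pixel appearance model per pose, each learned independently."""

    def __init__(self):
        self.models = {}

    def update(self, pose, patch):
        if pose not in self.models:  # first sighting of this pose initializes its model
            self.models[pose] = [PixelAppearance(mean=float(x)) for x in patch]
        return [px.update(float(x)) for px, x in zip(self.models[pose], patch)]

    def stable_weights(self, pose):
        """Per-pixel Stable weights for one pose."""
        return [px.m_stable for px in self.models[pose]]
```

Clamping `m_stable` keeps the ownership denominator positive even when the Gaussian likelihood underflows to zero for a far outlier.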

Tracking result: two-people sequence [ avi ]
 
Reference
J. Lim and D. Kriegman, "Tracking Humans Using Prior and Learned Representations of Shape and Appearance," FGR 2004, to appear.
Last updated: April 12, 2004