Abstract
Tracking a moving person is challenging because a person's appearance
in images changes significantly due to articulation, viewpoint
changes, and lighting variation across a scene. Moreover, different people
differ in appearance because of factors such as body shape,
clothing, skin color, and hair. In this paper, we introduce a
multi-cue tracking technique that uses prior information about the 2-D
image shape of people in general along with an appearance model that
is learned on-line for a specific individual. Assuming a static
camera, the background is modeled and updated on-line. Rather than
performing thresholding and blob detection during tracking, a
foreground probability map (FPM) is computed which indicates the
likelihood that a pixel is not the projection of the background.
Off-line, a shape model of walking people is estimated from the FPMs
computed from training sequences. During tracking, this generic prior
model of human shape is used for person detection and to initialize
a tracking process. Because this prior model is very generic, a model of an
individual's appearance is learned on-line during tracking. As
the person is tracked through a sequence using both shape and
appearance, the appearance model is refined and multi-cue tracking
becomes more robust.
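The foreground probability map described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which background model is used, so a per-pixel Gaussian with a running mean and variance is assumed here, and the class name GaussianBackground, the learning rate alpha, and the mapping 1 - exp(-d^2 / 2) are hypothetical choices.

```python
import numpy as np


class GaussianBackground:
    """Per-pixel Gaussian background model (assumed; the source does not name its model)."""

    def __init__(self, first_frame, alpha=0.05, init_var=25.0):
        self.mean = np.asarray(first_frame, dtype=np.float64).copy()
        self.var = np.full_like(self.mean, init_var)
        self.alpha = alpha  # hypothetical learning rate

    def foreground_probability(self, frame):
        """Return a soft map in [0, 1): high where the background explains the pixel poorly."""
        frame = np.asarray(frame, dtype=np.float64)
        d2 = (frame - self.mean) ** 2 / self.var  # squared normalized distance
        return 1.0 - np.exp(-0.5 * d2)  # heuristic mapping, no thresholding

    def update(self, frame, fpm):
        """On-line update; pixels that look like foreground barely change the model."""
        frame = np.asarray(frame, dtype=np.float64)
        rate = self.alpha * (1.0 - fpm)
        diff = frame - self.mean
        self.mean = self.mean + rate * diff
        self.var = np.maximum((1.0 - rate) * self.var + rate * diff ** 2, 1e-3)


# Synthetic example: flat background with a bright 3x3 object.
background = np.full((8, 8), 100.0)
model = GaussianBackground(background)
frame = background.copy()
frame[3:6, 3:6] = 200.0
fpm = model.foreground_probability(frame)
model.update(frame, fpm)
```

Keeping the map soft, rather than thresholding it into blobs, lets later stages weigh uncertain pixels instead of committing to a hard foreground mask.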
Overview
The goal of this work is to locate, identify, and monitor people in
video. This requires keeping track of the position, pose, identity, and
activity of each person in view. The main difficulty is the lack
of reliable cues for tracking individuals.
A person's shape, or silhouette, is irregular and changes with articulation
and camera projection. Appearance (pixel values) is even more
difficult to model because of individual differences and illumination
changes.
This approach tries to ameliorate these problems by combining an a priori
shape model with an appearance model that is learned on-line.
Background modeling is used to identify foreground pixels, which belong
to moving objects. The resulting foreground-ness image is matched against
the shape model to detect a person entering the scene. Once a person is
detected, a tracker is initialized and starts tracking the person and
learning their appearance. When the person is no longer visible, the
tracker is destroyed automatically.
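The detect, track, and destroy cycle described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single shape template (the mean of aligned training windows) stands in for the shape model, detection is an exhaustive sliding-window search, and the names estimate_shape_model, shape_score, detect, and TrackerManager, along with all thresholds, are hypothetical. The appearance cue is omitted.

```python
import numpy as np


def estimate_shape_model(training_windows):
    """Assumed off-line step: average aligned FPM windows into one shape prior."""
    return np.mean(np.stack(training_windows), axis=0)


def shape_score(window, shape_model):
    """Mean agreement between an FPM window and the shape prior (1.0 is a perfect match)."""
    return float(np.mean(shape_model * window + (1.0 - shape_model) * (1.0 - window)))


def detect(fpm, shape_model, threshold):
    """Return the (row, col) of the best-matching window, or None if below threshold."""
    h, w = shape_model.shape
    best, best_pos = -1.0, None
    for y in range(fpm.shape[0] - h + 1):
        for x in range(fpm.shape[1] - w + 1):
            s = shape_score(fpm[y:y + h, x:x + w], shape_model)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos if best >= threshold else None


class TrackerManager:
    """Creates a tracker on detection and destroys it after max_lost missed frames."""

    def __init__(self, shape_model, detect_thresh=0.8, lost_thresh=0.5, max_lost=3):
        self.shape_model = shape_model
        self.detect_thresh = detect_thresh
        self.lost_thresh = lost_thresh
        self.max_lost = max_lost
        self.position = None  # None means no active tracker
        self.lost = 0

    def step(self, fpm):
        if self.position is None:
            # No tracker yet: look for a person entering the scene.
            self.position = detect(fpm, self.shape_model, self.detect_thresh)
            self.lost = 0
            return self.position
        # Active tracker: re-localize with a looser threshold (shape cue only here).
        pos = detect(fpm, self.shape_model, self.lost_thresh)
        if pos is None:
            self.lost += 1
            if self.lost >= self.max_lost:
                self.position = None  # person no longer visible: destroy tracker
        else:
            self.position, self.lost = pos, 0
        return self.position
```

A tracker along the lines described in the text would search near the previous position and score candidates with both shape and the learned appearance; the full-frame search here only keeps the sketch short.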
Results
The shape model
The appearance model (learned from the one-person sequence)
Tracking result: one-person sequence [ avi ]
Note that the appearance is learned independently for each shape.
In the window at the upper-left corner of each frame, the left subwindow
shows the estimated pose, the center subwindow shows the appearance model
learned for that pose, and the right subwindow shows the weight of
the Stable model in the appearance model.
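The idea of a per-pose appearance model with a Stable-component weight can be sketched as follows. The page does not define the appearance model beyond naming a Stable component, so everything here is an assumption: each pixel is modeled as a mixture of a slowly adapting Gaussian (the "stable" part) and a uniform outlier term, updated with an on-line EM-style step. The class name PoseAppearance and all parameters are hypothetical.

```python
import numpy as np


class PoseAppearance:
    """One appearance model per pose; each pixel has a stable Gaussian plus a uniform outlier."""

    def __init__(self, shape, n_poses, alpha=0.1, init_var=100.0, outlier_density=1.0 / 256):
        self.mean = np.zeros((n_poses,) + shape)
        self.var = np.full((n_poses,) + shape, init_var)
        self.weight = np.full((n_poses,) + shape, 0.5)  # weight of the stable component
        self.seen = np.zeros(n_poses, dtype=bool)
        self.alpha = alpha
        self.outlier_density = outlier_density

    def update(self, pose, patch):
        """Update only the model that belongs to the estimated pose."""
        patch = np.asarray(patch, dtype=np.float64)
        if not self.seen[pose]:
            self.mean[pose] = patch  # first observation initializes the mean
            self.seen[pose] = True
            return
        m, v, w = self.mean[pose].copy(), self.var[pose].copy(), self.weight[pose].copy()
        diff = patch - m
        stable = w * np.exp(-0.5 * diff ** 2 / v) / np.sqrt(2.0 * np.pi * v)
        outlier = (1.0 - w) * self.outlier_density
        resp = stable / (stable + outlier)  # how well the stable part explains each pixel
        a = self.alpha
        # Clip so the outlier term never vanishes and the ratio above stays defined.
        self.weight[pose] = np.clip((1.0 - a) * w + a * resp, 0.01, 0.99)
        rate = a * resp
        self.mean[pose] = m + rate * diff
        self.var[pose] = np.maximum((1.0 - rate) * v + rate * diff ** 2, 1.0)
```

In this sketch the stable weight rises at pixels whose values stay consistent for a given pose and falls where observations deviate, which is the kind of per-pixel confidence a weight display like the right subwindow could visualize.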