Kriegman Research Group


Visual Tracking Using Learned Subspaces

Abstract
This paper presents a simple but robust visual tracking algorithm based on representing the appearances of objects using affine warps of learned linear subspaces of the image space. The tracker adaptively updates this subspace while tracking by finding a linear subspace that best approximates the observations made in the previous frames. Instead of the traditional $L^2$-reconstruction error norm, which leads to subspace estimation using PCA or SVD, we argue that a variant of it, the uniform $L^2$-reconstruction error norm, is the right one for tracking. Under this framework, we provide a simple and computationally inexpensive algorithm for finding a subspace whose uniform $L^2$-reconstruction error norm for a given collection of data samples is below some threshold, and a simple tracking algorithm is an immediate consequence. We show experimental results on a variety of image sequences of people and man-made objects moving under challenging imaging conditions, which include drastic illumination variation, partial occlusion and extreme pose variation.
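To make the contrast between the two error norms concrete, here is a minimal sketch. It assumes that "uniform" means the worst-case (maximum) per-sample reconstruction error, whereas the traditional criterion sums the squared errors over all samples; the precise definition is given in the paper. The function names and the toy data are hypothetical and are not taken from our implementation.

```python
import numpy as np

def reconstruction_errors(X, basis):
    """Distance from each column of X to the subspace spanned by `basis`.

    X:     (d, n) matrix, one vectorized image window per column.
    basis: (d, k) matrix whose columns are orthonormal.
    """
    residual = X - basis @ (basis.T @ X)  # component outside the subspace
    return np.linalg.norm(residual, axis=0)

def l2_error(X, basis):
    # Traditional criterion: sum of squared distances (minimized by PCA/SVD).
    return float(np.sum(reconstruction_errors(X, basis) ** 2))

def uniform_l2_error(X, basis):
    # Assumed reading of "uniform": the largest distance over all samples.
    return float(np.max(reconstruction_errors(X, basis)))

# Toy data: three samples in R^3 and a one-dimensional subspace (the x-axis).
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])
basis = np.array([[1.0], [0.0], [0.0]])
print(l2_error(X, basis), uniform_l2_error(X, basis))
```

Under this reading, a single poorly represented sample dominates the uniform error, while its influence on the summed error shrinks as more well-represented samples are added.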

Some Tracking Results

In the sequences below, we put the tracker to the test on several video sequences of varying difficulty. Some sequences are short (fewer than 100 frames) and some are longer (more than 500 frames). All sequences contain non-trivial image variations. For us, the important types of image variation are 1) illumination variation, 2) pose variation, 3) partial occlusion and, to a lesser degree, 4) expression variation. A cluttered background is also an important test case for demonstrating the robustness of the tracker. All the sequences below were originally captured in color. Since our tracker uses only grey-scale intensity values, the tracking results are shown as grey-scale video sequences. The tracking videos can be viewed by clicking the images.

Our implementation follows exactly the algorithm detailed in the paper. The tracker is completely image-based. In particular, there are no pixel-level operations such as filtering and no non-linear optimizations, e.g. robust matching. There are five parameters that need to be tuned differently for each sequence. Since our implementation does not include any dynamics modelling, at each frame we simply sample a number of windows (around 250) of various sizes and orientations on the image. The sampling is based on a Gaussian pdf defined on a five-dimensional state space (the location of the center of the window, its width and height, and its angular orientation). The mean of the Gaussian is taken to be the target window at the previous frame, and the components of the state are taken to be independent, i.e. the covariance matrix is diagonal.
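The sampling step described above can be sketched as follows. This is an illustration only, not our actual code: the function name, the standard deviations and the clamp on window size are all assumptions made for the example, and the values used for each sequence are not listed on this page.

```python
import numpy as np

def sample_windows(prev_state, std, n_samples=250, rng=None):
    """Draw candidate windows around the previous target window.

    prev_state: (cx, cy, width, height, angle) of the target at the previous frame.
    std:        per-component standard deviations (hypothetical values).
    Returns an (n_samples, 5) array, one candidate window per row.
    """
    rng = np.random.default_rng() if rng is None else rng
    prev_state = np.asarray(prev_state, dtype=float)
    std = np.asarray(std, dtype=float)
    # Independent components mean a diagonal covariance matrix, so each
    # coordinate is perturbed separately by its own zero-mean Gaussian.
    samples = prev_state + std * rng.standard_normal((n_samples, 5))
    # Assumption for this sketch: keep width and height at least one pixel.
    samples[:, 2:4] = np.maximum(samples[:, 2:4], 1.0)
    return samples

# Example: 250 candidates around a 40x50 window centered at (160, 120).
candidates = sample_windows((160, 120, 40, 50, 0.0),
                            std=(5, 5, 2, 2, 0.05),
                            rng=np.random.default_rng(0))
print(candidates.shape)
```

Each row is one candidate window; how the candidates are scored against the learned subspace is described in the paper and is not reproduced here.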

Warm-ups

In the two sequences below, we show that the tracker is, at the very least, capable of tracking objects undergoing simple motions in friendly environments. The white foam-cup sequence is perhaps the simplest. While the fist sequence is generally not considered to be difficult, it does contain some non-trivial image variation.

Tracking Balls

In the four short sequences below, we track balls without any distinctive feature points on them. Tracking the white ball is somewhat easier because its surface texture is quite different from the background. Tracking the black ball, on the other hand, is more challenging. The hair offers a background texture that is quite similar to the black ball, and in the last video, the two balls cross each other in mid-air, with the black ball temporarily occluded by the white ball for about four frames.



In the next two sequences, we show that the tracker performs reasonably well against a cluttered background. Again, the white ball is easier to track, thanks to its distinctive texture. However, because of the newspaper, the background is now significantly more cluttered than in the previous four sequences.



Tracking Garfield

In this sequence (approximately 650 frames), Garfield's head turns and twists in various ways. In several instances, the pose differs quite significantly from the frontal view of the face that starts the video sequence. There is also some illumination variation, although it is small compared with the pose variation.


Tracking Human Faces

Illumination Variation

These two sequences demonstrate that the tracker is stable under both illumination and pose variations. The tracking results for both sequences are quite precise.



Pose Variation, Partial Occlusion and More

In the next three sequences, there are several challenges for the tracker to overcome. The first sequence shows that the tracker is stable when pose variation and partial occlusion develop simultaneously. The second sequence contains a wide range of poses for a human face. The third sequence contains several different challenges, including pose variation, expression variation and partial occlusion.



All in One

In this long sequence (about 1800 frames), the tracker faces its toughest assignment yet. There are many challenging conditions for the tracker to overcome, including camera jitter, partial occlusion, illumination variation, pose variation and expression changes.



The data for this sequence is available for download here.


 
Reference
J. Ho, K.-C. Lee, M.-H. Yang and D. Kriegman, Visual Tracking Using Learned Subspaces, CVPR 2004, to appear
Last updated : April 12 2004