A Haar-like feature is a digital image feature used in object recognition. It is named for its resemblance to the Haar wavelet, although the version used for object detection simplifies the concept for efficiency. Building on earlier Haar-wavelet features explored by Papageorgiou et al., Paul Viola and Michael Jones introduced these features in their seminal 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features," which presented the Viola-Jones object detection framework, most famously applied to real-time face detection.
Principle of Operation
A Haar-like feature is computed as the difference between the sums of pixel intensities within adjacent rectangular regions of an image. This difference highlights simple local contrast patterns, such as edges and lines, which often correspond to specific parts of an object.
For example, a common Haar-like feature used for face detection consists of two adjacent rectangles: one over the eye region (typically darker) and one over the cheek region (typically lighter). A large difference between the intensity sums of the two regions is evidence that an eye may be present, although a single feature is not conclusive on its own.
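The eye and cheek example can be made concrete with a small sketch. The pixel values below are invented for illustration, and the sign convention (lighter region minus darker region) is an assumption; implementations differ on which rectangle is subtracted from which.

```python
import numpy as np

# Toy 4x6 grayscale patch (values are made up for illustration):
# the top two rows stand in for a darker eye region,
# the bottom two rows for a lighter cheek region.
patch = np.array([
    [40, 35, 30, 30, 35, 40],
    [45, 40, 35, 35, 40, 45],
    [180, 185, 190, 190, 185, 180],
    [175, 180, 185, 185, 180, 175],
], dtype=np.int64)

eye_sum = int(patch[:2, :].sum())    # upper rectangle
cheek_sum = int(patch[2:, :].sum())  # lower rectangle

# Assumed sign convention: lighter region minus darker region.
feature_value = cheek_sum - eye_sum
print(feature_value)  # 1740
```

A patch of uniform intensity would give a value of zero, which is why a large magnitude is read as evidence of contrast between the two regions.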
Types of Features
Common types of Haar-like features include:
- Two-rectangle features: These detect edges. They consist of two adjacent rectangles of the same size, either horizontally or vertically oriented. The feature value is the difference between the pixel sums of the two rectangles.
- Three-rectangle features: These detect lines. They consist of three adjacent rectangles of the same size. The feature value is the sum of pixels in the two outer rectangles subtracted from the sum of pixels in the center rectangle; some implementations weight the center sum by two so that the two sides of the subtraction cover equal areas.
- Four-rectangle features: These detect diagonal lines or more complex patterns. They consist of four adjacent rectangles arranged in a 2x2 grid. The feature value is the difference between the sum of pixels in the diagonal pairs of rectangles.
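The three feature types listed above can be sketched directly with NumPy slicing. This is a minimal illustration rather than a reference implementation: the function names, the argument order, and the `center_weight` parameter are hypothetical, and the sign of each feature is a convention that varies between implementations.

```python
import numpy as np

def rect_sum(img, top, left, height, width):
    """Sum of pixels in the rectangle with the given top-left corner and size."""
    return int(img[top:top + height, left:left + width].sum())

def two_rect_horizontal(img, top, left, height, width):
    """Edge feature: two side-by-side rectangles, each height x width."""
    left_sum = rect_sum(img, top, left, height, width)
    right_sum = rect_sum(img, top, left + width, height, width)
    return right_sum - left_sum

def three_rect_horizontal(img, top, left, height, width, center_weight=1):
    """Line feature: weighted center rectangle minus the two outer rectangles.

    center_weight=1 is the plain difference; center_weight=2 balances the
    areas of the two sides (an assumed, commonly used variant).
    """
    a = rect_sum(img, top, left, height, width)
    b = rect_sum(img, top, left + width, height, width)
    c = rect_sum(img, top, left + 2 * width, height, width)
    return center_weight * b - (a + c)

def four_rect(img, top, left, height, width):
    """Diagonal feature: one diagonal pair minus the other in a 2x2 grid."""
    tl = rect_sum(img, top, left, height, width)
    tr = rect_sum(img, top, left + width, height, width)
    bl = rect_sum(img, top + height, left, height, width)
    br = rect_sum(img, top + height, left + width, height, width)
    return (tl + br) - (tr + bl)

# Small synthetic patterns that each feature type responds to.
edge = np.zeros((4, 4), dtype=np.int64)
edge[:, 2:] = 10          # bright right half
line = np.zeros((4, 6), dtype=np.int64)
line[:, 2:4] = 10         # bright vertical stripe in the middle
checker = np.zeros((4, 4), dtype=np.int64)
checker[:2, :2] = 10      # bright top-left block
checker[2:, 2:] = 10      # bright bottom-right block

edge_value = two_rect_horizontal(edge, 0, 0, 4, 2)
line_value = three_rect_horizontal(line, 0, 0, 4, 2)
diag_value = four_rect(checker, 0, 0, 2, 2)
```

Each synthetic pattern is built to match its feature, so all three values are large and positive here; on a uniform patch the two-rectangle and four-rectangle features evaluate to zero.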
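These basic types can be applied at every location and scale within a detection window, leading to a very large set of potential features. Even a 24x24 window, the base resolution used by Viola and Jones, yields more than 160,000 candidate features, far more than its 576 pixels.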
Efficient Computation using Integral Images
One of the key innovations that made Haar-like features practical was their rapid computation using integral images, also known as summed-area tables. An integral image is an intermediate representation of an image where the value at each pixel (x, y) is the sum of all pixel values above and to the left of (x, y), inclusive, in the original image.
Using an integral image, the sum of pixel intensities within any rectangular region can be calculated with just four array lookups, regardless of the rectangle's size. Because adjacent rectangles share corners, a two-rectangle feature needs six lookups, a three-rectangle feature eight, and a four-rectangle feature nine. Each feature is therefore evaluated in constant time, which allows thousands of features to be computed very quickly.
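The four-lookup rectangle sum can be sketched as follows. This version pads the table with a leading row and column of zeros so that no boundary checks are needed; the padding and the function names are implementation choices made for this example, not part of the definition.

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a leading row and column of zeros.

    ii[y, x] holds the sum of img[:y, :x], so the inclusive definition
    (sum up to and including pixel (x, y)) corresponds to ii[y + 1, x + 1].
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right]
               - ii[bottom, left] + ii[top, left])

img = np.arange(20).reshape(4, 5)
ii = integral_image(img)

# The table lookup agrees with a direct sum over the same rectangle.
assert rect_sum(ii, 1, 2, 2, 3) == img[1:3, 2:5].sum()
```

The table is built once per image in a single pass; after that, the cost of `rect_sum` does not depend on the rectangle's size.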
Applications
The best-known application of Haar-like features is the Viola-Jones object detection algorithm, particularly for real-time face detection. Beyond faces, they have been successfully applied to:
- Pedestrian detection
- Car detection
- Object tracking
- General object recognition tasks where distinct local intensity changes characterize the target object.
Advantages and Limitations
Advantages:
- Computational Efficiency: Extremely fast feature calculation due to integral images.
- Effectiveness: Good at capturing local contrast information relevant for distinguishing objects from their background.
- Simplicity: Conceptually straightforward to understand and implement.
Limitations:
- Sensitivity to Rotation and Scale: Basic Haar-like features are axis-aligned and computed at a fixed size, so their responses change when the object is rotated or rescaled. The Viola-Jones framework handles scale by running the detector at multiple scales, but rotation invariance is not built into the features themselves.
- Feature Redundancy: The exhaustive set of candidate features is very large, and many candidates are redundant or irrelevant, so a feature selection technique (e.g., AdaBoost) is needed to pick a small, useful subset.
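- Limited Discriminative Power: Each individual feature is quite simple and has low discriminative power on its own. Their strength comes from combining many of them, by boosting, into stronger classifiers; in the Viola-Jones framework these classifiers are then arranged in a cascade, which mainly serves to reject non-object regions quickly.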
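The rotation sensitivity noted above is easy to reproduce on a synthetic patch. In this sketch, a horizontally oriented two-rectangle feature responds strongly to a vertical edge but not at all once the same edge is rotated by 90 degrees. The patch size, the intensity values, and the helper name are assumptions made for illustration.

```python
import numpy as np

def two_rect_horizontal(img):
    """Right half minus left half of the whole patch (vertical-edge detector)."""
    half = img.shape[1] // 2
    return int(img[:, half:].sum()) - int(img[:, :half].sum())

# 8x8 patch with a vertical edge: dark left half, bright right half.
patch = np.zeros((8, 8), dtype=np.int64)
patch[:, 4:] = 255

rotated = np.rot90(patch)  # same edge, rotated by 90 degrees

upright_response = two_rect_horizontal(patch)    # strong response
rotated_response = two_rect_horizontal(rotated)  # response vanishes

print(upright_response, rotated_response)  # 8160 0
```

A vertically oriented two-rectangle feature would recover the rotated edge, but intermediate angles are not covered by the basic axis-aligned set.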