Release Notes
Revision History:
Quadrature Disambiguation Feature Detector
Most code by Tyler C. Folsom. Some version 4.0 code by
Jim Albers.
Version 1.x
C language code for doctoral dissertation at
University of Washington, June 1994
DOS version with no graphics displays. Output
is text only.
Works on single monochrome
images.
Version 2.0
C++ version for UW short course in C++, Autumn 1998
Latest update: December 7, 1998
Windows version with
graphics overlays to show where features were found.
Uses Microsoft Vision SDK version 1.0
Version 2.1
Latest update: Feb 27, 2000
Added the capability of
generating test images to verify performance on known cases.
Expanded the output file.
Research plan was only partially completed.
Uses Microsoft Vision SDK version 1.2
Version 3.0
July 9, 2003
Faster routines. Can handle color images. First cut at corner
detection. Uses lateral inhibition / facilitation to build extended edges. See
below for details.
Version 3.1 (Sept 2003)
Objectives: Link the features to produce a
segmentation of the image. Output should be Hermite curves that outline
objects. Be able to go from an image to a cartoon.
Expand to handle a color stereo video image
stream.
Instead of a fixed sampling grid, adapt the
sampling grid based on features found in the previous frame.
Version 4.0 (August 2005)
Objective: Real-time binocular stereo on large images as
the primary obstacle detection for a robot vehicle travelling offroad with no
other traffic at up to 100 km/h.
Not in public domain. Select individuals will be given alpha source
code for non-commercial purposes.
No MFC, no GUI, translated from C++ to
C. Should run on any
processor with enough power. Inputs
a .PGM image pair and outputs a DirectX-compatible line strip of vertices. As a
debugging aid, it also outputs a color-coded distance map as a PPM image and intermediate
text files. Input image pair is monochrome. The only directions detected are horizontal
and vertical. No bars; all features are
edges or corners.
Overview
For an introduction, see the Software Design
Document for Version 2.0 and http://home.earthlink.net/~tylerfolsom/
The version 2 and 3 code is written using
Microsoft Foundation Classes (MFC) document / view architecture. There are
several wizard-generated files (MainFrm.cpp,
ChildFrm.cpp, StdAfx.cpp, ImageFeatures.cpp, and their header files) which do
not need to be modified and are of little interest. These MFC classes are
derived from window classes. The view class is meant to handle the graphical
user interface (GUI). The document class is meant to handle the real work. The
header files carry comments on usage for class methods and variables. The files
are organized as follows:
ImageFeatureView: Handles the GUI
Options: Gets items requested from the
Options menu.
ImageFeaturesDoc: Handler for trivial tasks
requested from the GUI.
ProcessFeatures: The main routine. It sets up
the grid that determines the locations at which the image is sampled.
IFFilter: The filter class. It takes as input
a patch of image and correlates it with cortical filters.
IFLocation: Holds the result of the
correlation at each location. Finds the orientations.
IFFeature: The feature class. It takes as
inputs the results from applying cortical filters to a patch of image. It does
1D processing based on knowing the orientation. As outputs, it produces the
interpretation of what features are at that location.
TestImage: Routines to generate synthetic
images for testing (incomplete). These are on the
Process | Test cases menu.
Version 4 Changes from version 3: (October 2004 to August 2005)
Does binocular stereo. Translated from C++ to C and intended to run as an
embedded application. No MFC, no GUI,
nothing device specific other than camera interface. If not connected to a camera, input is a
monochrome image pair in PGM. Output is
a 3D line strip of vertices in meters.
This may be extended to a triangle strip or fan describing a surface.
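As an illustration of the file input described above, the following is a minimal sketch of reading the header of a binary (P5) PGM image that has already been loaded into memory. It is not taken from the version 4 source; the names PgmHeader, pgm_next_int and pgm_parse_header are hypothetical, and the sketch assumes the binary P5 variant rather than the ASCII P2 variant.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical header record; not a type from the version 4 source. */
typedef struct {
    int width;
    int height;
    int maxval;
    size_t data_offset;   /* index of the first pixel byte in the buffer */
} PgmHeader;

/* Read the next decimal integer, skipping whitespace and '#' comment lines. */
static int pgm_next_int(const unsigned char *buf, size_t len, size_t *pos, int *out)
{
    size_t i = *pos;
    int value = 0;
    int digits = 0;

    for (;;) {
        while (i < len && isspace(buf[i])) i++;
        if (i < len && buf[i] == '#') {
            while (i < len && buf[i] != '\n') i++;   /* skip comment to end of line */
        } else {
            break;
        }
    }
    while (i < len && isdigit(buf[i]) && digits < 9) {   /* 9 digits cannot overflow int */
        value = value * 10 + (buf[i] - '0');
        i++;
        digits++;
    }
    if (digits == 0) return 0;
    *pos = i;
    *out = value;
    return 1;
}

/* Parse a binary (P5) PGM header held in memory. Returns 1 on success, 0 on failure. */
int pgm_parse_header(const unsigned char *buf, size_t len, PgmHeader *h)
{
    size_t pos = 2;
    size_t bytes_per_pixel;

    if (buf == NULL || h == NULL || len < 2 || buf[0] != 'P' || buf[1] != '5') return 0;
    if (!pgm_next_int(buf, len, &pos, &h->width)) return 0;
    if (!pgm_next_int(buf, len, &pos, &h->height)) return 0;
    if (!pgm_next_int(buf, len, &pos, &h->maxval)) return 0;
    if (h->width <= 0 || h->height <= 0 || h->maxval <= 0 || h->maxval > 65535) return 0;

    /* Exactly one whitespace byte separates the header from the raster. */
    if (pos >= len || !isspace(buf[pos])) return 0;
    h->data_offset = pos + 1;

    /* Make sure the buffer really holds width * height pixels. */
    bytes_per_pixel = (h->maxval < 256) ? 1 : 2;
    if ((len - h->data_offset) / bytes_per_pixel / (size_t)h->width < (size_t)h->height) return 0;
    return 1;
}
```

A caller would load each file of the image pair into a buffer, call pgm_parse_header on it, and then read 8-bit pixels starting at buf[data_offset].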
Has been tested using a Silicon Imaging
SI-1280F camera. This is a monochrome
camera using a 1280 x 1022 format and capable of producing 50 frames per second
continuously. It can operate at 3000
frames per second on a 100 x 100 pixel subimage. Shutter can be either rolling or triggered. The camera can produce 12-bit images, but we
have used 8-bit. A color camera is
available, but we have used monochrome for improved resolution.
We have used mirrors to
produce a stereo pair on a single camera.
See Gluckman and Nayar. The mirrors were not aligned perfectly,
forcing us to crop each image to 1265 x 400.
We used up-down stereo. It may be
a better idea to put the camera on its side and generate two 1022 x 640
images. The software works for either
right-left or up-down stereo.
This version treats all features as edges;
there are no bars. It does not find the
angle of the feature. In version 3,
angle finding took about a third of the processing time. It finds only horizontal and vertical
edges. Thus only four filters are
needed. If a feature has significant
components in both orientations, it is classified as a corner.
Subsampling on a large filter has been
properly tested.
Version 3.0 Changes from version 2.1:
(May 16 -
Changed the orientation of
filters. Previously, both even and
odd filters shared a horizontal orientation. Now the common orientation is
vertical. This is done to facilitate stereo vision.
Changed from using four odd
filter orientations to two. Both
versions use three orientations of even filters. The even filter has a
"bump" window applied to the equation -502.48 x^2 + 7.8287.
The old odd filter used the equation -925.81 x^3 + 97.7913 x. The new
odd filter uses 72.0232 x. The theory of steerable
filters says that you can interpolate the angle of any filter from a small
number of sampled orientations. In the case of polynomials, the number of
samples needed is one more than the degree of the polynomial. See the
dissertation for details. The equations used in the code have a multiplier of
15.66 times the numbers in the dissertation. Code changes also involved new
look-up tables to determine the edge or bar position and strength based on the
response of the odd/even filters. This data was generated on a spreadsheet. New
bar width data was also produced, but it needs more work. However, bar width
code was changed to get the bar width from the edge position when possible.
Updated the rules used to determine whether a
feature is an edge or a bar. The basic idea is to see where the large and small
filters predict the feature will be. There are sets of predictions for edges,
dark bars, and light bars. Only one of the three pairs of predictions can be
consistent. That one is chosen. A supplementary rule is based on the phase from
the even and odd filters: If the absolute value of the phase is less than 0.42 radians,
then it is a dark bar; if it is greater than 2.7 radians, then it is a light
bar. Phases in those ranges should produce no response for the other two
filters.
Wrote a new routine for
finding the angle of maximum response. The previous code used a "solve" routine to find the angle
that would give the maximum filter response (based on three even and four odd).
It tried multiple starting points to find where the derivative of the total
response is zero. This was replaced with a "solve_max" routine based
on three even and two odd filters. It should use many fewer iterations than the
previous version. It starts at four approximate solutions and finds the
maximum, using the value of the response as well as first and second
derivatives.
Added the ability to handle
color images. If more than one
band is present, the analysis is done on the band that produces the strongest
response. This should work best on an image with a luminance band and two
chrominance bands. Defining DO_COLORS in stdAfx.h determines whether images are
processed in gray or in color.
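Band selection can be sketched as picking the band whose response has the largest magnitude. The function name is hypothetical, and what counts as the "response" of a band (here a single number per band) is an assumption; the real code may compare several filter outputs per band.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Return the index of the band with the strongest response.
 * Ties go to the lowest band index. */
size_t strongest_band(const double *band_response, size_t n_bands)
{
    size_t best = 0;
    size_t i;

    assert(band_response != NULL && n_bands > 0);
    for (i = 1; i < n_bands; i++) {
        if (fabs(band_response[i]) > fabs(band_response[best])) {
            best = i;
        }
    }
    return best;
}
```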
Added the ability to
subsample when doing a correlation using large filters. This has not been adequately tested.
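A subsampled correlation can be sketched as follows: only every step-th row and column of the filter and the image patch is visited. The function name and argument layout are hypothetical, and multiplying the sum by step * step to keep the result on the same scale as the full correlation is an assumption, not something stated in these notes.

```c
#include <assert.h>
#include <stddef.h>

/* Correlate a size x size filter with the image patch whose top-left
 * corner is (x0, y0), visiting every step-th sample in each direction.
 * The caller must ensure the patch lies inside the image. */
double correlate_subsampled(const unsigned char *image, int image_width,
                            int x0, int y0,
                            const double *filter, int size, int step)
{
    double sum = 0.0;
    int fx, fy;

    assert(image != NULL && filter != NULL && size > 0 && step > 0);
    for (fy = 0; fy < size; fy += step) {
        for (fx = 0; fx < size; fx += step) {
            size_t pixel = (size_t)(y0 + fy) * (size_t)image_width + (size_t)(x0 + fx);
            sum += filter[fy * size + fx] * image[pixel];
        }
    }
    /* Assumed rescaling: each visited sample stands in for step * step samples. */
    return sum * step * step;
}
```

With step = 2 a filter of size 20 touches 100 samples instead of 400, which is where the time saving on large filters would come from.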
The above changes were archived as version
3.0.0. Changes below are in version 3.0.1.
The main reason for most of the above changes
was to make the code run faster. To find out how fast it runs, profiling code
was added. This is in files profile.h and profile.cpp. These can be deleted
from the project with no ill effect, or equivalently, undefine PROFILE_ME in
profile.h. Profiling causes information to be written to a profile.txt file.
The software is taken from Greg Hjelstrom and Byon Garrabrant, "Real-Time
Hierarchical Profiling" in Game Programming Gems 3, Charles River
Media, 2002.
Also added is a Canny
edge detector. This is in files canny.cpp, canny.h, cannyDlg.cpp and
cannyDlg.h. These are not central to the software and may be omitted. Their
purpose is to compare the time required by the Quadrature Disambiguation method
against a standard method. The Canny code was written by J.R. Parker in Algorithms
for Image Processing and Computer Vision. It was modified by Travis Udd to
work with the Microsoft Vision SDK in November, 2000, then
further modified by Tyler Folsom in 2003.
Profiling shows that the time required for
processing depends strongly on the size of the smallest filter used. The
following times are for the CProcessFeatures::Process method, which is the
heart of Quadrature disambiguation. Times should be in ms, but the timing has
not been independently calibrated. The times are for a release version of the
code processing a gray level elephant image, with no subsampling.
| Filter size         | 20   | 12   | 8    | 5     |
| Time (nominally ms) | 19.2 | 33.4 | 61.7 | 149.1 |
The time to do Canny
edge detection on this image is 41.2. Thus we have achieved speed comparable to
the Canny edge detector. It remains to be shown that
the results achieved are as good or better.
The time to process features is mostly split
into two subtasks: Filter, which performs the correlations at selected areas of
the image, and Interpret, which finds the angle of maximum response, steers to
that angle, and determines feature position, type, and strength. In addition,
the code spends time displaying graphics of the found edges and writing the
answers to a file. The following table shows the time spent on these tasks
relative to the time for processing the features.
| Filter size | 20  | 12  | 8   | 5   |
| Filter      | 65% | 59% | 54% | 51% |
| Interpret   | 33% | 38% | 40% | 39% |
| Display     | 36% | 23% | 13% | 7%  |
| Write file  | 47% | 48% | 47% | 46% |
Compared to the version 2.1 code, the version 3 code
processes features in 45% of the time when filter size is 20 and 81% of the time when filter
size is 8.
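The percentages above can be turned back into times with simple arithmetic. The sketch below uses hypothetical function names and assumes that each percentage is a fraction of the Process time for the same filter size, and that "45%" means the new time is 0.45 of the old time.

```c
/* Time spent in a subtask, given the Process time and the subtask's
 * percentage of that time. Units follow the Process time. */
double subtask_time(double process_time, double percent_of_process)
{
    return process_time * percent_of_process / 100.0;
}

/* Speed-up factor implied by "the new code takes P percent of the old time". */
double speedup_factor(double percent_of_old_time)
{
    return 100.0 / percent_of_old_time;
}
```

For filter size 20, subtask_time(19.2, 65.0) gives about 12.5 for the Filter subtask, and speedup_factor(45.0) gives a factor of about 2.2 over version 2.1.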
Fixed inconsistencies in the code about what
is light or dark. Made the orientation and coordinate system
consistent. The X-axis points right; Y-axis points down; positive
rotation is clockwise. An orientation of 0 degrees corresponds to the vector (0,1) with dark in the region where x is negative and light
for positive x.
Added lateral antagonism. During the initial pass, features that fall below
threshold are retained. Their effective strength is increased or decreased
based on neighboring features.
Added corner detection. After finding the main feature, steer to 90 degrees
from it and look for another feature. If one is found, the location is a 2D interest
point (corner, X, T, high curvature, blob, etc.). If not, it is an edge or bar.
[To do: this needs more work.]
Removed steering to +/- 45
degrees. These responses could be
used to distinguish certain classes of interest points, but this check has
never been implemented.
Changed "edges
only" behavior. Previously,
the code checked for edges and bars. If the "edges only" view was
selected, display of bars was suppressed. Now, at all locations only edges are
sought. [To do: in this case we don't need to look at large filters and can
save time by omitting large filter correlations.]
Implemented test images for corners.
Corrected the answers generated for bar
testing. Verified that noise-free artificial images of edges
and bars are usually good to subpixel resolution.
Known problems (version 3):
Images of thin bars are often misidentified
as edges with a large position error.
Bar width is unreliable and almost always
estimated as too wide.
Corner detection sometimes winds up putting
the perpendicular edge at the wrong end of the receptive field.
Graphics for drawing bars is sometimes
incorrect.
Class structure (version 3)
The program executes the following
transformations:
Image -> big image region   -> filtered --+
                                          +-> feature -> feature list -> Image of features
Image -> small image region -> filtered --+
The classes involved represent images and
their subregions, sampling locations, filters, detected features, and graphical
outlines of features.
In versions 1 and 2, there are three even
filters at 60 degree orientations and four odd filters at 45 degree
orientations.
The even and odd filters share a horizontal
orientation.
In version 3, this has been changed so that
even and odd filters share a vertical orientation. This is meant to make it
easier to detect stereo disparity. Version 3 uses three even filters at 60
degree orientations and two odd filters at 90 degree orientations.
Version 4 uses only one filter size
since it does not find bars. There are a
total of four filters: even and odd at horizontal and vertical orientations.
It is in C, so there are no classes.
References
http://home.earthlink.net/~tylerfolsom/
T.C. Folsom, Neural Networks Modeling Cortical Cells for Machine Vision, Ph.D. Thesis,
University of Washington, Seattle, Washington, 1994.
T.C. Folsom and R.B. Pinter, "Primitive features from steering, quadrature and scale," IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11,
November 1998, pp. 1161-1173.
T.C. Folsom, "Sparse Sampling for Robot Vision,"
IASTED Conference on Robotics and Applications,
W.T. Freeman and E.H. Adelson, "The Design and Use of Steerable Filters," IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 891-906, 1991.
J. Gluckman and S.K. Nayar, "Rectified Catadioptric Stereo Sensors," CVPR 2000.
G. Hjelstrom and B. Garrabrant, "Real-Time Hierarchical Profiling," in Game Programming Gems 3,
Charles River Media, 2002.
J.R. Parker, Algorithms for Image Processing and Computer Vision.
Vision Technology Group, Microsoft Research, The Microsoft Vision SDK, Version 1.0, March 1998, http://www.research.microsoft.com/research/vision/