Release Notes

 

Revision History:

Quadrature Disambiguation Feature Detector

Most code by Tyler C. Folsom.  Some version 4.0 code by Jim Albers.

Version 1.x

C language code for doctoral dissertation at University of Washington, June 1994

DOS version with no graphics displays. Output is text only.

Works on monochrome single images.

Version 2.0

C++ version for UW short course in C++, Autumn 1998

Latest update: December 7, 1998

Windows version with graphics overlays to show where features were found.

Uses Microsoft Vision SDK version 1.0

Version 2.1

Latest update: Feb 27, 2000

Added the capability of generating test images to verify performance on known cases.

Expanded the output file.

Research plan was only partially completed.

Uses Microsoft Vision SDK version 1.2

Version 3.0

July 9, 2003

Faster routines. Can handle color images. First cut at corner detection. Uses lateral inhibition / facilitation to build extended edges. See below for details.

Version 3.1 (Sept 2003)

Objectives: Link the features to produce a segmentation of the image. Output should be Hermite curves that outline objects. Be able to go from an image to a cartoon.

Expand to handle a color stereo video image stream.

Instead of a fixed sampling grid, adapt the sampling grid based on features found in the previous frame.

Version 4.0 (August 2005)

Objective:  Real-time binocular stereo on large images as the primary obstacle detection for a robot vehicle travelling offroad with no other traffic at up to 100 km/h.

Not in public domain.  Select individuals will be given alpha source code for non-commercial purposes.

No MFC, no GUI, translated from C++ to C. Should run on any processor with enough power. Inputs a .PGM image pair and outputs a DirectX-compatible line strip of vertices. For debugging, it also outputs a color-coded distance map as a PPM image and intermediate text files. The input image pair is monochrome. The only directions detected are horizontal and vertical. There are no bars; all features are edges or corners.

Overview

For an introduction, see the Software Design Document for Version 2.0 and http://home.earthlink.net/~tylerfolsom/

The version 2 and 3 code is written using the Microsoft Foundation Classes (MFC) document/view architecture. There are several wizard-generated files, such as MainFrm.cpp, ChildFrm.cpp, StdAfx.cpp, ImageFeatures.cpp, and their header files, which do not need to be disturbed and are of little interest. These MFC classes are derived from window classes. The view class handles the graphical user interface (GUI); the document class handles the real work. The header files carry usage comments for class methods and variables. The files are organized as follows:

ImageFeatureView: Handles the GUI

Options: Gets items requested from the Options menu.

ImageFeaturesDoc: Handler for trivial tasks requested from the GUI.

ProcessFeatures: The main routine. It sets up the grid that determines the locations at which the image is sampled.

IFFilter: The filter class. It takes as input a patch of image and correlates it with cortical filters.

IFLocation: Holds the result of the correlation at each location. Finds the orientations.

IFFeature: The feature class. It takes as inputs the results from applying cortical filters to a patch of image. It does 1D processing based on knowing the orientation. As outputs, it produces the interpretation of what features are at that location.

TestImage: Routines to generate synthetic images for testing (incomplete). These are on the Process | Test cases menu.

Version 4.0 Changes from version 3.0: (October 2004 – August 2005)

Does binocular stereo. Translated from C++ to C and intended to run as an embedded application. No MFC, no GUI, nothing device-specific other than the camera interface. If not connected to a camera, input is a monochrome image pair in PGM format. Output is a 3D line strip of vertices in meters. This may be extended to a triangle strip or fan describing a surface.
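
As a rough illustration of the file input, here is a minimal sketch of loading a binary (P5) PGM image. It assumes an 8-bit maxval and a header without comment lines; the actual version 4 loader is not reproduced in these notes and may differ.

    // Minimal binary PGM (P5) loader sketch.  Assumes an 8-bit maxval and a
    // header with no comment lines; the real version 4 loader may differ.
    #include <cstdio>
    #include <cstdlib>

    unsigned char *LoadPGM(const char *path, int *width, int *height)
    {
        FILE *fp = fopen(path, "rb");
        if (!fp) return NULL;

        int maxval = 0;
        if (fscanf(fp, "P5 %d %d %d", width, height, &maxval) != 3 || maxval > 255) {
            fclose(fp);
            return NULL;                      /* not an 8-bit binary PGM */
        }
        fgetc(fp);                            /* consume the single whitespace after maxval */

        size_t n = (size_t)(*width) * (size_t)(*height);
        unsigned char *pixels = (unsigned char *)malloc(n);
        if (pixels && fread(pixels, 1, n, fp) != n) {
            free(pixels);
            pixels = NULL;
        }
        fclose(fp);
        return pixels;                        /* caller frees */
    }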

Has been tested using a Silicon Imaging SI-1280F camera.  This is a monochrome camera using a 1280 x 1022 format and capable of producing 50 frames per second continuously.  It can operate at 3000 frames per second on a 100 x 100 pixel subimage.  Shutter can be either rolling or triggered.  The camera can produce 12-bit images, but we have used 8-bit.  A color camera is available, but we have used monochrome for improved resolution.

We have used mirrors to produce a stereo pair on a single camera. See Gluckman and Nayar. The mirrors were not aligned perfectly, forcing us to crop each image to 1265 x 400. We used up-down stereo. It may be a better idea to put the camera on its side and generate two 1022 x 640 images. The software works for either right-left or up-down stereo.
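
Converting a matched feature to a vertex in meters uses standard pinhole triangulation. The sketch below is illustrative only: the focal length, baseline, and principal point are hypothetical calibration values, and for up-down stereo the disparity is measured vertically rather than horizontally.

    // Standard pinhole triangulation sketch: disparity (pixels) to a 3D point
    // in meters.  focal_px, baseline_m, cx, cy are hypothetical calibration
    // values, not numbers from these release notes.
    struct Point3 { double x, y, z; };

    Point3 Triangulate(double col, double row, double disparity_px,
                       double focal_px, double baseline_m,
                       double cx, double cy)          /* principal point, pixels */
    {
        Point3 p = {0.0, 0.0, 0.0};
        if (disparity_px <= 0.0) return p;            /* no valid match */
        p.z = focal_px * baseline_m / disparity_px;   /* depth */
        p.x = (col - cx) * p.z / focal_px;            /* right */
        p.y = (row - cy) * p.z / focal_px;            /* down  */
        return p;
    }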

This version treats all features as edges; there are no bars.  It does not find the angle of the feature.  In version 3, angle finding took about a third of  the processing time.  It finds only horizontal and vertical edges.  Thus only four filters are needed.  If a feature has significant components in both orientations, it is classified as a corner.

Subsampling on a large filter has been properly tested.

Version 3.0 Changes from version 2.1: (May 16 - July 9, 2003)

Changed the orientation of filters. Previously, both even and odd filters shared a horizontal orientation. Now the common orientation is vertical. This is done to facilitate stereo vision.

Changed from using four odd filter orientations to two. Both versions use three orientations of even filters. The even filter has a "bump" window applied to the equation -502.48x² + 7.8287. The old odd filter used the equation -925.81x³ + 97.7913x. The new odd filter uses 72.0232x. The theory of steerable filters says that the response at any angle can be interpolated from a small number of sampled orientations. In the case of polynomials, the number of samples needed is one more than the degree of the polynomial. See the dissertation for details. The equations used in the code have a multiplier of 15.66 times the numbers in the dissertation. Code changes also involved new look-up tables to determine the edge or bar position and strength based on the response of the odd/even filters. This data was generated on a spreadsheet. New bar width data was also produced, but it needs more work. However, the bar width code was changed to get the bar width from the edge position when possible.
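
The interpolation step can be sketched with the standard steering formulas of Freeman and Adelson. The sketch assumes the three even filters are sampled at 0, 60, and 120 degrees and the two odd filters at 0 and 90 degrees, consistent with the degree-plus-one rule above; the normalization and windowing in the released code may differ.

    #include <cmath>

    // Steering sketch in the style of Freeman and Adelson.  Given the measured
    // responses of three even filters (assumed sampled at 0, 60, 120 degrees)
    // and two odd filters (assumed sampled at 0, 90 degrees), interpolate the
    // response at an arbitrary angle theta (radians).
    static const double kPi = 3.14159265358979;

    double SteerEven(const double even[3], double theta)
    {
        const double sample[3] = {0.0, kPi / 3.0, 2.0 * kPi / 3.0};
        double r = 0.0;
        for (int j = 0; j < 3; ++j)
            r += (1.0 + 2.0 * cos(2.0 * (theta - sample[j]))) * even[j] / 3.0;
        return r;
    }

    double SteerOdd(const double odd[2], double theta)
    {
        // A first-degree odd filter steers like a gradient: two orthogonal samples.
        return cos(theta) * odd[0] + sin(theta) * odd[1];
    }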

Updated the rules used to determine whether a feature is an edge or a bar. The basic idea is to see where the large and small filters predict the feature will be. There are sets of predictions for edges, dark bars, and light bars. Only one of the three pairs of predictions can be consistent. That one is chosen. A supplementary rule is based on the phase from the even and odd filters: if the absolute value of the phase is less than 0.42 radians, then it is a dark bar; if it is greater than 2.7 radians, then it is a light bar. Phases in those ranges should produce no response for the other two filters.
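
The supplementary phase rule might be sketched as below. Defining the phase as atan2 of the odd response over the even response is an assumption on my part; the primary rule based on the consistency of the large/small position predictions is not shown.

    #include <cmath>

    enum FeatureGuess { EDGE, DARK_BAR, LIGHT_BAR };

    // Supplementary phase rule sketched from the text.  The definition of
    // phase as atan2(odd, even) is assumed, not taken from the code.
    FeatureGuess ClassifyByPhase(double even, double odd)
    {
        double phase = atan2(odd, even);
        double a = fabs(phase);
        if (a < 0.42) return DARK_BAR;        /* near zero phase */
        if (a > 2.7)  return LIGHT_BAR;       /* near +/- pi     */
        return EDGE;                          /* otherwise treat as an edge */
    }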

Wrote a new routine for finding the angle of maximum response. The previous code used a "solve" routine to find the angle that would give the maximum filter response (based on three even and four odd filters). It tried multiple starting points to find where the derivative of the total response is zero. This was replaced with a "solve_max" routine based on three even and two odd filters. It should use many fewer iterations than the previous version. It starts at four approximate solutions and finds the maximum, using the value of the response as well as its first and second derivatives.
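
The general pattern of "start from several guesses and refine with Newton steps on the derivative" can be sketched as follows. The callbacks stand in for the steered filter energy and its derivatives; the real solve_max works on the closed-form steering expressions rather than a generic callback.

    #include <cmath>

    // Generic maximization sketch: Newton iterations on the derivative from
    // several starting points, keeping the angle with the largest response.
    double SolveMax(double (*f)(double), double (*df)(double), double (*d2f)(double),
                    const double *starts, int nStarts, int maxIter)
    {
        double bestTheta = starts[0];
        double bestVal = f(bestTheta);
        for (int s = 0; s < nStarts; ++s) {
            double theta = starts[s];
            for (int i = 0; i < maxIter; ++i) {
                double g = df(theta), h = d2f(theta);
                if (fabs(h) < 1e-12) break;   /* flat curvature: give up */
                double step = g / h;
                theta -= step;                /* Newton step toward zero derivative */
                if (fabs(step) < 1e-6) break;
            }
            double v = f(theta);
            if (v > bestVal) { bestVal = v; bestTheta = theta; }
        }
        return bestTheta;
    }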

Added the ability to handle color images. If more than one band is present, the analysis is done on the band that produces the strongest response. This should work best on an image with a luminance band and two chrominance bands. Defining DO_COLORS in stdAfx.h determines whether images are processed in gray or in color.
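
Band selection might look like the following sketch. Using the sum of squared even and odd responses as the "strongest response" measure is an assumption; the released code may use a different measure.

    // Sketch of choosing the band with the strongest response, with filter
    // energy taken as the sum of squared even and odd responses per band.
    int StrongestBand(const double even[][3], const double odd[][2], int nBands)
    {
        int best = 0;
        double bestEnergy = -1.0;
        for (int b = 0; b < nBands; ++b) {
            double e = 0.0;
            for (int j = 0; j < 3; ++j) e += even[b][j] * even[b][j];
            for (int j = 0; j < 2; ++j) e += odd[b][j] * odd[b][j];
            if (e > bestEnergy) { bestEnergy = e; best = b; }
        }
        return best;
    }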

Added the ability to subsample when doing a correlation using large filters. This has not been adequately tested.

The above changes were archived as version 3.0.0. Changes below are in version 3.0.1

The main reason for most of the above changes was to make the code run faster. To find out how fast it runs, profiling code was added. This is in files profile.h and profile.cpp. These can be deleted from the project with no ill effect; equivalently, undefine PROFILE_ME in profile.h. Profiling causes information to be written to a profile.txt file. The profiling code is adapted from Greg Hjelstrom and Byon Garrabrant, "Real-Time Hierarchical Profiling," in Game Programming Gems 3, Charles River Media, 2002.
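
For readers unfamiliar with the pattern, the sketch below shows a flat scoped timer. The actual profile.h/profile.cpp implement the hierarchical version from Game Programming Gems 3 and write their results to profile.txt; this is only an illustration of the idea.

    #include <cstdio>
    #include <ctime>

    // Minimal scoped-timer sketch of the pattern behind profile.h/profile.cpp.
    class ScopedTimer {
    public:
        explicit ScopedTimer(const char *name) : m_name(name), m_start(clock()) {}
        ~ScopedTimer() {
            double ms = 1000.0 * (clock() - m_start) / CLOCKS_PER_SEC;
            fprintf(stderr, "%s: %.1f ms\n", m_name, ms);
        }
    private:
        const char *m_name;
        clock_t m_start;
    };

    // Usage: declare a timer at the top of the block to be measured, e.g.
    // void CProcessFeatures::Process() { ScopedTimer t("Process"); /* ... */ }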

Also added is a Canny edge detector. This is in files canny.cpp, canny.h, cannyDlg.cpp and cannyDlg.h. These are not central to the software and may be omitted. Their purpose is to compare the time required by the Quadrature Disambiguation method against a standard method. The Canny code was written by J.R. Parker in Algorithms for Image Processing and Computer Vision. It was modified by Travis Udd to work with the Microsoft Vision SDK in November 2000, then further modified by Tyler Folsom in 2003.

Profiling shows that the time required for processing depends strongly on the size of the smallest filter used. The following times are for the CProcessFeatures::Process method, which is the heart of Quadrature disambiguation. Times should be in ms, but the timing has not been independently calibrated. The times are for a release version of the code processing a gray level elephant image, with no subsampling.

Filter size    20      12      8       5
Time           19.2    33.4    61.7    149.1

The time to do Canny edge detection on this image is 41.2. Thus we have achieved speed comparable to the Canny edge detector. It remains to be shown that the results achieved are as good or better.

The time to process features is mostly split into two subtasks: Filter, which performs the correlations at selected areas of the image, and Interpret, which finds the angle of maximum response, steers to that angle, and determines feature position, type, and strength. In addition, the code spends time displaying graphics of the found edges and writing the answers to a file. The following table shows the time spent on these tasks relative to the time for processing the features.

Filter size    20      12      8       5
Filter         65%     59%     54%     51%
Interpret      33%     38%     40%     39%
Display        36%     23%     13%      7%
Write file     47%     48%     47%     46%

Compared to the version 2.1 code, processing the features takes 45% as long when the filter size is 20 and 81% as long when the filter size is 8.

Fixed inconsistencies in the code about what is light or dark. Made the orientation and coordinate system consistent. The X-axis points right; Y-axis points down; positive rotation is clockwise. An orientation of 0 degrees corresponds to the vector (0,1) with dark in the region where x is negative and light for positive x.
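
The stated convention can be written out as a small geometry sketch. This is only an illustration of the convention, not code from the project.

    #include <cmath>

    // Convention sketch: x right, y down, positive rotation clockwise,
    // orientation 0 along (0,1) with light on the +x side.
    void OrientationToVectors(double degrees,
                              double *dirX, double *dirY,      /* along the feature     */
                              double *lightX, double *lightY)  /* toward the light side */
    {
        double rad = degrees * 3.14159265358979 / 180.0;
        /* clockwise rotation of (0,1) in an x-right, y-down frame */
        *dirX = -sin(rad);
        *dirY =  cos(rad);
        /* the light side rotates with the feature; it is +x at 0 degrees */
        *lightX = cos(rad);
        *lightY = sin(rad);
    }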

Added lateral antagonism. During the initial pass, features that fall below threshold are retained. Their effective strength is increased or decreased based on neighboring features.
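
A loose sketch of the idea is given below. The neighbor test, orientation tolerances, and weights are invented for illustration; the released code defines its own rules for facilitation and inhibition.

    #include <cmath>
    #include <cstdlib>

    // Illustrative lateral facilitation / inhibition: a feature's effective
    // strength is raised by aligned neighbors and lowered by conflicting ones.
    struct Feature {
        double strength;        /* raw response               */
        double effective;       /* after lateral interaction  */
        double orientation;     /* degrees                    */
        int    row, col;        /* grid position              */
    };

    void LateralInteraction(Feature *f, int n, double supportWeight, double rivalWeight)
    {
        for (int i = 0; i < n; ++i) {
            f[i].effective = f[i].strength;
            for (int j = 0; j < n; ++j) {
                if (i == j) continue;
                /* neighbors: adjacent grid cells only (illustrative test) */
                if (abs(f[i].row - f[j].row) > 1 || abs(f[i].col - f[j].col) > 1) continue;
                double diff = fabs(f[i].orientation - f[j].orientation);
                if (diff > 180.0) diff = 360.0 - diff;
                if (diff < 30.0)
                    f[i].effective += supportWeight * f[j].strength;   /* facilitation */
                else if (diff > 60.0)
                    f[i].effective -= rivalWeight * f[j].strength;     /* inhibition   */
            }
        }
    }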

Added corner detection. After finding the main feature, steer to 90 degrees from it and look for another feature. If one is found, the location is a 2D interest point (corner, X, T, high curvature, blob, etc.). If not, it is an edge or bar. [To do: this needs more work.]
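
The perpendicular test might be sketched as follows, reusing the steering sketch given earlier. The cornerRatio threshold is invented for illustration.

    #include <cmath>

    double SteerEven(const double even[3], double theta);   /* from the steering sketch above */
    double SteerOdd(const double odd[2], double theta);

    // Sketch of the corner test: steer 90 degrees away from the dominant
    // feature and check for a significant second response.
    bool IsInterestPoint(const double even[3], const double odd[2],
                         double mainTheta, double mainStrength, double cornerRatio)
    {
        double perp = mainTheta + 1.5707963267948966;        /* 90 degrees away */
        double e = SteerEven(even, perp);
        double o = SteerOdd(odd, perp);
        double perpStrength = sqrt(e * e + o * o);
        return perpStrength > cornerRatio * mainStrength;
    }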

Removed steering to +/- 45 degrees. These responses could be used to distinguish certain classes of interest points, but this check has never been implemented.

Changed "edges only" behavior. Previously, the code checked for edges and bars. If the "edges only" view was selected, display of bars was suppressed. Now, at all locations only edges are sought. [To do: in this case we don't need to look at large filters and can save time by omitting large filter correlations.]

Implemented test images for corners.

Corrected the answers generated for bar testing. Verified that noise-free artificial images of edges and bars are usually good to subpixel resolution.

Known problems (version 3):

Images of thin bars are often misidentified as edges with a large position error.

Bar width is unreliable and almost always estimated as too wide.

Corner detection sometimes winds up putting the perpendicular edge at the wrong end of the receptive field.

Graphics for drawing bars is sometimes incorrect.

Class structure (version 3)

The program executes the following transformations:

Image        -> big image region   -> filtered -> feature
             -> small image region -> filtered
feature list -> Image of features

The classes involved represent images and their subregions, sampling locations, filters, detected features, and graphical outlines of features.

 

In versions 1 and 2, there are four odd filters at 45 degree orientations and three even filters at 60 degree orientations.

The even and odd filters share a horizontal orientation.

In version 3, this has been changed so that the even and odd filters share a vertical orientation. This is meant to make it easier to detect stereo disparity. Version 3 uses three even filters at 60 degree orientations and two odd filters at 90 degree orientations.

Version 4 uses only one filter size since it does not find bars. There are a total of four filters: even and odd at horizontal and vertical orientations. It is in C, so there are no classes.

References

http://home.earthlink.net/~tylerfolsom/

T.C. Folsom, Neural Networks Modeling Cortical Cells for Machine Vision, Ph.D. Thesis, University of Washington, Seattle, Washington, 1994.

T.C. Folsom and R.B. Pinter, "Primitive features from steering, quadrature and scale," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, November 1998, pp. 1161-1173.

T.C. Folsom, "Sparse Sampling for Robot Vision," IASTED Conference on Robotics and Applications, Honolulu, HI, August 2004.

Freeman, W. T., and E. H. Adelson "The Design and Use of Steerable Filters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 891-906, 1991.

Gluckman, Joshua and Shree K. Nayar, "Rectified Catadioptric Stereo Sensors," CVPR 2000.

Hjelstrom, Greg and Byon Garrabrant, "Real-Time Hierarchical Profiling" in Game Programming Gems 3, Charles River Media, 2002.

Parker, J.R., Algorithms for Image Processing and Computer Vision.

Vision Technology Group, Microsoft Research, The Microsoft Vision SDK, Version 1.0, March 1998, http://www.research.microsoft.com/research/vision/
