I set a goal for myself this week to really release (i.e., not just push to a public GitHub repository) a library I've been working on for a while. It borrows and builds on a lot of other people's work (credits in the LICENSE).
For the layperson: this library lets a user detect and describe key points in a supplied image. One of the primary uses of these points is comparing different images.
Take these two images, for instance (borrowed from the VLFeat SIFT page), which show the same rooftop from two perspectives.
How would I estimate the relative positions of the cameras that took these two shots? What if I could find key points in both images and then match them together? Then I should be able to compute a relative transformation between the two views.
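To make that idea concrete, here is a minimal sketch of the two steps in pure NumPy. It is my own illustration, not this library's API: descriptors are matched with a nearest-neighbor search plus Lowe's ratio test, and then a least-squares similarity transform (scale and rotation) is recovered from the matched coordinates using the complex-number trick.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Nearest-neighbor matching with Lowe's ratio test.

    d1: (n, k) array of descriptors from image 1.
    d2: (m, k) array of descriptors from image 2 (m >= 2).
    Returns a list of (i, j) index pairs.
    """
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep the match only if it is clearly better than the runner-up;
        # this discards ambiguous matches on repetitive texture.
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

def estimate_similarity(p1, p2):
    """Least-squares scale and rotation mapping centered p1 onto centered p2.

    p1, p2: (n, 2) arrays of matched keypoint coordinates.
    Returns (scale, angle_in_radians).
    """
    q1 = p1 - p1.mean(axis=0)
    q2 = p2 - p2.mean(axis=0)
    # A 2-D similarity (ignoring translation) is multiplication by a single
    # complex number, so the least-squares fit has a closed form.
    z1 = q1[:, 0] + 1j * q1[:, 1]
    z2 = q2[:, 0] + 1j * q2[:, 1]
    s = np.vdot(z1, z2) / np.vdot(z1, z1)
    return abs(s), np.angle(s)
```

A real pipeline would also use a robust estimator such as RANSAC to reject bad matches before fitting, but the shape of the computation is the same.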
The library I'm pulling together computes these points very quickly, but for others (and for me) to have confidence in it, I have to show not only how much faster it is than existing solutions, but also how accurate it is.
Here's a first pass. Blue circles are the key points generated by VLFeat, a well-regarded reference implementation, and yellow circles are mine. A lot of the values are pretty close, but not perfect. Why? This week, I'll find out.
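One simple way to put a number on "pretty close" (my own sketch, not how VLFeat or this library scores anything) is the fraction of reference keypoints that have a detection from the other implementation within some pixel tolerance:

```python
import numpy as np

def detection_agreement(ref_pts, test_pts, tol=1.0):
    """Fraction of reference keypoints with a test keypoint within `tol` pixels.

    ref_pts, test_pts: (n, 2) and (m, 2) arrays of (x, y) locations.
    """
    if len(test_pts) == 0:
        return 0.0
    hits = 0
    for p in ref_pts:
        # Distance from this reference point to every test detection.
        dists = np.linalg.norm(test_pts - p, axis=1)
        if dists.min() <= tol:
            hits += 1
    return hits / len(ref_pts)
```

Sweeping `tol` from a fraction of a pixel up to a few pixels would show whether the disagreements are small localization offsets or genuinely missed detections.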