To capture three-dimensional data from the world, we've relied on a technique called structured light. Roughly speaking, we shine a known pattern using an infrared projector and then capture an image using an infrared camera. Based on the distortions in the known pattern, we can infer the three-dimensional shape of the scene. Cool, right?
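The core of that inference is triangulation: the farther a pattern feature appears shifted from where the projector placed it, the closer the surface. Here's a minimal sketch of that relationship; the focal length and baseline are hypothetical calibration values for illustration, not numbers from any real sensor:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px=600.0, baseline_m=0.075):
    """Triangulate depth for a structured-light (or stereo) pair.

    A pattern feature observed shifted by `disparity_px` pixels between
    the projector's known position and the camera's view lies at depth
    f * B / d. The focal length and baseline here are made-up values.
    """
    d = np.asarray(disparity_px, dtype=float)
    # Zero disparity means the rays are parallel: depth is unbounded.
    return np.where(d > 0, focal_px * baseline_m / np.maximum(d, 1e-9), np.inf)

# A larger pattern shift means a closer surface.
near = depth_from_disparity(60.0)
far = depth_from_disparity(15.0)
```

Note the inverse relationship: disparity resolution limits depth resolution, which is one reason fine-grained detail is hard for these sensors.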
But there are a lot of problems with structured light. For example, if there's a bright light shining on an object, we won't be able to pick up the infrared and we won't be able to see our pattern. Or if we have a reflective object, the infrared will bounce right off. Or if we want fine-grained detail, the pattern just isn't dense enough to resolve it. The list goes on.
For the Amazon Picking Challenge, we face all of these problems, so we're testing out new camera setups.
This was version one of our stereo camera (a fancy term that just means two cameras). The thinking goes: if humans and animals don't need to shoot lasers out of their eyes to do the things they need to do, why should robots? Of course, it is quite challenging to infer how far away something is even if you have multiple views of the same thing. The fact that we still use laser range finders (e.g., what's on Google's self-driving car) and structured light sensors (e.g., what you'll find in consumer devices like the Microsoft Kinect) is a testament to this. But this area has seen a lot of progress, so we're off to try anyway.
Our first version kind of worked:
but notice a lot of missing information (the grey areas) where the algorithm can't figure out how far away something is. Take my shirt, for instance. It's grey (props if you can see the Floored logo!) and there are large patches where the color doesn't change much. It becomes very hard to tell which grey pixel from one camera matches which grey pixel from the other. Better lighting can help. There was another complication related to converting color images to grayscale images, but we won't get into that now (it has to do with this).
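You can see the matching problem in a toy example. A typical stereo matcher slides a small patch from the left image along the corresponding scanline in the right image and scores candidates by something like sum of absolute differences (SAD). This is a hedged sketch with synthetic one-line "images", not our actual matcher:

```python
import numpy as np

def match_costs(left_row, right_row, x, size=3):
    """Score every candidate position in `right_row` against the patch
    of width `size` starting at `x` in `left_row`, using SAD.
    Lower cost = better match."""
    patch = left_row[x:x + size]
    return np.array([np.abs(right_row[i:i + size] - patch).sum()
                     for i in range(len(right_row) - size + 1)])

# A textured scanline: intensity varies, so the patch is distinctive.
textured = np.array([10, 80, 30, 200, 50, 120, 90, 40], dtype=float)
# A flat grey scanline, like a shirt with no visible pattern.
flat = np.full(8, 128.0)

costs_textured = match_costs(textured, textured, x=2)  # one clear minimum
costs_flat = match_costs(flat, flat, x=2)              # every position ties
```

On the textured row there is a single zero-cost position, so the match (and hence the depth) is unambiguous. On the flat row every candidate scores identically, and the algorithm has no basis for picking one; those are exactly the regions that come out as grey holes in our depth maps.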
After improving our camera rig:
and having better lighting, we see that I show up much better:
Turning the camera to the Amazon Picking Challenge shelf:
(one of the original grayscale images for reference)
There's still a lot of missing information for a number of other reasons (which is why engineering, a.k.a. fiddling, is really important!). These results are promising, though, and we're going to continue working in this direction. We may end up combining this rig with the existing structured light setup to form one multi-headed camera that can take advantage of their relative strengths.