Using "two sets of sensors" (artificial vision) to reconstruct a real-world scene in virtual 3D is something that has been done over 10 years ago. Seems like Intel has essentially patented the idea of using something similar to predict potential collision hazards.
I can see why Intel would be interested in that: it's a major CPU hog. Processing HD stereo 3D in real time for spatial analysis and collision avoidance, with enough precision to finely control games and applications via air gestures, would eat most of a current CPU's cycles. Kinect proves the general concept, but its motion detection is far too coarse and laggy for fine control.
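To put a rough number on "CPU hog": even at 720p, a stereo pair at 30 fps means chewing through about 55 million pixels per second (1280 x 720 x 2 cameras x 30 fps) before any gesture or hazard logic runs. The hazard check itself can be as simple as this sketch; the function name and the 0.5 m threshold are illustrative, not from the patent:

    import numpy as np

    # Hypothetical per-frame hazard check over a depth map in metres,
    # e.g. depth_m = (focal_px * baseline_m) / (disparity / 16.0)
    # (StereoBM returns disparity in fixed point, scaled by 16)
    def collision_hazard(depth_m, min_safe_m=0.5):
        valid = depth_m > 0  # 0 marks pixels with no stereo match
        return bool(np.any(depth_m[valid] < min_safe_m))

Point being: the thresholding is trivial; producing a dense, low-latency depth map to feed it is where all the CPU time goes, which is exactly the bit Kinect-class hardware skimps on.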