3D semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs
Document type: Conference report
Rights access: Open Access
A novel real-time framework for model-free stereo-video segmentation and stereo-segment tracking is presented, combining real-time optical flow and stereo with image segmentation running separately on two GPUs. The stereo-segment tracking algorithm achieves a frame rate of 23 Hz for regular videos with a frame size of 256 x 320 pixels and runs in nearly real time for stereo videos. The computed stereo segments are used to construct 3D segment graphs, from which main graphs, each representing a relevant change in the scene, are extracted. This allows a movie of, e.g., 396 original frames to be represented by only 12 graphs, each containing only a small number of nodes, providing a condensed description of the scene while preserving data-intrinsic semantics. Using this method, human activities, e.g., the handling of objects, can be encoded efficiently. The method has potential applications in manipulation action recognition and learning, and provides a vision front end for applications in cognitive robotics.
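The reduction from per-frame segment graphs to a handful of main graphs can be illustrated with a minimal sketch. The abstract does not say how a "relevant change" is detected, so the code below assumes a simple stand-in criterion: a frame's graph is kept whenever its node set or edge set differs from the last kept graph. The names `make_graph` and `extract_main_graphs`, and the integer segment labels, are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A segment graph: nodes are segment labels, edges connect touching segments.
Edge = FrozenSet[int]
Graph = Tuple[FrozenSet[int], FrozenSet[Edge]]


def make_graph(nodes: Iterable[int], edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an immutable, order-independent graph from labels and label pairs."""
    return (frozenset(nodes), frozenset(frozenset(e) for e in edges))


def extract_main_graphs(graphs: List[Graph]) -> List[Tuple[int, Graph]]:
    """Keep (frame index, graph) for each frame whose graph differs from the
    previously kept one. ASSUMPTION: "relevant change" means any change in the
    node set or edge set; the paper's actual criterion may differ."""
    main: List[Tuple[int, Graph]] = []
    for idx, graph in enumerate(graphs):
        if not main or graph != main[-1][1]:
            main.append((idx, graph))
    return main


if __name__ == "__main__":
    # Toy sequence with segments 1 = hand, 2 = cup, 3 = table.
    resting = make_graph([1, 2, 3], [(2, 3)])           # cup on table
    grasping = make_graph([1, 2, 3], [(2, 3), (1, 2)])  # hand touches cup
    lifted = make_graph([1, 2, 3], [(1, 2)])            # cup leaves table
    frames = [resting] * 3 + [grasping] * 2 + [lifted] * 2
    for idx, (nodes, edges) in extract_main_graphs(frames):
        print(idx, sorted(nodes), sorted(sorted(e) for e in edges))
```

In this toy sequence, seven frames collapse to three main graphs, one per change in touching relations. This mirrors the kind of compression the abstract reports for 396 frames and 12 graphs, without reproducing the authors' method.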
Citation: Alexey, A. [et al.]. 3D semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs. In: International Symposium 3D Data Processing, Visualization and Transmission. "International Symposium 3D Data Processing, Visualization and Transmission (3DPVT) Edition 5th". Paris: 2010, p. 1-8.