People Watching: Human Actions as a Cue for Single-View Geometry




We present an approach that exploits the coupling between human actions and scene geometry, investigating the use of human pose as a cue for single-view 3D scene understanding. Our method builds upon recent advances in still-image pose estimation to extract functional and geometric constraints on the scene. These constraints are then used to improve state-of-the-art single-view 3D scene understanding approaches. The proposed method is validated on a collection of monocular time-lapse sequences gathered from YouTube and on a dataset of still images of indoor scenes. We demonstrate that observing people performing different actions can significantly improve estimates of 3D scene geometry.
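To make the idea of turning pose detections into functional constraints more concrete, here is a minimal sketch, not the method from the paper. It assumes a hypothetical pose detector that returns, for each detected person, an action label, 2D joint locations in pixels, and a confidence score. The action names, joint names, surface labels, and the `accumulate_functional_evidence` helper are all illustrative assumptions. The sketch simply votes the joint that plausibly touches a supporting surface into a coarse per-label evidence grid over the image.

```python
from collections import defaultdict

# Hypothetical rules (assumptions for illustration): which joint contacts a
# surface for each action, and which functional label that surface receives.
CONTACT_RULES = {
    "standing": ("feet", "walkable"),
    "sitting": ("pelvis", "sittable"),
    "reaching": ("hand", "reachable"),
}


def accumulate_functional_evidence(detections, grid_shape, cell_size):
    """Vote pose contact points into one evidence grid per functional label.

    detections: iterable of dicts with keys "action" (str),
                "joints" (dict: joint name -> (x, y) in pixels), "score" (float).
    grid_shape: (rows, cols) of the coarse grid laid over the image.
    cell_size:  side length of one grid cell, in pixels.
    """
    rows, cols = grid_shape
    evidence = defaultdict(lambda: [[0.0] * cols for _ in range(rows)])
    for det in detections:
        rule = CONTACT_RULES.get(det["action"])
        if rule is None:
            continue  # action carries no surface cue in this sketch
        joint, label = rule
        if joint not in det["joints"]:
            continue  # contact joint was not detected
        x, y = det["joints"][joint]
        r, c = int(y // cell_size), int(x // cell_size)
        if 0 <= r < rows and 0 <= c < cols:
            # Weight each vote by the detector's confidence.
            evidence[label][r][c] += det["score"]
    return dict(evidence)


if __name__ == "__main__":
    # Toy detections from two frames of an imagined time-lapse sequence.
    frames = [
        {"action": "standing", "joints": {"feet": (25, 95)}, "score": 0.5},
        {"action": "sitting", "joints": {"pelvis": (55, 60)}, "score": 1.0},
    ]
    grids = accumulate_functional_evidence(frames, grid_shape=(10, 10), cell_size=10)
    print(sorted(grids))
```

Over a long sequence, cells that repeatedly collect votes for one label would indicate where a supporting surface of that kind is likely to be. How such evidence is then combined with a single-view geometry estimate is described in the paper and is not reproduced here.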


ECCV Paper (pdf)
Slides (zip, 60MB)
Watch the talk from ECCV (at

David F. Fouhey, Vincent Delaitre, Abhinav Gupta, Alexei Efros, Ivan Laptev, Josef Sivic. People Watching: Human Actions as a Cue for Single-View Geometry. In Proc. 12th European Conference on Computer Vision. 2012.

Extended Results

This video shows a selection of input timelapses, along with the evolution of the functional surfaces and the resulting geometric interpretation.

Download (mp4, 18MB)

Timelapse Results Gallery (40 Sequences)

Still Results Gallery (100 Images)


Still image dataset (9.1MB ZIP file) - 100 JPGs

Videos (list of URLs of the videos used)

Related Works

V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta, and A. Efros.
Scene Semantics from Long-term Observation of People.
In Proc. ECCV 2012.



Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.