Physical & Social Scene Understanding
CogSci 2016 Workshop
Philadelphia, PA, August 10, 2016

Themes


Computer vision has made significant progress in locating and recognizing objects in real images. However, beyond the scope of this “what is where” challenge, it still lacks the ability to understand scenes the way human vision does. The mission of this workshop is to (a) identify the key domains in which human visual perception and cognition outperform computer vision; (b) formalize the computational challenges in these domains; and (c) propose promising frameworks for addressing these challenges through joint cognitive science and computer vision studies.

Here we propose Functionality, Physics, Intentionality and Causality (FPIC) as four key domains beyond "what is where":

Functionality: What can you do with the tree trunk?

Physics: How likely is the stone to stay balanced?

Intentionality: Why is the guy kicking the door?

Causality: Who knocked down the domino?
The combination of these largely orthogonal dimensions spans a large space of image-understanding problems.

Despite their apparent differences, these domains are connected in theoretically important ways: (a) they usually do not project onto explicit visual features; (b) existing computer vision algorithms are not competent in these domains and, in most cases, not even applicable; and (c) human cognition is nevertheless highly efficient in them. Studying FPIC should therefore significantly narrow the gap between computer vision and human vision. On the one hand, human studies on FPIC-related topics can inspire novel, cognitively motivated computer vision systems. On the other hand, state-of-the-art computer vision systems can expand the scope of cognitive science to address challenges in real scenes.

The introduction of FPIC will advance cognitive models in three respects: (a) transfer learning: as a higher-level representation, FPIC tends to be invariant across the entire space of human environments, so what is learned in one type of scene can transfer to novel situations; (b) small-sample learning: because FPIC is consistent and noise-free, it can be learned even without a wealth of prior experience or “big data”; and (c) bidirectional inference: inference with FPIC combines top-down abstract knowledge with bottom-up visual patterns, and the two processes can boost each other's performance.

Several key topics are:
- Physically grounded scene interpretation
- Causal models of vision and cognition
- Human-object-scene interaction
- Human-robot collaboration
- Reasoning about the goals and intentions of agents in scenes
- Top-down and bottom-up inference algorithms
- Related topics in cognitive science and visual perception

In conjunction with CogSci 2016, our “Physical and Social Scene Understanding” workshop will bring together researchers from cognitive science, computer vision, and robotics to illuminate cognitively motivated vision systems that go beyond labeling “what is where” in an image. Such systems integrate tightly coupled components to achieve a sophisticated and coherent understanding of scenes with respect to Functionality, Physics, Intentionality, and Causality (FPIC). In effect, they are expected to answer an almost limitless range of questions about an image using a finite, general-purpose model. At the same time, we want to emphasize that FPIC is not meant to be an exhaustive set of scene understanding problems: we welcome the insights of scholars who share this perspective but work on different problems.

Schedule

Location: Room 122B



09:00 am  Opening Remarks (Organizers)
09:10 am  Reverse-Engineering Human Social Cognition for Building Human-Robot Collaboration with Bayesian Theory-of-Mind (Yibiao Zhao and Max Kleiman-Weiner, MIT)
09:50 am  How Scenes Activate Social and Moral Norms: Theory and Preliminary Results (Bertram F. Malle, Brown University)
10:30 am  Analogy and Qualitative Representations in Physical and Social Understanding (Kenneth D. Forbus, Northwestern University)
11:10 am  Comparing and Integrating Approaches for Common-Sense Scene Understanding: Probabilistic Programs, Qualitative Heuristics, and Neural Networks (Josh Tenenbaum, MIT)
12:00 pm  Lunch Break
01:30 pm  Human Simulation, Personality, Stereotypes, and Narratives (Norman I. Badler, University of Pennsylvania)
02:10 pm  Integrating a Physics Engine with Deep Learning for Human-like Physical Scene Perception (Ilker Yildirim, MIT)
02:50 pm  3D Deep Learning for Robot Perception (Jianxiong Xiao, Princeton)
03:30 pm  Panel Discussions

Organizers
Tao Gao, Computational Cognitive Science Lab, MIT
Chenfanfu Jiang, Computer Graphics & Vision Laboratory, UCLA
Yixin Zhu, Center for Vision, Cognition, Learning & Autonomy, UCLA
Lap-Fai Yu, Graphics and Virtual Environments Lab, University of Massachusetts Boston
Yibiao Zhao, Computational Cognitive Science Lab, MIT
Max Kleiman-Weiner, Computational Cognitive Science Lab, MIT
Chris Baker, Computational Cognitive Science Lab, MIT