What Action Causes This? Towards Naive Physical Action-Effect Prediction
- This dataset contains action-effect information for 140 verb-noun pairs. It has two parts: effects described in
natural language, and effects depicted in images.
- The language data contains verb-noun pairs and their effects described in natural language. For each verb-noun pair,
10 different annotators described its possible effects. Each line has the format
"verb noun, effect_sentence[, effect_phrase_1, effect_phrase_2, effect_phrase_3 ...]", where the bracketed effect phrases
are optional. The effect phrases were automatically extracted from their corresponding effect sentences.
- The image data contains images depicting action effects. For each verb-noun pair, an average of 15 positive images and
15 negative images were collected. Positive images are those deemed to capture the resulting world state of the action.
Negative images are those deemed to capture some state of the related object (i.e., the noun in the verb-noun pair)
that is not the resulting state of the corresponding action.
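
The language-data line format described above could be read with a small parser along the following lines. This is only a sketch, not part of the dataset's tooling: the names `EffectRecord` and `parse_effect_line` are hypothetical, the sample line in the usage note is made up for illustration, and the code assumes that fields are separated by commas, that the verb is a single word, and that no effect sentence or phrase itself contains a comma. If the actual files contain commas inside sentences, the split would need to be adapted.

```python
from typing import List, NamedTuple


class EffectRecord(NamedTuple):
    """One annotated line: a verb-noun pair plus its described effect."""
    verb: str
    noun: str
    effect_sentence: str
    effect_phrases: List[str]  # may be empty; phrases are optional


def parse_effect_line(line: str) -> EffectRecord:
    """Parse 'verb noun, effect_sentence[, effect_phrase_1, ...]'.

    Assumption: commas only appear as field separators.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 comma-separated fields: {line!r}")

    # The first field holds the verb and the noun, separated by a space.
    verb, _, noun = fields[0].partition(" ")
    noun = noun.strip()
    if not verb or not noun:
        raise ValueError(f"expected 'verb noun' in first field: {line!r}")

    # Everything after the effect sentence is an extracted effect phrase.
    phrases = [phrase for phrase in fields[2:] if phrase]
    return EffectRecord(verb, noun, fields[1], phrases)
```

For example, calling `parse_effect_line("slice apple, the apple is in pieces, in pieces")` on this made-up line would yield a record with verb `slice`, noun `apple`, the effect sentence `the apple is in pieces`, and a single effect phrase `in pieces`.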