What Action Causes This? Towards Naive Physical Action-Effect Prediction
- This dataset contains action-effect information for 140 verb-noun pairs. It has two parts: effects described in natural language and effects depicted in images.
- The language data contains verb-noun pairs and their effects described in natural language. The possible effects of each verb-noun pair were described by 10 different annotators. Each line has the format "verb noun, effect_sentence[, effect_phrase_1, effect_phrase_2, effect_phrase_3 ...]", where the bracketed effect_phrases are optional and were automatically extracted from the corresponding effect_sentence.
- The image data contains images depicting action effects. For each verb-noun pair, an average of 15 positive and 15 negative images were collected. Positive images are those deemed to capture the world state resulting from the action. Negative images are those deemed to capture some state of the related object (i.e., the noun in the verb-noun pair) that is not the resulting state of the corresponding action.
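The line format of the language data described above can be read with a short script. The sketch below is not part of the dataset release; the names `EffectAnnotation`, `parse_line`, and `load_effects` are hypothetical. It assumes that the first field holds the verb as its first token followed by the noun, and that fields are comma-separated, with any effect_sentence that contains commas enclosed in double quotes (an assumption about the file, not something stated above). Lines are grouped by verb-noun pair, so each key is expected to collect about 10 annotations.

```python
import csv
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class EffectAnnotation:
    """One line of the language data (hypothetical container type)."""
    verb: str
    noun: str
    effect_sentence: str
    effect_phrases: List[str] = field(default_factory=list)


def parse_line(fields: List[str]) -> EffectAnnotation:
    # Assumption: the first field is "verb noun", where the verb is the first
    # token and the remaining tokens form the noun.
    verb, _, noun = fields[0].strip().partition(" ")
    sentence = fields[1].strip()
    # Any remaining fields are the optional, automatically extracted effect phrases.
    phrases = [p.strip() for p in fields[2:] if p.strip()]
    return EffectAnnotation(verb, noun.strip(), sentence, phrases)


def load_effects(lines: Iterable[str]) -> Dict[Tuple[str, str], List[EffectAnnotation]]:
    """Group annotations by (verb, noun) pair."""
    grouped: Dict[Tuple[str, str], List[EffectAnnotation]] = defaultdict(list)
    # skipinitialspace=True drops the space after each comma, so a field such as
    # ', "a, b"' is still recognised as a quoted field containing a comma.
    for fields in csv.reader(lines, skipinitialspace=True):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        annotation = parse_line(fields)
        grouped[(annotation.verb, annotation.noun)].append(annotation)
    return dict(grouped)
```

For example, `load_effects(open(path, encoding="utf-8"))` on the language file would return a dictionary keyed by pairs such as `("peel", "orange")`, assuming the file follows the format above. Using `csv.reader` rather than a plain `split(",")` keeps quoted sentences intact, at the cost of requiring such sentences to actually be quoted.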