Forests and Trees: the Formal Semantics of Collective Categorization (ROCKY)


GeoRic Dataset

The GeoRic dataset contains images paired with their captions and location coordinates (latitude and longitude). The dataset is intended for use in image captioning and other vision-and-language tasks.

For more information about the GeoRic dataset and its application to training a geographically aware image captioning system, see:
Nikiforova, S., Deoskar, T., Paperno, D., & Winter, Y. (2020). Geo-Aware Image Caption Generation. To appear in Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).


SpaceNLI Dataset

The SpaceNLI dataset is a testbed for spatial reasoning. It is a collection of natural language inference (NLI) problems that are automatically generated from a curated set of reasoning patterns. The patterns are created and three-way annotated by experts. They cover diverse types of spatial expressions and reasoning.
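
To illustrate the general idea of generating NLI problems from a reasoning pattern, here is a minimal Python sketch. It is not the SpaceNLI generation code: the pattern format, the placeholder names `A` and `B`, the filler phrases, and the function `generate_problems` are all hypothetical and chosen only for illustration. The label name "entailment" is assumed from the usual three-way NLI scheme.

```python
from itertools import product

# Hypothetical pattern: premise and hypothesis templates with placeholders,
# plus one gold label shared by every problem generated from the pattern.
PATTERN = {
    "premise": "{A} is to the left of {B}.",
    "hypothesis": "{B} is to the right of {A}.",
    "label": "entailment",  # assumed label name
}

# Hypothetical filler phrases for each placeholder.
FILLERS = {
    "A": ["the cup", "the lamp"],
    "B": ["the book", "the vase"],
}


def capitalize_first(sentence):
    """Upper-case only the first character of a sentence."""
    return sentence[0].upper() + sentence[1:]


def generate_problems(pattern, fillers):
    """Instantiate one pattern with every combination of filler phrases."""
    names = sorted(fillers)
    problems = []
    for values in product(*(fillers[name] for name in names)):
        binding = dict(zip(names, values))
        problems.append({
            "premise": capitalize_first(pattern["premise"].format(**binding)),
            "hypothesis": capitalize_first(pattern["hypothesis"].format(**binding)),
            "label": pattern["label"],
        })
    return problems


if __name__ == "__main__":
    for problem in generate_problems(PATTERN, FILLERS):
        print(problem)
```

Because every problem generated from one pattern shares the same gold label, a model's predictions can be compared across all instances of that pattern to check how consistently it handles the same inference.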

The following paper describes the creation process of SpaceNLI and reports the results of evaluating state-of-the-art NLI models on it:

Abzianidze, L., Zwarts, J., & Winter, Y. (2023). SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space. In Proceedings of the 4th Natural Logic Meets Machine Learning Workshop (NALOMA IV).