Locate Each Object in Images Taken by People Who Are Blind
We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content of the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (found in 12.3% of our segmentations), its objects span a much larger range of sizes relative to the images, and text is over five times more common in our objects (found in 22.4% of our segmentations).
The VizWiz-FewShot dataset includes:
- 4,622 images
- 9,861 annotated instances
The individual dataset components listed below are available for download.
- train and validation: raw images
- annotations: instance masks
- annotations.json: COCO-format annotations file
We provide a convenience API for loading VizWiz-FewShot annotations for training and evaluation. Our dataset generally follows the COCO format.
We additionally include bounding box annotations for object detection and a flag for whether or not the instance includes text.
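As a concrete illustration of this layout, the sketch below builds a minimal COCO-style annotation record and filters for instances flagged as containing text. The specific field values, the image file name, and the `"text"` field name are assumptions for illustration; consult the downloaded annotations.json for the exact schema.

```python
import json

# Minimal COCO-style structure mirroring annotations.json.
# All concrete values below are hypothetical examples.
sample = {
    "images": [
        {"id": 1, "file_name": "VizWiz_train_00000001.jpg",
         "width": 640, "height": 480}
    ],
    "categories": [{"id": 7, "name": "can"}],
    "annotations": [
        {
            "id": 42,
            "image_id": 1,
            "category_id": 7,
            # Polygon segmentation: flat [x1, y1, x2, y2, ...] lists
            "segmentation": [[120.0, 80.0, 300.0, 80.0,
                              300.0, 260.0, 120.0, 260.0]],
            "bbox": [120.0, 80.0, 180.0, 180.0],  # [x, y, width, height]
            "area": 32400.0,
            "iscrowd": 0,
            "text": 1,  # assumed name of the contains-text flag
        }
    ],
}

def instances_with_text(coco):
    """Return ids of annotated instances flagged as containing text."""
    return [a["id"] for a in coco["annotations"] if a.get("text")]

# Round-trip through JSON as the annotations file would be loaded on disk.
coco = json.loads(json.dumps(sample))
print(instances_with_text(coco))  # [42]
```

Because the file follows the COCO convention, standard tooling such as pycocotools can also consume it directly once the annotations file is downloaded.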
This work is licensed under a Creative Commons Attribution 4.0 International License.
VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
Yu-Yun Tseng, Alexander Bell, and Danna Gurari. European Conference on Computer Vision (ECCV), 2022.
For any questions about the dataset and code, please send them to Alec Bell or Everley Tseng at firstname.lastname@example.org or Everley.Tseng@colorado.edu, respectively.
For other questions, comments, or feedback, please send them to Danna Gurari at email@example.com.