Object Localization

Locate Each Object in Images Taken by People Who Are Blind

Figure: introductory image for the dataset, showing categories shared with prior work (left) and categories unique to our dataset (right).

Overview

We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (found in 12.3% of our segmentations), it shows objects that occupy a much larger range of sizes relative to the images, and text is over five times more common in our objects (found in 22.4% of our segmentations).

Dataset

The VizWiz-FewShot dataset includes:

  • 4,622 images
  • 9,861 annotated instances

You may download each of the dataset components listed below individually.

We’ve developed a convenience API, available here, for loading VizWiz-FewShot annotations for training and evaluation. Our dataset generally follows the COCO format.
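Because the annotations follow the COCO format, they can also be loaded with the off-the-shelf pycocotools library rather than our API. The sketch below illustrates this under the assumption that the annotations ship as a single COCO-style JSON file; the file name is a placeholder, not the actual release path.

  from pycocotools.coco import COCO

  # Hypothetical path to the released annotation file.
  coco = COCO("vizwiz_fewshot_train.json")

  # List the category names covered by the dataset.
  cats = coco.loadCats(coco.getCatIds())
  print(sorted(c["name"] for c in cats))

  # Fetch all annotated instances for one image.
  img_id = coco.getImgIds()[0]
  anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
  print(f"Image {img_id} has {len(anns)} annotated instances")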

We additionally include bounding box annotations for object detection and a flag for whether or not the instance includes text.
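Extending the sketch above, each annotation record carries a segmentation, a bounding box in COCO's [x, y, width, height] convention, and the per-instance text flag. The key name "text" below is an assumption; check the released JSON for the exact field name.

  for ann in anns:
      x, y, w, h = ann["bbox"]           # object-detection bounding box
      mask = coco.annToMask(ann)         # binary mask decoded from the segmentation
      has_text = ann.get("text", False)  # hypothetical key for the text flag
      print(f"bbox=({x:.0f},{y:.0f},{w:.0f},{h:.0f}), "
            f"mask pixels={mask.sum()}, contains text={has_text}")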

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publications

VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
Yu-Yun Tseng, Alexander Bell, and Danna Gurari. European Conference on Computer Vision (ECCV), 2022.

Contact Us

For questions about the dataset and code, please contact Alec Bell (Alexander.Bell-1@colorado.edu) or Everley Tseng (Everley.Tseng@colorado.edu).

For other questions, comments, or feedback, please contact Danna Gurari at danna.gurari@colorado.edu.