Object Localization

Locate Each Object in Images Taken by People Who Are Blind


We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (found in 12.3% of our segmentations), it shows objects that occupy a much larger range of sizes relative to the images, and text is over five times more common in our objects (found in 22.4% of our segmentations).


The VizWiz-FewShot dataset includes:

  • 4,622 images
  • 9,861 annotated instances

You may download the individual sets of components listed below.

We are actively working on an API to make it more convenient to work in the few-shot setting with four folds.

For now, you may load the annotations with the COCO API, since our dataset follows the COCO format.

We additionally include bounding box annotations for object detection and a flag for whether or not the instance includes text.
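
Because the annotations follow the COCO format, they can also be inspected with nothing more than a JSON parser. The sketch below is a minimal illustration, not an official loader: it indexes a COCO-style dictionary by image and converts each bounding box from COCO's [x, y, width, height] layout to corner coordinates. The key name "has_text" for the text flag, the file path in the usage note, and every value in the in-memory sample are assumptions made for illustration; check the downloaded annotation file for the actual field name.

```python
import json
from collections import defaultdict

# Hypothetical key for the per-instance text flag. The real field name
# is not stated on this page; inspect the annotation file to confirm it.
TEXT_FLAG_KEY = "has_text"


def load_coco(path):
    """Read a COCO-format annotation file into a plain dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def index_coco(coco):
    """Map category ids to names and group annotations by image id."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return categories, dict(by_image)


def bbox_to_corners(bbox):
    """Convert a COCO box [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def summarize(coco):
    """Return one (file_name, category, corners, has_text) tuple per instance."""
    categories, by_image = index_coco(coco)
    rows = []
    for image in coco["images"]:
        for ann in by_image.get(image["id"], []):
            rows.append((
                image["file_name"],
                categories[ann["category_id"]],
                bbox_to_corners(ann["bbox"]),
                bool(ann.get(TEXT_FLAG_KEY, False)),
            ))
    return rows


# Made-up stand-in for an annotation file so the sketch runs as-is.
# Segmentation masks are omitted for brevity; none of these values
# come from the dataset.
sample = {
    "images": [
        {"id": 10, "file_name": "example_0.jpg", "width": 640, "height": 480},
        {"id": 11, "file_name": "example_1.jpg", "width": 640, "height": 480},
    ],
    "categories": [{"id": 1, "name": "cup"}, {"id": 2, "name": "book"}],
    "annotations": [
        {"id": 100, "image_id": 10, "category_id": 1,
         "bbox": [5, 10, 20, 30], TEXT_FLAG_KEY: False},
        {"id": 101, "image_id": 10, "category_id": 2,
         "bbox": [100, 50, 200, 120], TEXT_FLAG_KEY: True},
        {"id": 102, "image_id": 11, "category_id": 2,
         "bbox": [0, 0, 64, 48], TEXT_FLAG_KEY: True},
    ],
}

categories, by_image = index_coco(sample)
rows = summarize(sample)
for row in rows:
    print(row)
```

To run this on the real data, replace the sample with load_coco("annotations.json"), where the path is a placeholder for wherever the downloaded annotation file is saved. If pycocotools is installed, the same file can be opened with COCO(annotation_file) from pycocotools.coco and queried through getImgIds, getAnnIds, and loadAnns.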

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.


To be completed…

Contact Us

Please send any questions about the dataset and code to Alec Bell (Alexander.Bell-1@colorado.edu) or Everley Tseng (Everley.Tseng@colorado.edu).

Please send other questions, comments, or feedback to Danna Gurari at danna.gurari@colorado.edu.