[vc_row][vc_column][vc_single_image image="21" img_size="full" alignment="center" el_class="img-responsive"][/vc_column][/vc_row][vc_row][vc_column][vc_column_text]


We propose an artificial intelligence (AI) challenge to design algorithms that help people who are blind overcome their daily visual challenges. For this purpose, we introduce the VizWiz dataset, which originates from a natural visual question answering setting where blind people each took an image and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. Our proposed challenge addresses the following two tasks for this dataset: (1) predict the answer to a visual question and (2) predict whether a visual question can be answered. Ultimately, we hope this work will educate more people about the technological needs of blind people while providing an exciting new opportunity for researchers to develop assistive technologies that eliminate accessibility barriers for blind people.

[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column width="1/2"][vc_column_text]


VizWiz v1.0 dataset download:

  • 20,000 training image/question pairs
  • 200,000 training answer/answer confidence pairs
  • 3,173 validation image/question pairs
  • 31,730 validation answer/answer confidence pairs
  • 8,000 test image/question pairs
  • Python API to read and visualize the VizWiz dataset
  • Python challenge evaluation code

[/vc_column_text][/vc_column][vc_column width="1/2"][vc_column_text]

The download file is organized as follows:

    • Visual questions are split into three JSON files: train, validation, and test. Answers are publicly shared for the train and validation splits and hidden for the test split.
    • APIs are provided to demonstrate how to parse the JSON files and evaluate methods against the ground truth.
    • Details about each visual question are in the following format:

{
"answerable": 0,
"image": "VizWiz_val_000000028000.jpg",
"question": "What is this?",
"answer_type": "unanswerable",
"answers": [
{"answer": "unanswerable", "answer_confidence": "yes"},
{"answer": "chair", "answer_confidence": "yes"},
{"answer": "unanswerable", "answer_confidence": "yes"},
{"answer": "unanswerable", "answer_confidence": "no"},
{"answer": "unanswerable", "answer_confidence": "yes"},
{"answer": "text", "answer_confidence": "maybe"},
{"answer": "unanswerable", "answer_confidence": "yes"},
{"answer": "bottle", "answer_confidence": "yes"},
{"answer": "unanswerable", "answer_confidence": "yes"},
{"answer": "unanswerable", "answer_confidence": "yes"}
]
}

The annotation files (train.json, val.json) record two ways to assign an answer type: "answer_type" is the answer type of the most popular answer (used in VizWiz v1.0), and "answer_type_v2" is the most popular answer type across all answers' types (following the VQA 2.0 convention).
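The difference between the two conventions can be sketched with the sample record above. Note that the `answer_type_of` heuristic below is purely illustrative (the released annotations already contain the final types); it stands in for whatever per-answer typing rule the dataset uses.

```python
from collections import Counter

# Sample record in the format shown above.
record = {
    "answerable": 0,
    "image": "VizWiz_val_000000028000.jpg",
    "question": "What is this?",
    "answers": [
        {"answer": "unanswerable", "answer_confidence": "yes"},
        {"answer": "chair", "answer_confidence": "yes"},
        {"answer": "unanswerable", "answer_confidence": "yes"},
        {"answer": "unanswerable", "answer_confidence": "no"},
        {"answer": "unanswerable", "answer_confidence": "yes"},
        {"answer": "text", "answer_confidence": "maybe"},
        {"answer": "unanswerable", "answer_confidence": "yes"},
        {"answer": "bottle", "answer_confidence": "yes"},
        {"answer": "unanswerable", "answer_confidence": "yes"},
        {"answer": "unanswerable", "answer_confidence": "yes"},
    ],
}

def answer_type_of(answer):
    # Illustrative stand-in for the dataset's per-answer typing rule.
    if answer == "unanswerable":
        return "unanswerable"
    if answer in ("yes", "no"):
        return "yes/no"
    if answer.isdigit():
        return "number"
    return "other"

answers = [a["answer"] for a in record["answers"]]

# "answer_type" (VizWiz v1.0): type of the single most popular answer.
most_popular_answer, _ = Counter(answers).most_common(1)[0]
answer_type = answer_type_of(most_popular_answer)

# "answer_type_v2" (VQA 2.0 convention): most popular type among all
# ten answers' individual types.
answer_type_v2, _ = Counter(map(answer_type_of, answers)).most_common(1)[0]

print(answer_type, answer_type_v2)  # both "unanswerable" for this record
```

The two conventions agree here (7 of 10 answers are "unanswerable"), but they can diverge when the most popular single answer belongs to a minority type.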