2021 VizWiz Grand Challenge Workshop

A panel of example images for the two challenge tasks: image captioning and visual question answering. On the left, nine images are shown paired with captions for image captioning. The first row contains three images with the following captions: "A computer screen with a Windows message about Microsoft license terms", "A can of green beans is sitting on a counter in a kitchen", "A photo taken from a residential street in front of some homes with a stormy sky above". The second row contains three images with the following captions: "A hand holds up a can of Coors Light in front of an outdoor scene with a dog on a porch", "A digital thermometer resting on a wooden table, showing 38.5 degrees Celsius", "A close up of a Black and silver pocket Kershaw knife sits in a white persons open palm". The third row contains three images with the following captions: "A blue sky with fluffy clouds, taken from a car while driving on the highway", "A baby chair that has cartoon characters on it with a can of yahoo on the table", and "A cup holder in a car holding loose change from Canada".

On the right, eight images are shown paired with questions and answers for visual question answering. The first row contains four images with the following question-answer pairs: 1. Question: Does this foundation have any sunscreen? Answer: yes; 2. Question: what is this? Answer: 10 euros; 3. Question: what color is this? Answer: green; 4. Question: Please can you tell me what this item is? Answer: butternut squash red pepper soup. The second row contains four images with the following question-answer pairs: 1. Question: What type of pills are these? Answer: unsuitable image; 2. Question: What type of soup is this? Answer: unsuitable image; 3. Question: who is this mail for? Answer: unanswerable; 4. Question: when is the expiration date? Answer: unanswerable.


Our goal for this workshop is to educate researchers about the technological needs of people with vision impairments while empowering them to improve algorithms to meet these needs. A key component of this event will be to track progress on two dataset challenges, where the tasks are to answer visual questions and caption images taken by people who are blind. Winners of these challenges will receive awards sponsored by Microsoft. The second key component of this event will be a discussion about current research and application issues, including talks by invited speakers from both academia and industry who will share their experiences in building today's state-of-the-art assistive technologies and in designing next-generation tools.

Important Dates

  • Monday, February 1: challenge submissions announced
  • Friday, May 21 [5:59pm Central Standard Time]: challenge submissions due
  • Friday, May 21 [5:59pm Central Standard Time]: extended abstracts due
  • Friday, May 28 [5:59pm Central Standard Time]: notification to authors about decisions for extended abstracts
  • Saturday, June 19: all-day workshop


We invite two types of submissions:

Challenge Submissions

We invite submissions of results from algorithms for both the image captioning challenge task and the visual question answering challenge task. We accept submissions for algorithms that are unpublished, currently under review, or already published. The teams with the top-performing submissions will be invited to give short talks during the workshop. The top two teams for each challenge will receive awards sponsored by Microsoft:

      • 1st place: $10,000 Microsoft Azure credit
      • 2nd place: $5,000 Microsoft Azure credit

Extended Abstracts

We invite submissions of extended abstracts on topics related to image captioning, visual question answering, and assistive technologies for people with visual impairments. Papers must be at most two pages (including references) and follow the CVPR formatting guidelines using the provided author kit. Reviewing will be single-blind, and accepted papers will be presented as posters. We will accept submissions on work that is unpublished, currently under review, or already published. There will be no proceedings. Please send your extended abstracts to workshop@vizwiz.org.

Please note that we will require all camera-ready content to be accessible via a screen reader. Because making accessible PDFs and presentations may be a new process for some authors, we will host training sessions beforehand to help all authors make their content accessible. More details to come soon.



The event is being held virtually.


All times below are in Central Time (CT).

  • 9:00-9:10am: Opening remarks
  • 9:10-9:30am: Invited talk
  • 9:30-9:50am: Invited talk
  • 9:50-10:10am: Invited talk
  • 10:10-10:30am: Break
  • 10:30-11:30am: Panel with blind technology advocates
  • 11:30am-12:30pm: Lunch break
  • 12:30-12:40pm: Overview of challenge, winner announcements, and analysis of results
  • 12:40-1:00pm: Talks by the top two teams for each dataset challenge
  • 1:00-1:15pm: Poster spotlights
  • 1:15-2:00pm: Poster session
  • 2:00-2:30pm: Break
  • 2:30-2:50pm: Invited talk
  • 2:50-3:10pm: Invited talk
  • 3:10-3:30pm: Invited talk
  • 3:30-3:45pm: Break
  • 3:45-4:45pm: Live panel with invited speakers
  • 4:45-4:55pm: Open discussion
  • 4:55-5:00pm: Closing remarks

Invited Speakers and Panelists:

Portrait picture of Dhruv Batra

Dhruv Batra
Georgia Tech, Facebook AI Research

Portrait picture of Anna Rohrbach

Anna Rohrbach
UC Berkeley

Portrait picture of Yue-Ting Siu

Yue-Ting Siu
San Francisco State University

Portrait picture of Daniela Massiceti

Daniela Massiceti
Microsoft Research

Portrait picture of Joshua Miele

Joshua Miele (tentative)

Portrait picture of Peter Slatin

Peter Slatin
The Slatin Group

Portrait picture of Nefertiti Matos

Nefertiti Matos
New York Public Library


Portrait picture of Danna Gurari

Danna Gurari
University of Texas at Austin

Portrait picture of Jeffrey Bigham

Jeffrey Bigham
Carnegie Mellon University, Apple

Portrait picture of Merrie Morris

Meredith Morris
Google Research

Portrait picture of Ed Cutrell

Ed Cutrell

Portrait picture of Abigale Stangl

Abigale Stangl
University of Washington

Portrait picture of Yinan Zhao

Yinan Zhao
University of Texas at Austin

Portrait picture of Samreen Anjum

Samreen Anjum
University of Texas at Austin

Contact Us

For questions, comments, or feedback, please contact Danna Gurari at danna.gurari@ischool.utexas.edu.


Logo for Microsoft