2020 VizWiz Grand Challenge Workshop

A panel of eight images paired with captions. Example captions include: "A computer screen with a Windows message about Microsoft license terms"; "A photo taken from a residential street in front of some homes with a stormy sky above"; "A blue sky with fluffy clouds, taken from a car while driving on the highway"; "A hand holds up a can of Coors Light in front of an outdoor scene with a dog on a porch"; "A digital thermometer resting on a wooden table, showing 38.5 degrees Celsius"; "A Winnie The Pooh character high chair with a can of Yoohoo sitting on it in front of a white wall"; and "A cup holder in a car holding loose change from Canada".


Our goal for this workshop is to educate researchers about the technological needs of people with vision impairments while empowering them to improve algorithms to meet these needs. A key component of this event will be to track progress on a new dataset challenge, where the task is to caption images taken by people who are blind. Winners of this challenge will receive awards sponsored by Microsoft. The second key component of this event will be a discussion of current research and application issues, led by invited speakers from both academia and industry who will share their experiences building today's state-of-the-art assistive technologies and designing next-generation tools.

Important Dates

  • February: challenge submissions announced
  • Friday, May 22 [5:59pm Central Standard Time] (extended from Friday, April 24): extended abstracts due
  • Friday, May 29 [5:59pm Central Standard Time] (extended from Monday, May 4): notification to authors about decisions for extended abstracts
  • Monday, June 1 [5:59pm Central Standard Time] (extended from Friday, May 15): challenge submissions due
  • Sunday, June 14: all-day workshop


We invite two types of submissions:

Challenge Submissions

We invite submissions of results from algorithms for the image captioning challenge task. We accept submissions describing algorithms that are unpublished, currently under review, or already published. The teams with the top-performing submissions will be invited to give short talks during the workshop. The top three teams will receive financial awards sponsored by Microsoft:

      • 1st place: $10,000 Microsoft Azure credit
      • 2nd place: $10,000 Microsoft Azure credit
      • 3rd place: $10,000 Microsoft Azure credit

Extended Abstracts

  • We invite submissions of extended abstracts on topics related to image captioning and assistive technologies for people with visual impairments. Papers must be at most two pages (with references) and follow the CVPR formatting guidelines using the provided author kit. Reviewing will be single-blind, and accepted papers will be presented as posters. We accept submissions on work that is unpublished, currently under review, or already published. There will be no proceedings. Please send your extended abstracts to workshop@vizwiz.org.



Event is being held virtually.


  • 9:00-9:10am: Opening remarks by Danna Gurari (video)
  • 9:10-9:30am: Invited speaker Meredith Morris (video)
  • 9:30-9:50am: Invited speaker Anirudh Koul (video)
  • 9:50-10:10am: Invited speaker Chieko Asakawa
  • 10:10-10:30am: Invited speaker Shiri Azenkot (video)
  • 10:30-10:45am: Break
  • 10:45-11:30am: Live panel with invited speakers from the morning (video)
  • 11:30-12:30pm: Lunch break
  • 12:30-12:40pm: Overview of challenge, winner announcements, and analysis of results by Yinan Zhao (video)
  • 12:40-12:55pm: Talks by top-3 teams on the dataset challenge
    • 1st place: MMTeam (IBM) (video)
    • 2nd place: SRC-B_VCLab (Samsung Research China-Beijing) (video)
    • 3rd place: aburns (Boston University) (video)
  • 12:55-1:30pm: Poster session (For interactive Q&A with authors, please click the Q&A link for each paper listed in the Poster List below. CVPR 2020 registration required for Q&A.)
  • 1:30-2:30pm: Panel with blind technology advocates Cynthia Bennett, Chancey Fleet, and Venkatesh Potluri (video)
  • 2:30-2:50pm: Break
  • 2:50-3:10pm: Invited speaker Peter Anderson (video)
  • 3:10-3:30pm: Invited speaker Kate Saenko (video)
  • 3:30-3:45pm: Break
  • 3:45-4:30pm: Live panel with invited speakers from the afternoon (video)
  • 4:30-4:50pm: Open discussion
  • 4:50-5:00pm: Closing remarks by Danna Gurari (video)

Invited Speakers:

Portrait picture of Kate Saenko

Kate Saenko
Boston University, MIT-IBM Watson AI Lab

Portrait picture of Merrie Morris

Meredith Morris
Microsoft Research

Portrait picture of Chieko Asakawa

Chieko Asakawa
IBM, Carnegie Mellon University

Portrait picture of Shiri Azenkot

Shiri Azenkot
Cornell University

Portrait picture of Venkatesh Potluri

Venkatesh Potluri
University of Washington

Portrait picture of Cynthia Bennett

Cynthia Bennett
Carnegie Mellon University, Apple

Portrait Picture of Chancey Fleet

Chancey Fleet
New York Public Library, Data and Society Research Institute  

Poster List:

  • Self-Critical Sequence Training for Image Captioning using Bayesian “baseline”
    Shashank Bujimalla*, Mahesh Subedar*, Omesh Tickoo
    paper | video
  • Uncertainty quantification in image captioning models
    Shashank Bujimalla*, Mahesh Subedar*, Omesh Tickoo
    paper | video
  • Hybrid Information of Transformer for Image Captioning
    Yuchen Ren, Ziqiang Chen, Jinyu Hu, Lei Chen
    paper | video
  • Japanese Coins and Banknotes Recognition for Visually Impaired People
    Huyen T. T. Bui, Man M. Ho, Xiao Peng, Jinjia Zhou
    paper | video
  • Vizwiz Image Captioning based on AoANet with Scene Graph
    Suwon Kim, HongYong Choi, JoongWon Hwang, JangYoung Song, SangRok Lee, TaeKang Woo
    paper | video
  • On the use of human reference data for evaluating automatic image descriptions
    Emiel van Miltenburg
    paper | video
  • Alleviating Noisy Data in Image Captioning with Cooperative Distillation
    Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff
    paper | video
  • Exploring Weaknesses of VQA Models through Attribution Driven Insights
    Shaunak Halbe
    paper | video


Organizers:

Portrait picture of Danna Gurari

Danna Gurari
University of Texas at Austin

Portrait picture of Jeffrey Bigham

Jeffrey Bigham
Carnegie Mellon University, Apple

Portrait picture of Merrie Morris

Meredith Morris
Microsoft Research

Portrait picture of Ed Cutrell

Ed Cutrell
Microsoft Research

Portrait picture of Abigale Stangl

Abigale Stangl
University of Texas at Austin

Portrait picture of Yinan Zhao

Yinan Zhao
University of Texas at Austin

Contact Us

For questions, comments, or feedback, please contact Danna Gurari at danna.gurari@ischool.utexas.edu.


Sponsors: SIGACCESS (the ACM Special Interest Group on Accessible Computing) and Microsoft