2022 VizWiz Grand Challenge Workshop


Our goal for this workshop is to educate researchers about the technological needs of people with vision impairments while empowering researchers to improve algorithms to meet these needs. A key component of this event will be to track progress on three dataset challenges, where the tasks are to answer visual questions and ground answers on images taken by people who are blind, and to recognize objects in few-shot learning scenarios. Winners of these challenges will receive awards sponsored by Microsoft. The second key component of this event will be a discussion about current research and application issues, featuring invited speakers from both academia and industry who will share their experiences in building today’s state-of-the-art assistive technologies as well as designing next-generation tools.

Important Dates

  • Monday, February 7: challenge submissions announced
  • Friday, May 6 [9:00am Central Standard Time]: challenge submissions due
  • Saturday, May 21 [5:59pm Central Standard Time]: extended abstracts due
  • Friday, May 27 [5:59pm Central Standard Time]: notification to authors about decisions for extended abstracts
  • Monday, June 20: all-day workshop

This year, June 19 and 20 mark Juneteenth, a US holiday commemorating the end of slavery in the US, and a holiday of special significance in the US South. We encourage attendees to learn more about Juneteenth and its historical context, and to join the city of New Orleans in celebrating the Juneteenth holiday. You can find more information about Juneteenth here: https://cvpr2022.thecvf.com/recognizing-juneteenth


We invite two types of submissions:

Challenge Submissions

We invite submissions of results from algorithms for three tasks: the visual question answering challenge, the answer grounding challenge, and the few-shot object recognition challenge. We accept submissions for algorithms that are not published, currently under review, or already published. The teams with the top-performing submissions will be invited to give short talks during the workshop. The top two teams for the visual question answering and answer grounding challenges will receive financial awards sponsored by Microsoft:

      • 1st place: $10,000 Microsoft Azure credit
      • 2nd place: $5,000 Microsoft Azure credit

Extended Abstracts

We invite submissions of extended abstracts on topics related to image captioning, visual question answering, visual grounding, and assistive technologies for people with visual impairments. Papers must be at most two pages (with references) and follow the CVPR formatting guidelines using the provided author kit. Reviewing will be single-blind and accepted papers will be presented as posters. We will accept submissions on work that is not published, currently under review, or already published. There will be no proceedings. Please send your extended abstracts to workshop@vizwiz.org.

Please note that we will require all camera-ready content to be accessible via a screen reader. Given that making accessible PDFs and presentations may be a new process for some authors, we will host training sessions beforehand to both educate and assist all authors to succeed in making their content accessible.



The event is being held in a hybrid format. The portion before lunch will be held in-person (New Orleans Ernest N. Morial Convention Center; Room #236). The portion after lunch will be live-streamed at the following URL: https://vizwiz.org/video.


  • 9:00-9:05: opening remarks (video)
  • 9:05-9:20: announcement of challenge winners
    • VizWiz-VQA challenge (video)
    • VizWiz-VQA-Grounding challenge (video)
    • Few-shot object recognition challenge (video)
  • 9:20-9:35: VizWiz-VQA challenge winner talks
    • 1st place: Team HSSLAB_Inspur (video)
    • 2nd place: Team XDU_IPIU (video)
    • 3rd place: Team GroundTruth (video)
  • 9:35-9:50: VizWiz-VQA-Grounding challenge winner talks
    • 1st place: Team Aurora (video)
    • 2nd place: Team HSSLAB_Inspur (video)
    • 3rd place: Team MGTV (video)
  • 9:50-10:05: Few-shot object recognition challenge winner talks
    • 1st place: Team ORBITRON (video)
    • 1st place: Team Goose (video)
  • 10:05-10:15: poster spotlight talks
  • 10:15-11:00: poster session
  • 11:00-11:30: lunch break
  • 11:30-12:25: panel of blind technology advocates (Stephanie Enyart, Robin Christopherson, and Daniel Kish) (video) (transcript)
  • 12:30-1:25: panel of industry representatives (Saqib Shaikh, Will Butler, Karthik Kannan, and Anne Taylor) (video) (transcript)
  • 1:30-2:25: panel of computer vision researchers (Marcus Rohrbach, Andrew Howard, and James Coughlan) (video) (transcript)
  • 2:30-3:25: interdisciplinary panel (Stephanie Enyart, Karthik Kannan, and James Coughlan) (video) (transcript)
  • 3:30-4:15: interdisciplinary panel (Will Butler, Andrew Howard, and Daniel Kish) (video) (transcript)
  • 4:20-5:00: interdisciplinary panel (Saqib Shaikh, Robin Christopherson, and Marcus Rohrbach) (video) (transcript)


The panel discussions are also available as a podcast and can be accessed on Spotify at the following URL: https://open.spotify.com/show/7AsiLuLq1Ay7QMBOUJNHfu

Invited Speakers and Panelists:

A photo of Stephanie, a woman with long brown hair wearing a grey suit, standing in front of the White House holding a white cane, looking directly at the camera and smiling

Stephanie Enyart
Chief Public Policy & Research Officer, American Foundation for the Blind

A headshot of Andrew Howard, a man with short dark hair who is wearing glasses, a light shirt and is smiling looking straight at the camera

Andrew Howard
Google AI

A photo of Anne, a woman with straight black hair, wearing a blue shirt, standing in a classroom, looking directly at the camera smiling

Anne Taylor
Principal Program Manager, Microsoft

A headshot of James, a man with short brown hair wearing glasses and a dark shirt, standing outside, looking directly at the camera smiling

James Coughlan
Senior Scientist / Coughlan Lab Director, Smith-Kettlewell

A headshot of Robin, a man with short brown hair, smiling

Robin Christopherson
Head of Digital Inclusion, AbilityNet

A black and white headshot showing the profile of Daniel, a white man with dark hair, sitting outside and looking at the camera smiling

Daniel Kish
President, World Access for the Blind

A headshot of Will, a man with short hair and a beard, wearing a button-up shirt and glasses, looking upwards, smiling

Will Butler
Chief Experience Officer, Be My Eyes

A headshot of Saqib, a man with short dark hair swept back, wearing a checkered red and blue button-down shirt, looking to the left of the camera smiling.

Saqib Shaikh
Software Engineering Manager, Microsoft

A headshot of Karthik, a man with short brown hair and a beard, wearing a dark blue sweater, looking directly at the camera smiling

Karthik Kannan
Founder and Chief Technology Officer, Envision

Poster List

  • Answer-Me: Multi-Task Open-Vocabulary Learning for Visual Question Answering
    AJ Piergiovanni, Wei Li, Weicheng Kuo, Mohammad Saffar, Fred Bertsch, and Anelia Angelova
    paper | video
  • Anomaly Detection for Visually Impaired People Using A 360 Degree Wearable Camera
    Dong-in Kim and Jangwon Lee
    paper | poster | video
  • Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model
    Fabian Deuser, Konrad Habel, Philipp J. Rosch, and Norbert Oswald
    paper | poster | video
  • Photometric Enhancements to Improve Recognizability of Image Content
    Lauren Olson, Chandra Kambhamettu, and Kathleen McCoy
    paper | poster | video
  • Improving Descriptive Deficiencies with a Random Selection Loop for 3D Dense Captioning based on Point Clouds
    Shinko Hayashi, Zhiqiang Zhang, and Jinja Zhou
    paper | video
  • Tell Me the Evidence? Dual Visual-Linguistic Interaction for Answer Grounding
    Junwen Pan, Guanlin Chen, Yi Liu, Jiexiang Wang, Cheng Bian, Pengfei Zhu, Zhicheng Zhang
    paper | video
  • An End-to-end Vision-language Pre-Trainer for VizWiz-VQA
    Dongze Hao, Yonghua Pan, Fei Liu, Tongtian Yue, Xinxin Zhu, and Jing Liu
  • Question-Aware Vision Transformer for VQA Grounding Segmentation
    Zhenduo Zhang, Jingyu Liu, Sheng Chen
  • Answer Anchors for VizWiz Answer Grounding
    Rengang, Yaqian, Hongwei, Zhenhua, Baoyu, Runze, Xiaochuan


Danna Gurari
University of Colorado Boulder

Portrait picture of Jeffrey Bigham

Jeffrey Bigham
Carnegie Mellon University, Apple

A photo of Ed Cutrell, a man with closely-cropped gray hair and a short beard with mustache, photographed on the Microsoft campus in Redmond, Wash., Thursday, January 9, 2020. Ed is wearing a black turtleneck and rimless glasses and is looking at the camera smiling. (Photo by Dan DeLong)

Ed Cutrell

Portrait picture of Abigale Stangl, a woman with long red hair wearing a gray sweater and glasses looking straight at the camera and smiling

Abigale Stangl
University of Washington

A portrait of Samreen Anjum, a woman with long brown hair, wearing a gray sweater and a red scarf. She is looking at the camera and smiling.

Samreen Anjum
University of Colorado Boulder

Contact Us

For questions, comments, or feedback, please contact Danna Gurari at danna.gurari@colorado.edu.


Microsoft logo
SIG Access logo
Apple logo