We propose the first workshop challenge in the artificial intelligence community on data originating from blind people in order to encourage a larger community to collaborate on developing algorithms for assistive technologies. Our challenge is designed around a new visual question answering (VQA) dataset that consists of visual questions asked by blind people, who each took a picture using a mobile phone and recorded a spoken question, together with 10 crowdsourced answers per visual question. These visual questions came from over 11,000 blind people in real-world scenarios where they were seeking to learn about the physical world around them. Our challenge addresses two tasks: (1) visual question answering and (2) predicting whether a visual question is answerable. We hope this challenge will educate more people about the technological needs of blind people while providing an exciting new opportunity for researchers to develop assistive technologies that eliminate accessibility barriers for blind people.
More broadly, this workshop will promote greater interaction among the diverse researchers and practitioners interested in developing accessible VQA technology. To foster a discussion of current research and application issues, we invited speakers from both academia and industry to share their experiences in building today’s state-of-the-art assistive technologies as well as designing next-generation tools. We hope this workshop will connect the right people and accelerate the conversion of cutting-edge research into marketable products that help blind people overcome their daily visual challenges.
- Friday, August 17, 2018 at 5:59pm CST: challenge submissions due
- Monday, August 20, 2018 at 5:59pm CST: extended abstracts due
- Monday, August 27, 2018: notification to authors about decisions for extended abstracts
- Friday, September 14, 2018: workshop (full-day) when challenge winners will be announced
We invite two types of submissions:
- We invite submissions of results from a single algorithm for each of the two challenge tasks. All information about both challenges and the submission process can be found at this link. We accept submissions for algorithms that are unpublished, currently under review, or already published. The teams with the top-performing submissions will be invited to give short talks during the workshop.
- We invite submissions of extended abstracts on topics related to visual question answering and assistive technologies for blind people. Papers must be at most two pages (with references) and follow the ECCV formatting guidelines using the provided author kit. Reviewing will be single-blind, and accepted papers will be presented as posters. We accept submissions on work that is unpublished, currently under review, or already published. There will be no proceedings. Please send your extended abstracts to firstname.lastname@example.org.
Theresianum 601 at TU München. Please note that this is different from the main conference venue. More information about how to travel to this venue is provided at this link.
- 9:00-9:10 am: Opening remarks [slides]
- 9:10-9:30 am: Jeffrey Bigham – “VizWiz: From Visual Question Answering to Supporting Real-World Interactions” [slides]
- 9:30-9:50 am: Kris Kitani – “Wearable Sensing for Understanding, Forecasting and Assisting Human Activity” [slides]
- 9:50-10:10 am: Devi Parikh – “Forcing Vision and Language Models to Not Just Talk But Also Actually See” [slides]
- 10:10-10:30 am: Break
- 10:30-10:50 am: Overview of challenge, winner announcements, and analysis of results [slides]
- 10:50-11:20 am: Talks by challenge winners
  - FAIR A-STAR: 1st Place for VQ Answerability & VQA Tasks [slides]
  - PAS-D: 2nd Place for VQ Answerability Task & 3rd Place for VQA Task [slides]
  - SKTBrain-SNU: 2nd Place for VQA Task [slides]
- 11:20-12:30 pm: Poster session
- 12:30-1:45 pm: Lunch
- 1:45-2:05 pm: Saqib Shaikh – “Seeing AI: Leveraging Computer Vision to Empower the Blind Community”
- 2:05-2:25 pm: Yonatan Wexler – “OrCam: Life-Changing Wearable AI”
- 2:25-2:45 pm: Roberto Manduchi – “Finding and reading scene text without sight” [slides]
- 2:45-3:15 pm: Break
- 3:15-3:45 pm: Panel discussion
- 3:45-4:00 pm: Open discussion
- 4:00-4:10 pm: Closing remarks [slides]
- “Bilinear attention networks for VizWiz challenge.” Jin-Hwa Kim, Yongseok Choi, Sungeun Hong, Jaehyun Jun, and Byoung-Tak Zhang
- “Contextualized Bilinear Attention Network.” Gi-Cheon Kang, Seonil Son, and Byoung-Tak Zhang
- “When the Distribution Is the Answer: An Analysis of the Responses in VizWiz.” Denis Dushi, Sandro Pezzelle, Tassilo Klein, and Moin Nabi
For general questions, please check our FAQs page, where you can browse answered questions and post new ones.
For other questions, comments, or feedback, please email Danna Gurari at email@example.com.