Identify what objects are present in the image.
Overview
Our goal is to improve upon the status quo for designing image classification models that are trained in one domain and perform well on images from another domain. Complementing existing work in robustness testing, we introduce the first test dataset for this purpose that comes from an authentic use case, where photographers wanted to learn about the content of their images. We built a new test set using 8,900 images taken by people who are blind, for which we collected metadata indicating the presence versus absence of 200 ImageNet object categories. We call this dataset VizWiz-Classification.
VizWiz-Classification Dataset
The VizWiz-Classification dataset includes:
- 8,900 images
You may download the individual sets of components listed below.
- train, validation, and test: raw images
- annotations.json: contains the list of categories and images in our dataset.
Example code is provided to demonstrate how to parse the JSON files and transform predictions into an accepted submission file for evaluation on the EvalAI server.
The download files are organized as follows:
- The JSON annotation record has the following format:
categories = [category]
category = {
    "id" : 291,
    "wordnet_id" : "n02129165"
}

images = [image]
image = "VizWiz_val_00005256.jpg"
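For illustration, below is a minimal Python sketch of parsing annotations.json, assuming the file stores the "categories" and "images" lists in the format shown above; the file path and variable names are only examples.

import json

# Minimal sketch: load the annotation file (assumed to hold the "categories"
# and "images" lists in the format shown above).
with open("annotations.json") as f:
    annotations = json.load(f)

# Map each category id to its WordNet synset id (e.g., 291 -> "n02129165").
id_to_wordnet = {c["id"]: c["wordnet_id"] for c in annotations["categories"]}

# List of image file names, e.g., "VizWiz_val_00005256.jpg".
image_names = annotations["images"]

print(len(id_to_wordnet), "categories and", len(image_names), "images loaded")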
This work is licensed under a Creative Commons Attribution 4.0 International License.
Challenge
Our proposed challenge is designed around the aforementioned VizWiz-Classification dataset.
Task
Given an image, the task is to return one object label that is observed in the image. For evaluation, a predicted label for an image is considered correct if it is in the image’s set of labels. For example, if the labels of an image are “Studio couch” and “T-shirt”, either of them is a correct prediction.
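As an informal illustration of this evaluation rule (not the official EvalAI scoring code), the following Python sketch counts a prediction as correct whenever it appears in the image's label set; the image names and category ids below are made up.

def top1_accuracy(predictions, labels):
    # predictions: {image_name: predicted category id}
    # labels: {image_name: set of ground-truth category ids}
    correct = sum(1 for name, pred in predictions.items() if pred in labels[name])
    return correct / len(predictions)

# Hypothetical example: the first prediction matches one of the image's labels,
# the second does not, so the accuracy is 0.5.
predictions = {"image_a.jpg": 719, "image_b.jpg": 42}
labels = {"image_a.jpg": {719, 831}, "image_b.jpg": {291}}
print(top1_accuracy(predictions, labels))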
Submission Instructions
Evaluation Servers
Teams participating in the challenge must submit results for the test portion of the dataset to our evaluation servers, which are hosted on EvalAI. We created different partitions of the test dataset to support different evaluation purposes:
- Test-dev: this partition consists of 2,000 test images and is available year-round. Each team can upload at most 10 submissions per day to receive evaluation results.
- Test-standard: this partition is available to support algorithm evaluation year-round, and contains all 8,900 images in the test dataset. Each team can submit at most five results files and at most one result per day. Each team can choose to share their results publicly or keep them private. When shared publicly, the best scoring submitted result will be published on the public leaderboard and will be selected as the team’s final entry for the competition.
Uploading Submissions to Evaluation Servers
To submit results, each team must first create a single account on EvalAI. Then, on the platform, click the “Submit” tab, select the submission phase (“test”), select the results file (i.e., the zip file) to upload, fill in the required metadata about the method, and then click “Submit”. The evaluation server may take several minutes to process the results. To have the submission results appear on the public leaderboard, check the box under “Show on Leaderboard”.
To view the status of a submission, navigate on the EvalAI platform to the “My Submissions” tab and choose the phase to which the results file was uploaded (i.e., “test”). One of the following statuses should be shown: “Failed” or “Finished”. If the status is “Failed”, please check the “Stderr File” for the submission to troubleshoot. If the status is “Finished”, the evaluation successfully completed and the evaluation results can be downloaded. To do so, select “Result File” to retrieve the aggregated score for the submission phase used (i.e., “test”).
Submission Results Formats
Please submit a JSON file containing a dictionary whose keys are image names and whose values are category ids. Use the following format to submit results for the task:
results = {result}
result = "VizWiz_val_00005256.jpg" : 719  # "image_name" : id of the category
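For example, the following Python sketch writes a results dictionary in this format and packages it as a zip file for upload to the EvalAI server; the single entry shown is only a placeholder.

import json
import zipfile

# Placeholder entry in the required "image_name": category id format.
results = {"VizWiz_val_00005256.jpg": 719}

with open("results.json", "w") as f:
    json.dump(results, f)

# Package the results file as a zip archive for upload on EvalAI.
with zipfile.ZipFile("results.zip", "w") as zf:
    zf.write("results.json")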
Leaderboards
The Leaderboard page for the challenge can be found here.
Rules
- Teams are allowed to use external data to train their algorithms. The exception is that teams are not allowed to use any annotations of other VizWiz datasets (e.g. VizWiz-Captions, VizWiz-VQA).
- Members of the same team are not allowed to create multiple accounts for a single project to submit more than the maximum number of submissions permitted per team on the test-challenge and test-standard datasets. The only exception is if the person is part of a team that is publishing more than one paper describing unrelated methods.
Publication
- A New Dataset Based on Images Taken by Blind People for Testing the Robustness of Image Classification Models Trained for ImageNet Categories
Reza Akbarian Bafghi and Danna Gurari. CVPR, 2023.
Contact Us
For any questions, comments, or feedback, please send them to Reza Akbarian Bafghi at reza.akbarianbafghi@colorado.edu.