Tswana VQA
Moloti Nakampe
Unknown
- 0 Collaborators
A multimodal Visual Question Answering dataset for Southern African languages.
Project status: Under Development
oneAPI, Mobile, Internet of Things, Artificial Intelligence, Cloud
Intel Technologies
DevCloud, Intel Opt ML/DL Framework, MKL, OpenVINO, oneAPI
Overview / Usage
Before they learn to speak, infants perceive the world by seeing, listening, and touching. Language is therefore not the only way to learn about and communicate with the world, and building artificial general intelligence should draw on language together with other modalities. This is called multimodal learning, and Visual Question Answering (VQA) is one such task. Given an image and a natural language question, VQA aims to generate an answer to the question, which requires a deep understanding of both inputs and of how the question relates to the image. While there is much VQA research in English, datasets for African languages are lacking, and English annotations cannot be applied directly to African languages. We propose developing a free-form, open-ended Setswana Visual Question Answering dataset. Given an image and a Setswana question about it, the task is to provide an accurate answer in Setswana. To mirror real-world scenarios, such as helping visually impaired people, both the questions and the answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context.
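The task definition above pairs one image with one open-ended question and several free-form answers. The project does not specify a storage format, so the sketch below assumes a VizWiz-style JSON record (fields `image`, `question`, `answers`, `answerable`); the `VQASample` class and `parse_record` helper are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class VQASample:
    """One hypothetical Setswana VQA record: an image, a question, free-form answers."""
    image: str                                          # image file name (assumed field)
    question: str                                       # Setswana question about the image
    answers: List[str] = field(default_factory=list)    # open-ended annotator answers
    answerable: bool = True                             # VizWiz-style "can this be answered?" flag


def parse_record(raw: str) -> VQASample:
    """Parse one JSON record in an assumed VizWiz-style layout into a VQASample."""
    data = json.loads(raw)
    return VQASample(
        image=data["image"],
        question=data["question"],
        # Each answer is assumed to be an object with an "answer" string.
        answers=[a["answer"] for a in data.get("answers", [])],
        answerable=bool(data.get("answerable", 1)),
    )
```

Keeping every annotator's answer, rather than a single label, preserves the open-ended nature of the task and lets a consensus-based metric be computed later.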
Methodology / Approach
The VizWiz VQA Challenge 2020, which answers visual questions from English-speaking people, initiated this effort by asking how advanced technology can contribute to a higher standard of human rights through an inclusive platform. We extend that effort by inviting Tswana-speaking people to create an African version of the VizWiz dataset. We collect images taken by Tswana people together with questions that reflect their local context. We aim to collect a dataset of 200K samples within six months and to benchmark it with a state-of-the-art visual question answering model. We hope this project will contribute to social welfare by helping make our society more equitable and sustainable for all. We shall start with Setswana and, over time, add the other official South African languages.
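The benchmarking step does not name an evaluation metric. A common choice for VizWiz-style data is the consensus accuracy from the VQA benchmarks, where a prediction counts as fully correct if at least three annotators gave the same answer, averaged over every leave-one-out subset of the human answers (VizWiz collects ten answers per question). The sketch below assumes that metric is adopted; its `normalize` step (lowercasing and whitespace collapsing) is a simplification, since the official script also applies English-specific rules and Setswana-specific normalization would be an open design choice.

```python
from typing import List


def normalize(answer: str) -> str:
    """Simplified answer normalization (assumption): lowercase, collapse whitespace."""
    return " ".join(answer.lower().split())


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Consensus accuracy: min(#matching annotators / 3, 1), averaged over
    every subset that leaves out one human answer."""
    pred = normalize(prediction)
    gts = [normalize(a) for a in human_answers]
    if not gts:
        return 0.0
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]          # drop the i-th annotator
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))  # 3+ matches counts as fully correct
    return sum(scores) / len(scores)
```

For example, with ten answers of which two match the prediction, the score is 0.6: each leave-one-out subset gives either 1/3 or 2/3 credit, and the average reflects partial agreement rather than a hard right/wrong label.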