What is ASD QA?
Autism Spectrum Disorder QA (ASD QA) is a project that supports the inclusion of people with special needs. At its core is a question-answering dataset for building an informational Russian-language chatbot that supports the inclusion of people with autism spectrum disorder, and Asperger syndrome in particular. The dataset is based on data from an informational website, and use of that data has been agreed.
The dataset is inspired by the Stanford Question Answering Dataset (SQuAD) and was originally used to train a proof-of-concept QA model for an inclusive-education chatbot. The dataset is open source.
The dataset structure
The dataset consists of questions, answers, and contexts that contain the information needed to build retrieval-based QA systems. 5% of the questions are unanswerable and irrelevant, so that a model can learn to ignore purely conversational, off-topic lines and return only precise information. The dataset currently contains 765 QA pairs and 18,894 tokens (words). Work is in progress, and the number of QA pairs will increase.
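To make the structure more concrete, here is a minimal Python sketch of how one question/answer/context record could be represented and checked. It rests on assumptions: the names `QAExample`, `context`, `question`, `answer`, `is_consistent` and `unanswerable_share` are hypothetical, using `None` to mark an unanswerable question is only one possible convention, and the sample texts are invented for illustration rather than taken from the dataset. The actual file layout of ASD QA may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QAExample:
    """One SQuAD-style record (hypothetical field names)."""
    context: str            # passage that may contain the answer
    question: str           # user question
    answer: Optional[str]   # None marks an unanswerable or irrelevant question


def is_consistent(ex: QAExample) -> bool:
    """An answerable example should have its answer text inside the context."""
    return ex.answer is None or ex.answer in ex.context


def unanswerable_share(examples: List[QAExample]) -> float:
    """Fraction of examples that have no answer."""
    if not examples:
        return 0.0
    return sum(ex.answer is None for ex in examples) / len(examples)


# Invented sample records, for illustration only.
examples = [
    QAExample(
        context=("Inclusive education means that students with and without "
                 "special needs learn together in the same classroom."),
        question="What does inclusive education mean?",
        answer="students with and without special needs learn together",
    ),
    QAExample(
        context=("Inclusive education means that students with and without "
                 "special needs learn together in the same classroom."),
        question="Tell me a joke.",
        answer=None,  # off-topic line the model should learn to decline
    ),
]

if __name__ == "__main__":
    print(all(is_consistent(ex) for ex in examples))
    print(unanswerable_share(examples))
```

At the stated share of 5%, roughly 38 of the 765 QA pairs would be of the unanswerable kind.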
Motivation and results
Inclusion of people with special needs is becoming more widespread in Russia, but there is still a lack of reliable information, and false claims circulate, which can cause misunderstandings and even conflicts between members of inclusive organizations (schools, colleges, universities, workplaces, etc.). This prompted the idea of creating automated tools to support inclusion. Such tools need specialized closed-domain datasets, so work on this dataset was launched soon afterwards.
Based on the dataset, closed-domain question-answering models for Russian were built using transfer learning. Multilingual BERT (base) and GPT-2 (774M) were fine-tuned on the custom dataset to build retrieval-based and generative QA models, respectively.