aij2019-data-check.zip - test examples for training and debugging (256,000 MB)
The examination test is passed to the solution in JSON format. A test consists of a set of question tasks, resource and time constraints, and metainformation (such as the test language).
Each question task object in the test contains the following fields (an illustrative example follows the list):
text - question task text; may contain markdown-style formatting. The text can reference attachment files, e.g. graphic illustrations for the task.
attachments - set of attached files (with their id and mime-type).
meta - metainformation: arbitrary key-value pairs available to the solution and the testing system. Used to provide structured information about the task, e.g. the question source or the originating exam topic.
answer - format description for the expected answer type. Multiple question types are supported, each with its own specific parameters and fields.
score - maximum number of points for the task. Based on this field, solutions can prioritise computational resources between tasks.
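A minimal sketch of what a single task object might look like, assembled from the fields above. The concrete values, the exact key layout, and the attachment link scheme are illustrative assumptions, not the official schema:

```json
{
  "id": "task_001",
  "text": "Read the text and choose the correct statement. See [illustration](attachment://img1).",
  "attachments": [
    {"id": "img1", "mime-type": "image/png"}
  ],
  "meta": {
    "source": "demo variant",
    "topic": "reading comprehension"
  },
  "answer": {
    "type": "choice",
    "options": ["1", "2", "3", "4"]
  },
  "score": 1
}
```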
1. Check-phase
The solution is evaluated on a publicly available set of questions with known answers. This phase is important for testing solutions for potential errors and for checking interaction with the evaluation system. Evaluation results and stdout/stderr output are fully available to the participant.
2. Public Test
The solution is evaluated on a hidden set of questions available only to the organisers. Tasks, and the answer options within tasks, are randomly shuffled on each evaluation.
3. Private Test
The solution is evaluated on the final set of questions. Results on the private test determine the competition winners.
Solution containers are isolated from the outside world: no internet access, no communication with other parties.
RAM: 16 GB;
Maximum solution archive size: 20 GB;
Maximum Docker image size (publicly available): 20 GB;
Time limit on solution initialization (before task inference): 10 minutes. This time is allocated for loading models into memory;
Time limit on providing an answer for a single request: 30 minutes.
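A sketch of a solution entry point that respects this contract: load models once inside the 10-minute initialization window, then use the score field to spend effort on the highest-value tasks first. The file paths, the top-level "tasks" key, and the answer layout are assumptions for illustration; the real I/O interface is defined by the competition's Docker contract:

```python
import json

# Hypothetical I/O paths; the actual interface is set by the evaluation system.
TEST_PATH = "test.json"
ANSWERS_PATH = "answers.json"

def load_models():
    # All model loading should fit inside the 10-minute initialization window.
    return {}

def solve(task, models):
    # Placeholder: return an empty answer for every task.
    return {"id": task.get("id"), "answer": ""}

def main():
    models = load_models()
    with open(TEST_PATH) as f:
        test = json.load(f)
    # Prioritise high-score tasks first; the `score` field exists for this.
    tasks = sorted(test["tasks"], key=lambda t: t.get("score", 0), reverse=True)
    answers = [solve(t, models) for t in tasks]
    with open(ANSWERS_PATH, "w") as f:
        json.dump(answers, f, ensure_ascii=False)

if __name__ == "__main__":
    main()
```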
Each question task is evaluated by a metric relevant to its task type:
The total solution score is the sum of scores across all question tasks. Per-task scores are converted to a 100-point scale based on the official score correspondence table.
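A toy illustration of this aggregation: raw per-task points are summed, then the total is mapped to the 100-point scale via a correspondence table. The table values below are made up; the official table is defined by the organisers:

```python
# Made-up correspondence table: raw total -> scaled 100-point score.
CORRESPONDENCE = {0: 0, 10: 30, 20: 55, 30: 80, 40: 100}

def scale(raw_total: int) -> int:
    # Take the highest table entry not exceeding the raw total.
    key = max(k for k in CORRESPONDENCE if k <= raw_total)
    return CORRESPONDENCE[key]

raw = sum([1, 2, 0, 3])  # example per-task scores
print(scale(raw))        # -> 0 (6 raw points fall below the 10-point threshold)
```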
Solution evaluation on essay tasks comprises two stages: automatic scoring and manual assessment by human experts.
The automatic procedure evaluates basic surface-level indicators of the generated texts:
Automatic scoring is returned immediately and is not the final score; it is a helpful utility for participants.
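A sketch of what a surface-level check might look like. The concrete indicators here (non-empty content, a minimum word count) are assumptions for illustration; the actual automatic criteria are defined by the organisers:

```python
def surface_check(essay: str, min_words: int = 150) -> bool:
    # Hypothetical surface-level indicator: the essay must be non-empty
    # and reach a minimum length in words.
    words = essay.split()
    if not words:
        return False  # an empty essay cannot pass
    return len(words) >= min_words
```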
Manual essay assessment is carried out by professional experts who follow the official grading standards of exam essays.
Results of manual essay assessment are posted to the competition leaderboard 1-2 times a week.
If automatic scoring indicates that manual essay assessment would yield 0 points, the participant is informed and invited to prepare a new solution for human assessment.
Participants are provided with a fully functional baseline solution for this competition:
The models are provided both as a technical example and as an internal baseline against which participants' stronger solutions are validated.
The baseline essay model passes the formal evaluation criteria but does not pass meaningful human assessment.