<h3>Task description</h3>

<p>The goal of this benchmark is to provide an open and interactive interface for evaluating AutoML systems on a wide range of tasks and datasets. The benchmark covers both academic datasets and real-world industrial datasets in order to give a better understanding of the current state of AutoML system performance. It is extensible: additional dataset groups, and updated versions of existing ones, will be added according to the benchmark roadmap.</p>

<p>Benchmark solutions are end-to-end AutoML systems that both automatically build ML models on a given dataset and use the best-fitted model for inference on that dataset's test data. Solutions are sent to the automatic testing system and evaluated on groups of datasets (see the Dataset groups section).</p>

<h3>Evaluation procedure</h3>

<p>Solution evaluation consists of 3 phases:</p>

<ol>
	<li>Check. The solution is evaluated on a single small dataset. This step verifies that the solution runs correctly. The Check phase provides participants with a detailed error log.</li>
	<li>Live Test. Solutions are evaluated on a small yet representative subset of datasets from each dataset group (OpenML, Finance, ODS). Evaluation on these datasets is available at any time and provides live feedback on the leaderboard. Participants see their score and time consumption for each of these datasets, as well as their total Live Test score, on the leaderboard.</li>
	<li>Large Test. Solutions are evaluated on the complete set of datasets from each group. This evaluation is resource-intensive and is therefore run on a monthly schedule. Participants see their score and time consumption for each dataset, as well as their total score for each dataset group, on the Large Test leaderboard. After every run of the Large Test, the detailed per-dataset scores of each participant are also published for further analysis by the community.<br />
	Important: Participants may select up to 2 of their submissions for evaluation on the Large Test (the same mechanic as choosing final submissions in regular competitions).</li>
</ol>

<h3>Benchmark Roadmap</h3>

<ul>
	<li>October-November: familiarizing participants with the submission system; evaluation on the open group of datasets (OpenML CC18); crowdsourcing of ODS datasets.</li>
	<li>November: providing participants with the Finance group of datasets. First run of the Large Test. End of dataset crowdsourcing for 2021.</li>
	<li>December: complete runs on all groups of datasets; integration of the benchmark into the open AutoML course.</li>
</ul>

<h3>Metrics and scores</h3>

<p>The complete scoring process for each AutoML solution consists of the following 3 steps:<br />
<br />
Step 1. For each dataset in every dataset group, compute the respective <code>metric_value</code> on the test data predictions:</p>

<ul>
	<li>Binary classification: ROC-AUC</li>
	<li>Multiclass classification: ROC-AUC (one-vs-all)</li>
	<li>Regression: RMSE</li>
</ul>
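<p>As an illustration of the Step 1 metrics, here is a minimal pure-Python sketch. It is not the benchmark's own evaluation code. The function names are hypothetical, and the unweighted averaging of the per-class one-vs-all scores is an assumption, since the description above only says "one-vs-all".</p>

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, used for regression tasks."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def binary_roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This O(n^2) form is meant for clarity, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes to be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_roc_auc(y_true, proba, classes):
    """One-vs-all ROC-AUC: each class against the rest.

    proba[i][k] is the predicted score of classes[k] for row i.
    Assumption: per-class scores are combined with an unweighted mean.
    """
    per_class = []
    for k, cls in enumerate(classes):
        is_cls = [1 if t == cls else 0 for t in y_true]
        per_class.append(binary_roc_auc(is_cls, [row[k] for row in proba]))
    return sum(per_class) / len(per_class)
```

<p>For example, <code>binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])</code> ranks 3 of the 4 positive/negative pairs correctly and returns 0.75.</p>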

<p>Step 2. For each dataset, calculate the relative <code>dataset_score</code> by comparing the metric value with the metric value of a linear baseline on the same dataset:<br />
<code>dataset_score = metric_value / metric_baseline</code></p>

<p>Step 3. For each dataset group, calculate its <code>group_score</code> as the average <code>dataset_score</code> within that group. <code>total_score</code> is the average <code>dataset_score</code> across all datasets in the current benchmark.</p>
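<p>Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's scoring code: the function names, the record layout (<code>group</code>, <code>metric_value</code>, <code>metric_baseline</code> keys) and the numbers in the usage note are made up for the example.</p>

```python
from statistics import mean


def dataset_score(metric_value, metric_baseline):
    """Step 2: metric value relative to the linear baseline."""
    return metric_value / metric_baseline


def aggregate_scores(results):
    """Step 3: per-group averages and the overall average.

    `results` is a list of dicts with the hypothetical keys
    "group", "metric_value" and "metric_baseline".
    Returns (group_scores, total_score).
    """
    scores_by_group = {}
    all_scores = []
    for row in results:
        score = dataset_score(row["metric_value"], row["metric_baseline"])
        scores_by_group.setdefault(row["group"], []).append(score)
        all_scores.append(score)
    group_scores = {g: mean(s) for g, s in scores_by_group.items()}
    # total_score averages over datasets, not over group scores.
    total_score = mean(all_scores)
    return group_scores, total_score
```

<p>With two illustrative OpenML datasets (0.90 and 0.84 against a baseline of 0.80) and one Finance dataset (0.77 against 0.70), the dataset scores are 1.125, 1.05 and 1.1. Because <code>total_score</code> is taken over all datasets, it is their plain mean (about 1.0917) rather than the mean of the two group scores.</p>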

<h3>Resource constraints</h3>

<ul>
	<li>12 GB memory</li>
	<li>4 vCPU</li>
	<li>50 MB solution archive size</li>
	<li>Runtime: 5 minutes per dataset on the Live Test and 1 hour per dataset on the Large Test.</li>
</ul>
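<p>One way a solution might stay within the per-dataset runtime limit is to track a deadline and stop model search before it passes. The sketch below is only an illustration: the <code>TimeBudget</code> class, the <code>TIME_LIMITS_SECONDS</code> mapping and the 10% safety margin are assumptions, and how a solution learns which phase it is running in is not specified here.</p>

```python
import time

# Per-dataset runtime limits from the list above, in seconds.
TIME_LIMITS_SECONDS = {"live": 5 * 60, "large": 60 * 60}


class TimeBudget:
    """Tracks how much of a runtime limit is left (hypothetical helper)."""

    def __init__(self, limit_seconds, safety_margin=0.1, clock=time.monotonic):
        # Reserve a fraction of the limit for inference and writing output.
        self._clock = clock
        self._deadline = clock() + limit_seconds * (1 - safety_margin)

    def remaining(self):
        """Seconds left before the internal deadline, never negative."""
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        """True once the internal deadline has passed."""
        return self.remaining() == 0.0
```

<p>A training loop could then check <code>budget.expired()</code> between candidate models and fall back to the best model found so far.</p>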

<p>Official support channel: <a href="https://opendatascience.slack.com/archives/C02M54WP9EU"><strong>#automl_benchmark</strong></a> in the ODS.ai Slack. If you are not registered, please <a href="https://ods.ai/join-community" rel="noopener noreferrer" target="_blank">join the community</a>.</p>

An open AutoML benchmark in the form of a container-based competition on both academic (OpenML CC18) and industrial (Finance and ODS) datasets.

<h3>Dataset groups</h3>

<p>To provide a realistic overview of AutoML system performance while remaining comparable with other major AutoML results, the benchmark is organized around groups of datasets. We start with the following dataset groups:</p>

<ul>
	<li>OpenML CC18. A total of 36 datasets for binary and multiclass classification tasks.</li>
	<li>Finance datasets. To be released in October. A group of ~30 datasets covering various industrial tasks from the finance industry, across all 3 major task types: regression, binary classification and multiclass classification.</li>
	<li>ODS crowdsource. To be released in November. A group of ~40 datasets covering various tasks from different industries and Data Science domains.</li>
</ul>


<h3>Submission format</h3>

<p>Each solution is an archive with code that runs in a Docker container environment. Solution archives are submitted to the automatic testing system for evaluation.</p>

<p>Each solution receives the following information:</p>

<ul>
	<li><code>task_type</code>: &ldquo;binary&rdquo; for binary classification, &ldquo;multiclass&rdquo; for multiclass classification, or &ldquo;reg&rdquo; for regression</li>
	<li><code>train_data</code>: path to the training dataset</li>
	<li><code>test_data</code>: path to the test dataset, without the target variable</li>
	<li><code>output_path</code>: path where the system must save its predictions for <code>test_data</code></li>
</ul>
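<p>A solution entry point that reads these four values might look like the sketch below. How the testing system actually passes them (command-line flags, environment variables or a config file) is not stated here, so the <code>--task_type</code>, <code>--train_data</code>, <code>--test_data</code> and <code>--output_path</code> flags and the <code>parse_args</code> function are assumptions made purely for illustration.</p>

```python
import argparse

# Task type values listed in the description above.
TASK_TYPES = ("binary", "multiclass", "reg")


def parse_args(argv=None):
    """Parse the four inputs; the flag names are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Illustrative AutoML solution entry point"
    )
    parser.add_argument("--task_type", choices=TASK_TYPES, required=True,
                        help="binary, multiclass or reg")
    parser.add_argument("--train_data", required=True,
                        help="path to the training dataset")
    parser.add_argument("--test_data", required=True,
                        help="path to the test dataset (no target column)")
    parser.add_argument("--output_path", required=True,
                        help="where predictions for test_data are saved")
    return parser.parse_args(argv)
```

<p>The parsed <code>task_type</code> can then select the model family and prediction format before the solution fits on <code>train_data</code> and writes predictions to <code>output_path</code>.</p>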

<h3>Datasets for local testing</h3>

<ul>
	<li>dresses-sales: binary, target - &#39;Class&#39;</li>
	<li>internet-advertisements: binary, target - &#39;Class&#39;</li>
	<li>eucalyptus: multiclass, target - &#39;Utility&#39;</li>
	<li>bioresponse: binary, target - &#39;target&#39;</li>
</ul>

<h3>Materials (6 MB)</h3>

<ul>
	<li>requirements.txt (1 MB)</li>
	<li>Dockerfile (1 MB)</li>
	<li>submit.zip (1 MB)</li>
	<li>datasets for local testing (3 MB): several datasets for local model testing</li>
</ul>
