Call data

The zones.csv table contains squares roughly 500x500 meters in size. All squares are located in Moscow or a short distance from it. Each square is defined by the coordinates of its lower-left corner (lat_bl, lon_bl) and its upper-right corner (lat_tr, lon_tr); the coordinates of its center are given in the columns (lat_c, lon_c).
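
For illustration, a minimal pandas sketch of loading and inspecting the table (only the column names come from the description above; the file location is an assumption):

    import pandas as pd

    # Load the table of squares; column names follow the description above.
    zones = pd.read_csv("zones.csv")

    # Corner and center coordinates of each roughly 500x500 m square.
    geo_cols = ["lat_bl", "lon_bl", "lat_tr", "lon_tr", "lat_c", "lon_c"]
    print(zones[geo_cols].describe())

    # Median square size in degrees, as a quick sanity check.
    print((zones["lat_tr"] - zones["lat_bl"]).median(),
          (zones["lon_tr"] - zones["lon_bl"]).median())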

Target variable — the total number of calls to emergency services in each square in the period from 1st September to 31st December 2017.

The squares located in the western part of the sample are intended for training the model; for these squares, the average number of emergency calls per day is known:

  • calls_daily - for all days;
  • calls_workday - for business days;
  • calls_weekend - for weekends;
  • calls_wd{D} - for day of the week D (0 = Monday, ..., 6 = Sunday).

Use the squares from the eastern part of the sample to predict the number of calls for all days of the week. Prediction quality is not evaluated for all squares, but only for a subset that excludes squares with a low number of outgoing calls. Target squares are marked with is_target=1 in the table. For test squares, the calls_* and is_target values are hidden.
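
As a rough sketch, the train/test split can be recovered from the table itself; this assumes the hidden values simply appear as empty cells in zones.csv, which is an assumption about the file layout:

    import pandas as pd

    zones = pd.read_csv("zones.csv")

    # Training squares have the calls_* columns filled in; for test
    # squares these values are hidden (assumed to be empty/NaN here).
    train = zones[zones["calls_daily"].notna()]
    test = zones[zones["calls_daily"].isna()]

    # Within the training part, only squares with is_target == 1
    # belong to the evaluation subset.
    train_target = train[train["is_target"] == 1]
    print(len(train), len(test), len(train_target))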

External data

We have prepared an example of using external OpenStreetMap data for participants (available below). We also recommend looking at the official GitHub repository of the competition, where you can find more detailed data descriptions and usage examples.

To make predictions, you may use only the datasets from the regularly updated list available on the forum. If you want to use a dataset that is not on the list, post a link to it on the forum, and the dataset will be added to the list.

Solution format

As a solution, submit a CSV table with predictions for all test squares and, for each square, for all days of the week. An example file with test predictions, sample_submission.csv, is available below.
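
A hedged sketch of producing a submission: the safest approach is to copy the structure of sample_submission.csv and overwrite only the prediction values. The column name zone_id and the exact shape of the file are assumptions; always follow the example file:

    import pandas as pd

    sample = pd.read_csv("sample_submission.csv")

    # Keep the structure of the example file and replace only the
    # prediction values; a constant placeholder stands in for real
    # model output here.
    submission = sample.copy()
    pred_cols = [c for c in submission.columns if c != "zone_id"]
    submission[pred_cols] = 1.0
    submission.to_csv("my_submission.csv", index=False)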

The quality is evaluated only on the subset of target squares. Participants do not know which squares are target squares, but the way target squares are selected is identical in the training and test parts.

During the competition, the quality is evaluated on 30% of the test target squares (selected randomly). At the end of the competition, the final results are computed on the remaining 70% of the squares.

The prediction quality metric is the Kendall rank correlation coefficient (Kendall's tau). This metric depends only on the ordering of the predicted numbers of calls, not on their exact values. Different days of the week are treated as independent elements of the sample, i.e., the correlation coefficient is computed over the sample of all test pairs (zone_id, day of the week).

The testing system uses the implementation of Kendall's tau from the SciPy package: scipy.stats.kendalltau.
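
For local validation you can call the same function on your own hold-out. Here y_true and y_pred stand for the call counts and predictions flattened over all (zone_id, day of the week) pairs; the numbers below are toy values:

    from scipy.stats import kendalltau

    y_true = [3.0, 1.5, 4.2, 0.7, 2.9]   # toy ground-truth call counts
    y_pred = [2.8, 1.9, 4.0, 0.5, 3.1]   # toy predictions

    tau, p_value = kendalltau(y_true, y_pred)
    print(f"Kendall's tau: {tau:.3f}")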
