File | Size
train.csv | 3 MB
test.csv | 1 MB
sample_submission.csv | 1 MB
The dataset was collected from a public film rating resource. We selected the 6 most popular movie genres and invite you to predict them.
train.csv - The training set, containing the movie_name, movie_description and target of each film. target is the film's genre and is the value to predict in this competition.
test.csv - The test set, containing only the movie_name and movie_description of each film.
sample_submission.csv - A sample submission file in the correct format.
You can download the dataset by following the link.
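As a starting point, the files can be read with Python's standard csv module. The sketch below assumes the files are UTF-8, comma-separated and start with a header row using the column names described above (movie_name, movie_description, target). The function names load_rows and genre_distribution are hypothetical helpers, not part of the repository. csv.DictReader also handles quoted descriptions that contain commas or line breaks.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a competition CSV file into a list of dicts keyed by column name.

    Assumption: UTF-8 encoding and a header row with the documented columns.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def genre_distribution(rows):
    """Count how many films fall into each genre label (the target column).

    Labels are kept as the raw strings read from the file.
    """
    return Counter(row["target"] for row in rows)
```

For example, genre_distribution(load_rows("train.csv")) shows how balanced the 6 genres are in the training set, which is worth checking before relying on accuracy alone.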
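Submissions are scored using accuracy:
$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right]$$
where N is the number of samples in the test dataset, y_i is the true genre of the i-th sample, \hat{y}_i is the predicted genre, and \mathbb{1}[\cdot] equals 1 when its argument is true and 0 otherwise.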
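The metric can be reproduced locally, for example on a held-out part of train.csv. The sketch below is a plain-Python implementation of the accuracy definition above; the function name accuracy is a hypothetical helper, not the official scorer.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted genre equals the true genre.

    Implements Accuracy = (1/N) * sum_i 1[y_pred_i == y_true_i].
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("accuracy is undefined for an empty test set")
    # Count positions where the predicted label matches the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

For instance, accuracy([0, 2, 5, 3], [0, 2, 1, 3]) returns 0.75, since 3 of the 4 predictions are correct.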
For each row in the test set, you need to predict one of the 6 movie genres. The file should contain a header and have the following format:
id,target
133530575988338041546938011932244933990,5
133530621940672299820253816187736128870,2
133530687700047186659654018829214907750,3
133531296172335296209766737246753488230,0
...
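A file in this format can be produced with the csv module. The sketch below assumes that ids come from the test set in the same order as the predictions and that genres are encoded as integers 0 to 5 (the sample rows are consistent with this, but the encoding is an assumption). write_submission and its num_genres parameter are hypothetical. The long ids are written as strings, which preserves them exactly.

```python
import csv


def write_submission(path, ids, predictions, num_genres=6):
    """Write a submission CSV with an 'id,target' header and one row per test film."""
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per test id")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "target"])  # header required by the format
        for film_id, label in zip(ids, predictions):
            label = int(label)
            # Assumption: genres are encoded as integers 0..num_genres-1.
            if not 0 <= label < num_genres:
                raise ValueError(f"unexpected genre label {label}")
            writer.writerow([film_id, label])
```

Validating the labels while writing catches off-by-one label encodings before the file is submitted.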
git clone https://github.com/e0xextazy/nlp_huawei_new2_task.git
cd nlp_huawei_new2_task/
python3.7 -m venv venv
source venv/bin/activate
./setup/setup_tf_idf_logreg.sh
./setup/setup_catboost.sh
./setup/setup_lstm.sh
./setup/setup_transformers.sh
./setup/download_data.sh
Copy of contributing.md.
How to make a pull request:
- Create a branch: git checkout -b issue-id-short-name
- Push it: git push
- Open a pull request into the master branch.