Ended 17 months ago
173 participants
1287 submissions

Materials (5 MB)

  • train.csv (3 MB)
  • test.csv (1 MB)
  • sample_submission.csv (1 MB)

The dataset presented here was collected from one of the public film rating resources. We have selected the 6 most popular movie genres and invite you to predict the genre of each film.

  • train.csv - The training set, comprising the movie_name, movie_description and target of each film. target is the genre of the film and is the value to predict in this competition.
  • test.csv - The test set, which contains only the movie_description and movie_name of each film.
  • sample_submission.csv - A submission file in the correct format.
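
As a quick check after downloading, the training file can be read with the standard csv module and the genre distribution counted. This is a minimal sketch, not part of the repository: the helper names read_rows and genre_counts are hypothetical, the toy rows are invented for illustration, and only the column names movie_name, movie_description and target come from the description above.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a CSV with a header row into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(handle))


def genre_counts(rows: List[Dict[str, str]]) -> Counter:
    """Count how many films carry each target label."""
    return Counter(row["target"] for row in rows)


# Invented toy data standing in for train.csv; the real file is much larger.
toy_csv = io.StringIO(
    'movie_name,movie_description,target\n'
    'Toy Film A,"A detective, a city, a secret.",2\n'
    'Toy Film B,Two friends cross a desert.,5\n'
    'Toy Film C,A heist goes wrong.,2\n'
)

rows = read_rows(toy_csv)
counts = genre_counts(rows)
print(counts.most_common())
```

For the real data, pass an open file instead of the toy buffer, for example open("train.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption.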

You can download the dataset by following the link.

Evaluation

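Submissions are scored using accuracy:

$$\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\hat{y}_i = y_i]$$

where N is the number of samples in the test dataset, y_i is the true genre of sample i, and \hat{y}_i is the predicted genre.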
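
Accuracy is the fraction of test samples whose predicted genre matches the true genre. A minimal sketch for local validation follows. The function name accuracy is illustrative and this is not the organizers' scoring code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the share of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Three of the four predictions match the true labels.
score = accuracy([5, 2, 3, 0], [5, 2, 1, 0])
print(score)
```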

Submission File

For each row in the test set, you need to predict one of the 6 movie genres. The file should contain a header and have the following format:

id,target
133530575988338041546938011932244933990,5
133530621940672299820253816187736128870,2
133530687700047186659654018829214907750,3
133531296172335296209766737246753488230,0
...
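
A file in this format could be produced as sketched below. The helper name format_submission is hypothetical, and the label range 0..5 is an assumption inferred from the six genres and the sample rows. Ids are kept as strings, which avoids precision loss if a tool would otherwise parse such long numbers as floats.

```python
import csv
import io
from typing import List

NUM_GENRES = 6  # from the task description; labels 0..5 are an assumption


def format_submission(ids: List[str], targets: List[int]) -> str:
    """Build the submission CSV text: a header line, then one id,target row per film."""
    if len(ids) != len(targets):
        raise ValueError("ids and targets must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "target"])
    for row_id, target in zip(ids, targets):
        if not 0 <= target < NUM_GENRES:
            raise ValueError(f"target {target} is outside 0..{NUM_GENRES - 1}")
        writer.writerow([row_id, target])
    return buffer.getvalue()


example = format_submission(
    [
        "133530575988338041546938011932244933990",
        "133530621940672299820253816187736128870",
    ],
    [5, 2],
)
print(example)
```

The returned text can then be written to disk, for example with open("submission.csv", "w", newline="") and a single write call.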

Usage

  1. Clone the repo: git clone https://github.com/e0xextazy/nlp_huawei_new2_task.git
  2. cd nlp_huawei_new2_task/
  3. Create virtual environment: python3.7 -m venv venv
  4. Activate virtual environment: source venv/bin/activate
  5. Setup your baseline:
    1. TF-IDF + Logistic Regression: ./setup/setup_tf_idf_logreg.sh
    2. Catboost: ./setup/setup_catboost.sh
    3. LSTM: ./setup/setup_lstm.sh
    4. Transformers: ./setup/setup_transformers.sh
  6. Download data: ./setup/download_data.sh
  7. Enjoy!

Contributing

Copy of the contributing.md.

Issue

  • If you see an open issue and are willing to take it, add yourself as an assignee and state how long the fix will take. See the Pull request section below.
  • If you want to add something new or if you find a bug, you should start by creating a new issue and describing the problem/feature. Don't forget to include the appropriate labels.

Pull request

How to make a pull request.

  1. Clone the repository;
  2. Create a new branch, for example git checkout -b issue-id-short-name;
  3. Make changes to the code (make sure you are definitely working in the new branch);
  4. Commit your changes and push the branch, for example git push -u origin issue-id-short-name;
  5. Create a pull request to the master branch;
  6. Add a brief description of the work done;
  7. Expect comments from the authors.

Authors
