Ended 19 months ago
138 participants
1628 submissions

Materials (15 MB)

  • train.csv - 11 MB
  • test.csv - 3 MB
  • sample_submission.csv - 1 MB

The dataset was collected from one of the most popular map services. It contains reviews with ratings from 1 to 5, and the task is to predict the rating of each review from its text.

  • train.csv - The training set, comprising the rate and text of each review. rate is the target for the competition.
  • test.csv - The test set, containing only the text of each review.
  • sample_submission.csv - A submission file in the correct format.
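
As a rough illustration of working with the training file, the sketch below reads a train.csv-style file with the standard library and counts how often each rating occurs. The column name rate comes from the description above; the column name text, the helper read_reviews, and the sample rows are assumptions made for the example, so check the real header before relying on them.

```python
import csv
import io
from collections import Counter


def read_reviews(csv_file):
    """Return (texts, rates) from an open train.csv-style file (hypothetical helper)."""
    texts, rates = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["text"])       # "text" is an assumed column name
        rates.append(int(row["rate"]))  # "rate" is the target named in the description
    return texts, rates


# Tiny in-memory stand-in for train.csv (made-up rows, not real data).
sample = io.StringIO(
    "rate,text\n"
    "5,Great place\n"
    "1,Closed when we arrived\n"
    "5,Friendly staff\n"
)
texts, rates = read_reviews(sample)
class_counts = Counter(rates)  # how many reviews carry each rating
print(class_counts)
```

With the real data, pass open("train.csv", newline="", encoding="utf-8") instead of the in-memory sample. Checking the class counts first is useful because rating datasets are often imbalanced.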

You can download the dataset by following the link.

Evaluation

Submissions are scored using the F1-score.
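
The page does not say how the F1-score is averaged over the five classes. The sketch below assumes macro-averaging (the unweighted mean of per-class F1), which may differ from the competition's actual scorer; the function names f1_per_class and macro_f1 are hypothetical.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if the class never appears."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    if labels is None:
        # Default to the classes that occur in either list.
        labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, label) for label in labels) / len(labels)


# Made-up ratings for illustration only.
y_true = [5, 5, 1, 3]
y_pred = [5, 1, 1, 3]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Under macro-averaging every rating counts equally, so a model that always predicts the most common rating scores poorly even when its accuracy looks acceptable.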

Submission File

For each row in the test set, you need to predict one of the 5 rates, from 1 to 5. The file should contain a header and have the following format:

index,rate
0,5
1,5
2,5
3,5
...
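
A minimal sketch of producing a file in this format with the standard library. It assumes the index is the zero-based row position in test.csv, as the sample rows suggest; write_submission is a hypothetical helper name.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write an index,rate CSV with a header row (hypothetical helper)."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["index", "rate"])
    for index, rate in enumerate(predictions):
        rate = int(rate)
        if not 1 <= rate <= 5:
            raise ValueError(f"rate must be between 1 and 5, got {rate}")
        writer.writerow([index, rate])


# Write to an in-memory buffer here; use open("submission.csv", "w", newline="") for a real file.
buffer = io.StringIO()
write_submission([5, 5, 5], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```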

Usage

  1. Clone the repo: git clone https://github.com/e0xextazy/nlp_huawei_new2_task.git
  2. cd nlp_huawei_new2_task/
  3. Create virtual environment: python3.7 -m venv venv
  4. Activate virtual environment: source venv/bin/activate
  5. Set up your baseline:
    1. TF-IDF + Logistic Regression: ./setup/setup_tf_idf_logreg.sh
    2. Catboost: ./setup/setup_catboost.sh
    3. LSTM: ./setup/setup_lstm.sh
    4. Transformers: ./setup/setup_transformers.sh
  6. Download data: ./setup/download_data.sh
  7. Enjoy!
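
To give a sense of what the first baseline involves, here is a minimal TF-IDF + logistic regression sketch using scikit-learn. It is not the repository's code: the training and test texts are made up, the hyperparameters are library defaults apart from max_iter, and it assumes scikit-learn is installed in the virtual environment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up reviews standing in for the text and rate columns of train.csv.
train_texts = [
    "great food and friendly staff",
    "excellent service, will come again",
    "terrible experience, very rude",
    "awful place, dirty and slow",
]
train_rates = [5, 5, 1, 1]

# TF-IDF turns each review into a sparse vector; logistic regression classifies it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_rates)

# Made-up reviews standing in for test.csv.
test_texts = ["friendly staff and great service", "rude and slow"]
predicted = model.predict(test_texts)
print(list(predicted))
```

The predicted array can be written out directly in the index,rate submission format, one row per test review.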

Authors
