Paraphrase Detection on PAWS dataset

This project presents a state-of-the-art (SOTA) solution to paraphrase identification on the PAWS-Wiki test set. We use a Concatenate Pooler head on a DeBERTa backbone, trained on the PAWS-Wiki and PAWS-QQP train sets, to reach F1 = 0.95, an improvement over the previous SOTA of 0.943. We also investigate the effect of the unlabeled part of PAWS-Wiki.

Tags: paraphrase, nlp


Code and full paper can be found here:

https://github.com/Sergey-Tkachenko/nlp_project_2023

Introduction

Paraphrasing is a form of plagiarism in which another person's ideas, words, or work are presented in a different way by switching words, changing sentence construction, or changing grammatical style. Additionally, paraphrasing may include replacing some words with synonyms [Chowdhury and Bhattacharyya, 2018]. In short, a sentence can be defined as a paraphrase of another sentence if the two are not identical but share the same semantic meaning [Liu et al., 2022]. Large language models (LLMs) have shown high efficiency on paraphrasing tasks [Becker et al., 2023]. The use of LLMs may therefore lead to an increase in paraphrasing, which can compromise the integrity of written work. Using Transformer-based models for classification is an intuitive way to counteract this new form of plagiarism [Wahle et al., 2021]. Therefore, in this study, we provide a solution to the paraphrase detection task using transformer-based neural models. To the best of our knowledge, our architecture, based on DeBERTaV3 [He et al., 2021], is state of the art on the PAWS-Wiki dataset.

Dataset

In our work, we use the PAWS dataset. PAWS training data dramatically improves performance on challenging examples and makes models more robust to real-world examples [Zhang et al., 2019]. The dataset consists of two parts: Wiki and QQP (Quora Question Pairs).

Examples are generated via language-model-controlled word swapping and back translation, and each pair is given five human ratings in both phases. The main idea of PAWS is to generate adversarial examples that break NLP systems. Some examples from the PAWS dataset are shown below, followed by a minimal loading sketch.

non-paraphrase:

  • asking him for a passport to return to England through Scotland
  • and asked him for a passport to return to Scotland through England

paraphrase:

  • The NBA season of 1975 – 76 was the 30th season of the National Basketball Association
  • The 1975 – 76 season of the National Basketball Association was the 30th season of the NBA
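
For reference, the Wiki portion of PAWS can be loaded from the Hugging Face Hub. The sketch below is a minimal example, assuming the dataset identifier `paws` and the config names `labeled_final` and `unlabeled_final` from the Hub dataset card; the QQP portion is not distributed directly and must be generated separately from the original Quora Question Pairs data.

```python
# Minimal sketch: loading the PAWS-Wiki splits from the Hugging Face Hub.
# Dataset and config names follow the Hub dataset card; adjust if they differ.
from datasets import load_dataset

paws_wiki = load_dataset("paws", "labeled_final")          # human-labeled train/validation/test
paws_unlabeled = load_dataset("paws", "unlabeled_final")   # noisier, automatically labeled pairs

print(paws_wiki["train"][0])
# e.g. {'id': ..., 'sentence1': '...', 'sentence2': '...', 'label': 0 or 1}
```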

Experiment

In this work, we conducted two major experiments. The first experiment was designed to assess the effect of adding PAWS-QQP and the unlabeled Wiki data to the training set. The second experiment was designed to determine the optimal pooler architecture.
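
As an illustration of how the training data for the first experiment could be assembled (not necessarily our exact pipeline), the sketch below concatenates the PAWS-Wiki train split with a locally prepared PAWS-QQP file. The file name `paws_qqp_train.tsv` is a placeholder for a file generated per the official PAWS instructions.

```python
# Hedged sketch of combining PAWS-Wiki and PAWS-QQP training data.
# `paws_qqp_train.tsv` is a placeholder for a locally generated file in the
# standard PAWS column format (id, sentence1, sentence2, label).
from datasets import load_dataset, concatenate_datasets

paws_wiki = load_dataset("paws", "labeled_final")
paws_qqp = load_dataset(
    "csv", data_files={"train": "paws_qqp_train.tsv"}, delimiter="\t"
)["train"]

# Align column types so the two datasets can be concatenated.
paws_qqp = paws_qqp.cast(paws_wiki["train"].features)
combined_train = concatenate_datasets([paws_wiki["train"], paws_qqp])
print(f"Combined training pairs: {len(combined_train)}")
```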

Metrics

We used the following standard binary classification metrics to evaluate our models; a short computation sketch is given after the list:

  • Accuracy
  • Recall
  • Precision
  • F1 measure
  • ROC AUC score
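
A minimal sketch of computing these metrics with scikit-learn is shown below; `y_true`, `y_pred`, and `y_score` are toy placeholders for gold labels, predicted labels, and predicted paraphrase probabilities.

```python
# Minimal sketch: computing the evaluation metrics with scikit-learn.
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score,
)

# Toy placeholders: gold labels, hard predictions, and paraphrase probabilities.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
y_score = [0.93, 0.12, 0.81, 0.40, 0.07]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
```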

Results

Architecture                 F1     Recall  Precision  Accuracy  ROC AUC
Baseline (current SOTA)      0.943  0.956   0.93       -         -
Perceptron Pooler            0.943  0.940   0.946      0.950     0.984
Mean Max Pooler              0.946  0.959   0.934      0.952     0.986
Convolutional Pooler         0.948  0.955   0.942      0.954     0.988
Concatenate Pooler           0.950  0.965   0.935      0.955     0.985
LSTM Pooler                  0.946  0.962   0.930      0.951     0.985
Concatenate + LSTM Pooler    0.948  0.959   0.937      0.954     0.985
Concatenate + PAWS QQP       0.950  0.970   0.930      0.954     0.986

The results show that adding pooler layers improves the performance of the model, and that the best architecture is the Concatenate Pooler. This may be because the resulting hidden vector is large enough to capture all of the necessary information about the input sequence. It also seems that capturing information from the whole sequence does not improve performance further, as the Concatenate + LSTM Pooler showed slightly lower performance than the plain Concatenate Pooler; this may also be caused by overfitting.
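
The exact pooler implementations are available in the linked repository. As an illustration only, the sketch below shows one plausible form of a concatenate pooler on top of a DeBERTaV3 backbone: the [CLS] representations of the last four hidden layers are concatenated and passed to a binary classification head. The choice of four layers, the dropout rate, and the head size are assumptions, not the configuration reported above.

```python
# Illustrative sketch (not the repository's exact code) of a concatenate pooler:
# the [CLS] vectors of the last few encoder layers are concatenated and fed to
# a binary classification head on top of a DeBERTaV3 backbone.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class ConcatenatePoolerClassifier(nn.Module):
    def __init__(self, backbone_name="microsoft/deberta-v3-base", num_layers=4):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(backbone_name)
        self.num_layers = num_layers
        hidden = self.backbone.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Dropout(0.1),
            nn.Linear(hidden * num_layers, 2),  # paraphrase vs. non-paraphrase
        )

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        )
        # Concatenate the [CLS] token from the last `num_layers` hidden states.
        cls_states = [h[:, 0] for h in outputs.hidden_states[-self.num_layers:]]
        pooled = torch.cat(cls_states, dim=-1)
        return self.classifier(pooled)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = ConcatenatePoolerClassifier()
batch = tokenizer(
    ["The 1975-76 NBA season was the 30th season."],
    ["The NBA season of 1975-76 was the 30th season."],
    return_tensors="pt", padding=True, truncation=True,
)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 2])
```

Concatenating several layers enlarges the pooled representation, which matches the intuition above that a larger hidden vector can retain more information about the input pair.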

The unlabeled part of the dataset did not boost the performance of our models. We suppose the main reason is that the unlabeled part is too noisy: it may be useful for pre-training, but not for fine-tuning. PAWS-QQP, on the other hand, did increase overall performance for both the baseline and our solution. Therefore, the best combination of training sets is PAWS-Wiki plus PAWS-QQP.

References

  • [Becker et al., 2023] Becker, J., Wahle, J. P., Ruas, T., and Gipp, B. (2023). Paraphrase detection: Human vs. machine content. arXiv preprint arXiv:2303.13989.
  • [Bojanowski et al., 2017] Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the association for computational linguistics, 5:135–146.
  • [Chowdhury and Bhattacharyya, 2018] Chowdhury, H. A. and Bhattacharyya, D. K. (2018). Plagiarism: Taxonomy, tools and detection techniques. arXiv preprint arXiv:1801.06323.
  • [Clark et al., 2020] Clark, K., Luong, M.-T., Le, Q. V., and Manning, C. D. (2020). Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
  • [Corbeil and Ghavidel, 2021] Corbeil, J.-P. and Ghavidel, H. A. (2021). Assessing the eligibility of backtranslated samples based on semantic similarity for the paraphrase identification task. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 301–308.
  • [Devlin et al., 2018] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • [Gautam and Jerripothula, 2020] Gautam, A. and Jerripothula, K. R. (2020). Sgg: Spinbot, grammarly and glove based fake news detection. In 2020 IEEE Sixth international conference on multimedia big data (bigMM), pages 174–182. IEEE.
  • [He et al., 2021] He, P., Gao, J., and Chen, W. (2021). Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing. arXiv preprint arXiv:2111.09543.
  • [He et al., 2020] He, P., Liu, X., Gao, J., and Chen, W. (2020). Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654.
  • [Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
