This project proposes a SOTA solution to the problem of paraphrase identification on the PAWS-Wiki test set. We use a Concatenate Pooler on top of a DeBERTa backbone, trained on the PAWS-Wiki and PAWS-QQP train sets, to reach F1 = 0.950, an improvement over the previous SOTA of 0.943. We also investigate the effect of the unlabeled part of PAWS-Wiki.
Code and full paper can be found here:
https://github.com/Sergey-Tkachenko/nlp_project_2023
Paraphrasing is a form of plagiarism in which another person's ideas, words, or work are presented in a different way: by switching words, changing sentence construction, or altering grammatical style. It may also include replacing some words with synonyms [Chowdhury and Bhattacharyya, 2018]. In short, a sentence can be defined as a paraphrase of another sentence if the two are not identical but share the same semantic meaning [Liu et al., 2022]. Large language models (LLMs) have shown high efficiency in paraphrasing tasks [Becker et al., 2023], and their use may lead to an increase in paraphrasing, which can compromise the integrity of legal writing. Using Transformer-based models for classification is an intuitive way to counteract this new form of plagiarism [Wahle et al., 2021]. Therefore, in this study we provide a solution to the paraphrase detection task using Transformer-based neural models. We present a SOTA architecture (as far as we know, on the PAWS-Wiki dataset) based on DeBERTaV3 [He et al., 2021].
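As a minimal sketch of the approach, the snippet below scores a sentence pair with a DeBERTaV3 classifier. The checkpoint name is the public backbone, not our fine-tuned model, so its classification head is randomly initialized until trained on PAWS; the example pair is illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Sentence-pair classification with a DeBERTaV3 backbone. After
# fine-tuning on PAWS, label 1 reads as "paraphrase".
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2
)
model.eval()

enc = tokenizer(
    "The flights leave from New York and arrive in Florida.",
    "The flights leave from Florida and arrive in New York.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    probs = model(**enc).logits.softmax(dim=-1)
print(f"P(paraphrase) = {probs[0, 1]:.3f}")
```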
In our work, we use the PAWS dataset. PAWS training data dramatically improves performance on challenging examples and makes models more robust to real-world examples [Zhang et al., 2019]. The dataset consists of two parts: Wiki and QQP (Quora Question Pairs).
Examples are generated with controlled language models and back translation, and each example receives five human ratings in both phases. The main idea of PAWS is to generate adversarial examples that break NLP systems. Some examples from the PAWS dataset [Zhang et al., 2019]:
non-paraphrase: "Although interchangeable, the body pieces on the 2 cars are not similar." / "Although similar, the body parts are not interchangeable on the 2 cars."
paraphrase: "Katz was born in Sweden in 1947 and moved to New York City at the age of 1." / "Katz was born in 1947 in Sweden and moved to New York at the age of one."
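For reference, the Wiki portion of PAWS can be pulled from the Hugging Face hub (dataset id `paws`; config names as published by Google Research). A small sketch:

```python
from datasets import load_dataset

labeled = load_dataset("paws", "labeled_final")      # human-labeled Wiki pairs
swap_only = load_dataset("paws", "labeled_swap")     # extra swap-based train pairs
unlabeled = load_dataset("paws", "unlabeled_final")  # noisy, silver-labeled pairs

pair = labeled["train"][0]
print(pair["sentence1"], pair["sentence2"], pair["label"], sep="\n")
```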
In this work, we conducted two major experiments. The first was designed to measure the effect of adding PAWS-QQP and the unlabeled Wiki data to the training set. The second was designed to determine the optimal architecture.
We used standard binary classification metrics to evaluate our models:
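With TP, FP, and FN counted on the positive (paraphrase) class, the headline metrics are defined as:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```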
| Architecture | F1 | Recall | Precision | Accuracy | ROC AUC |
|---|---|---|---|---|---|
| Baseline (previous SOTA) | 0.943 | 0.956 | 0.930 | – | – |
| Perceptron Pooler | 0.943 | 0.940 | 0.946 | 0.950 | 0.984 |
| Mean Max Pooler | 0.946 | 0.959 | 0.934 | 0.952 | 0.986 |
| Convolutional Pooler | 0.948 | 0.955 | 0.942 | 0.954 | 0.988 |
| Concatenate Pooler | 0.950 | 0.965 | 0.935 | 0.955 | 0.985 |
| LSTM Pooler | 0.946 | 0.962 | 0.930 | 0.951 | 0.985 |
| Concatenate + LSTM Pooler | 0.948 | 0.959 | 0.937 | 0.954 | 0.985 |
| Concatenate + PAWS-QQP | 0.950 | 0.970 | 0.930 | 0.954 | 0.986 |
The results show that adding pooler layers improves the performance of the model, and the best architecture is the Concatenate Pooler. This may be because the resulting hidden vector is large enough to capture all the necessary information about the input sequence. It also seems that capturing information from the whole sequence does not improve performance, as the Concatenate + LSTM Pooler performed slightly worse than the plain Concatenate Pooler; this may be caused by overfitting as well.
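The exact pooler internals live in the linked repository; as a rough illustration only, here is one plausible reading of a Concatenate Pooler, assuming it concatenates the [CLS] hidden states of the last four encoder layers before a linear head (the layer count and head shape are our assumptions, not confirmed details):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ConcatenatePooler(nn.Module):
    """Concatenate the [CLS] states of the last `n_layers` encoder layers."""

    def __init__(self, backbone="microsoft/deberta-v3-base", n_layers=4, n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone, output_hidden_states=True)
        self.n_layers = n_layers
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden * n_layers, n_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # hidden_states = embedding output plus one tensor per encoder layer
        cls_states = [h[:, 0] for h in out.hidden_states[-self.n_layers:]]
        pooled = torch.cat(cls_states, dim=-1)  # (batch, hidden * n_layers)
        return self.classifier(pooled)          # logits, (batch, n_classes)
```

The intuition matches the observation above: concatenation multiplies the width of the pooled vector by the number of layers used, so the classifier sees a much larger representation than with a single [CLS] vector.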
The unlabeled part of the dataset did not boost the performance of our models. We suppose the main reason is that the unlabeled part is too noisy: it may be useful for pre-training, but not for fine-tuning. PAWS-QQP did increase the overall performance for both the baseline and our solution. Therefore, the best combination of training sets is PAWS-Wiki plus PAWS-QQP.
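A sketch of merging the two labeled train sets follows. The QQP file path is hypothetical: PAWS-QQP is not redistributable and must be regenerated locally from the Quora Question Pairs data with Google's scripts, and we assume its TSV uses the same column names as PAWS-Wiki:

```python
from datasets import concatenate_datasets, load_dataset

wiki_train = load_dataset("paws", "labeled_final", split="train")
qqp_train = load_dataset(
    "csv", data_files="paws_qqp/output/train.tsv", split="train", delimiter="\t"
)
# Align column types with the Wiki split before concatenation.
qqp_train = qqp_train.cast(wiki_train.features)
train = concatenate_datasets([wiki_train, qqp_train]).shuffle(seed=42)
```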