
Abusive and Threatening Language Detection for Tweets in Urdu
Subtask A: Abusive Language Detection

We invite participation in a competition on text classification of tweets in Urdu. The task comprises two binary classification subtasks: (A) detecting abusive intent; (B) detecting threatening intent.

Tags: text classification, NLP, Urdu language, abusive language detection, threatening language detection, tweet classification, offensive language detection

Task Description

This is Subtask A of a two-subtask competition.

As social media platforms grow in reach and importance, the effects of their misuse become ever more serious. In particular, many posts contain abusive language directed at specific users, degrading the experience of communicating on these platforms, while other posts contain actual threats that may put users in danger. Urdu has more than 230 million speakers worldwide and is widely represented on social networks and digital media.
We encourage participants to propose methods that can automatically detect threats and abuse in Urdu and thereby help prevent violence and its consequences.

The Task is divided into two Subtasks

Participants may take part in either one or both subtasks. However, to be eligible for the ODS SoC Prize, a participant (or participating team) must register and submit solutions for both subtasks. Subtask B (threat detection) carries more weight than Subtask A (abusive language detection). Additionally, publication at FIRE 2021 is strongly encouraged.

Subtask A:

Subtask A focuses on detecting abusive language in tweets written in Urdu. This is a binary classification task in which participating systems must classify each tweet into one of two classes: Abusive or Non-Abusive.

Abusive - the tweet contains some form of abusive content.
Non-Abusive - the tweet does not contain any abusive or profane content.

We follow Twitter's definition of abusive behavior: comments directed at an individual or group with the intent to harass, intimidate, or silence someone else's voice.
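As an illustration, a minimal baseline for this binary task might look like the sketch below. Everything here is a placeholder assumption, not part of the competition: the example texts and labels are invented stand-ins for the released tweets, and character n-gram TF-IDF with logistic regression is just one common starting point that sidesteps word tokenization of noisy tweet text.

```python
# Hypothetical baseline sketch for Subtask A (NOT the official baseline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data; the real tweets and labels come from the
# competition release (1 = Abusive, 0 = Non-Abusive).
train_texts = [
    "you are a pathetic fool",          # stand-in abusive example
    "what a lovely morning in Lahore",  # stand-in non-abusive example
    "shut up you worthless idiot",
    "congratulations on the new job",
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(
    # char_wb n-grams are robust to the noisy spelling found in tweets
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Predict a label (0 or 1) for each unseen tweet
preds = model.predict(["you fool", "lovely morning"])
print(list(preds))
```

For the actual submission, one would replace the toy lists with the released training split and evaluate with the competition's metric before submitting test-set predictions.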

Subtask B is linked here.

Technical Report Submission

After the result submission deadline, participants are invited to submit an abstract and a technical report with a brief description of their approach and experiments for publication in the FIRE 2021 proceedings. All working notes will be published in the CEUR Workshop Proceedings. Please see more information on publication submission [here].


Timeline of the Competition and Beyond 

July 23 – training and test data release; submission platform opens
August 6 – checkpoint at SoC
August 27 – result submission deadline
August 28 – results announced (for the private part of the test set)
September 3–5 – presentations at the ODS Summer of Code 2021 festival
October 12 – paper (technical report) submission deadline for publication in the FIRE 2021 Working Notes – optional but encouraged (there will be a workshop on academic paper writing)
October 26 – review notifications
November 2 – camera-ready due
December 16–20 – FIRE 2021 (online event)

Contacts