<h3>Task Description</h3>

<p>This is <strong>Subtask B of a double-task competition</strong>!</p>

<p>As social media platforms have grown in reach and importance, the effects of their misuse have become increasingly severe. In particular, many posts contain abusive language directed at other users, which degrades the experience of communicating on these platforms, while other posts contain actual threats that may put platform users in danger. Urdu has more than 230 million speakers worldwide and is widely represented on social networks and digital media.<br />
We encourage participants to propose methods that automatically detect threats and abuse in Urdu, to help prevent violence and other harmful consequences.</p>

<h3>The Task is divided into two Subtasks</h3>

<p>Participants may enter either one or both subtasks. However, <u>to be eligible for the ODS SoC Prize</u>, the participant (or participating team) <strong>is required to register and submit solutions for both subtasks</strong>. Subtask B (Threat detection) is given more weight than Subtask A (Abusive language detection). Additionally, publication at <a href="http://fire.irsi.res.in/fire/2021/hasoc">FIRE 2021</a> is strongly encouraged.</p>

<h3>Subtask B:</h3>

<p>Subtask B focuses on detecting threatening language in Urdu-language tweets. This is a binary classification task: participating systems are required to classify each tweet into one of two classes, Threatening or Non-Threatening.</p>

<p><em>Threatening</em> - This Twitter post contains any threatening content.<br />
<em>Non-Threatening</em> - This Twitter post does not contain any threatening or profane content.</p>
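
<p>To make the binary setup concrete, the following is a minimal sketch of how predictions for the two classes could be scored locally. It rests on assumptions that this page does not state: the integer encoding (1 for Threatening, 0 for Non-Threatening), the choice of F1 on the Threatening class as the measure, and the helper name <code>binary_f1</code> are all illustrative, and the toy labels are invented rather than taken from the dataset. The official evaluation metric may differ.</p>

```python
from typing import Sequence

# Assumed encoding (not stated in the task description):
# 1 = Threatening, 0 = Non-Threatening.
THREATENING = 1
NON_THREATENING = 0


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = THREATENING) -> float:
    """Return the F1 score for the positive class; 0.0 if there are no true positives."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only; they are not from the dataset.
gold = [1, 0, 0, 1, 0, 1]
pred = [1, 0, 1, 0, 0, 1]
score = binary_f1(gold, pred)
print(f"F1 (Threatening class): {score:.3f}")
```

<p>Scoring the Threatening class rather than overall accuracy is a common choice when the positive class is rare, but whether that applies here depends on the actual class balance and on the metric used by the leaderboard.</p>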

<p>We follow <a href="http://fire.irsi.res.in/fire/2021/hasoc">Twitter&#39;s definition</a> of threatening posts: posts directed at an individual or a group that threaten violent acts, threaten to kill or inflict serious physical harm, intimidate, or use violent language.</p>

<p>Subtask A is linked <a href="https://ods.ai/competitions/urdu-hack-soc2021">here</a>.</p>

<h3>Technical Report Submission</h3>

<p>After the result submission deadline, participants are invited to submit <strong>an abstract</strong> and <strong>a technical report</strong> briefly describing their approach and experiments for publication in the FIRE 2021 Proceedings. All working notes will be published in <a href="http://ceur-ws.org/"><strong>CEUR Workshop Proceedings</strong></a>. More information on publication submission is available <a href="http://www.urduthreat2021.cicling.org/home">here</a>.</p>

<h3>Timeline of the Competition and Beyond</h3>

<p>July 23 &ndash; training and test data release; submission platform opens<br />
August 6 &ndash; checkpoint at SoC<br />
August 27 &ndash; result submission <strong>deadline</strong><br />
August 28 &ndash; results announced (for the private part of the test set)<br />
September 3-5 &ndash; presentations at ODS Summer of Code 2021 festival<br />
October 12 &ndash; paper (technical report) submission <strong>deadline</strong> for publication in the <a href="http://fire.irsi.res.in/fire/2021/hasoc">FIRE 2021</a> Working Notes (optional but encouraged; there will be a workshop on academic paper writing)<br />
October 26 &ndash; review notifications<br />
November 2 &ndash; camera-ready versions due<br />
December 16-20 &ndash; <a href="http://fire.irsi.res.in/fire/2021/hasoc">FIRE 2021</a> (online event)</p>

<h3>Contacts</h3>

<ul>
	<li>Oxana Vitman <a href="mailto:oksana.vittmann@gmail.com">oksana.vittmann@gmail.com</a>, Competition Information Coordinator (primary contact for all questions)</li>
	<li>Alisa Zhila <a href="mailto:alisa.zhila@gmail.com">alisa.zhila@gmail.com</a>, Competition Organizer (secondary contact)</li>
	<li>Maaz Amjad <a href="mailto:maazamjad@phystech.edu">maazamjad@phystech.edu</a>, dataset creator (questions on the collection procedure, data sources, etc.) and representative of the FIRE conference (questions on potential publications)</li>
	<li><a href="https://www.gelbukh.com">Alexander Gelbukh</a> <a href="mailto:gelbukh@gelbukh.com">gelbukh@gelbukh.com</a>, Head of the <a href="https://nlp.cic.ipn.mx">Laboratory for Natural Language and Text Processing</a> at <a href="https://www.cic.ipn.mx/index.php/en/">CIC-IPN</a> (questions on Master&#39;s and PhD programs)</li>
	<li><a href="http://www.cic.ipn.mx/~sidorov">Grigori Sidorov</a> <a href="mailto:sidorov@cic.ipn.mx">sidorov@cic.ipn.mx</a>, Professor at the <a href="https://nlp.cic.ipn.mx">Laboratory for Natural Language and Text Processing</a> of <a href="https://www.cic.ipn.mx/index.php/en/">CIC-IPN</a> (questions on Master&#39;s and PhD programs)</li>
	<li>ODS slack channel&nbsp;<a href="https://opendatascience.slack.com/archives/C027KDQ47MM"><strong>#proj_soc1_urdu</strong></a></li>
</ul>

We invite participation in the competition for text classification of tweets in Urdu. The task is composed of two binary classification subtasks: (A) abusive intent; (B) threatening intent.

Abusive and Threatening Language Detection for Tweets in Urdu Subtask B: Threat Detection


<h3>Information on the Dataset for Subtask B</h3>

<p>The dataset was collected and annotated at the <a href="https://nlp.cic.ipn.mx">Natural Language and Text Processing Laboratory</a> of the <a href="https://www.cic.ipn.mx/index.php/en/">Center for Computing Research</a> of Instituto Polit&eacute;cnico Nacional, Mexico, by PhD candidate Maaz Amjad, a native Urdu speaker. Maaz previously obtained his Master&#39;s degree from the Moscow Institute of Physics and Technology (MIPT).</p>

<p><strong>Contacts about the dataset:</strong><br />
During the competition, all questions about the dataset collection procedure should be addressed to Maaz at <a href="mailto:maazamjad@phystech.edu">maazamjad@phystech.edu</a>. (A paper detailing the dataset collection and preprocessing procedures, along with other dataset statistics, will be published at FIRE 2021.)</p>

<p>The training-test split was performed by Alisa Zhila (PhD), a co-advisor of Maaz&rsquo;s PhD thesis. Please address any questions about the data split to <a href="mailto:alisa.zhila@gmail.com">alisa.zhila@gmail.com</a>.</p>

train.csv

test.csv

submission_template.csv
