Arabic Relation Extraction: Tools, Datasets, and Models

Description

Method: We formulated relation extraction as a Natural Language Inference (NLI) task and fine-tuned a BERT model on a large set of sentence pairs (an NLI dataset) derived from the Wojood^Hadath corpus (see the article listed under Publications).
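
As a rough illustration of the NLI formulation, the sketch below turns one sentence containing an event mention and a candidate argument into a premise/hypothesis pair for each relation. The hypothesis templates, the `build_nli_pairs` helper, and the English example sentence are assumptions made for illustration only; the templates and Arabic data actually used are described in the article.

```python
# Hypothetical hypothesis templates, one per relation. The wording is an
# assumption for illustration, not the templates used in the article.
TEMPLATES = {
    "hasAgent": "{arg} is a participant in {event}.",
    "hasLocation": "{event} took place in {arg}.",
    "hasDate": "{event} took place on {arg}.",
}


def build_nli_pairs(sentence, event, arg):
    """Return one (premise, hypothesis, relation) triple per relation.

    A fine-tuned NLI classifier would then label each pair as entailed or
    not; an entailed pair means the relation holds between event and arg.
    """
    return [
        (sentence, template.format(event=event, arg=arg), relation)
        for relation, template in TEMPLATES.items()
    ]


pairs = build_nli_pairs(
    sentence="The summit was held in Amman on 5 May.",
    event="The summit",
    arg="Amman",
)
for premise, hypothesis, relation in pairs:
    print(relation, "|", hypothesis)
```

Each pair would be passed to the classifier separately, so a single event-argument candidate yields up to three binary decisions.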

Wojood^Hadath Corpus: We extended the Wojood nested NER corpus (550K tokens) by manually annotating event entities with three relations (listed below).

Wojood^OutOfDomain Corpus: A new corpus of 80K tokens in Modern Standard Arabic (MSA), manually annotated with the same entities and relations as Wojood^Hadath. It covers 10 domains (Economics, Finance, Politics, Science, Technology, Art, Law, Agriculture, History, and Sports).

Relations:

hasAgent: participant(s) involved in the event (Domain: event, Range: PERS, ORG, OCC, NORP)
hasLocation: where the event occurred (Domain: event, Range: GPE, LOC, FAC)
hasDate: when the event occurred (Domain: event, Range: TIME, DATE)
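
The range restrictions listed above can be used to discard impossible event-argument pairs before any classification is attempted. A minimal sketch, assuming entity mentions are available as (text, tag) tuples: the `RANGES` table copies the ranges above, while the `candidate_relations` helper and the sample mentions are hypothetical.

```python
# Allowed argument entity types per relation, copied from the list above.
# The domain of every relation is an event mention.
RANGES = {
    "hasAgent": {"PERS", "ORG", "OCC", "NORP"},
    "hasLocation": {"GPE", "LOC", "FAC"},
    "hasDate": {"TIME", "DATE"},
}


def candidate_relations(arg_tag):
    """Return the relations whose range admits the given entity tag."""
    return [rel for rel, tags in RANGES.items() if arg_tag in tags]


# Hypothetical entity mentions found in the same sentence as an event.
mentions = [("Amman", "GPE"), ("5 May", "DATE"), ("the ministry", "ORG")]
for text, tag in mentions:
    print(text, "->", candidate_relations(tag))
```

Because the three ranges do not overlap, each entity tag maps to at most one relation, so this filter alone decides which relation a candidate pair could express.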

Please email Prof. Jarrar (mjarrar AT birzeit.edu) to request the annotation guidelines.
Downloads

SinaTools: the relation extraction module, available as a Python library.

GitHub: training source code + sample data (~20 sentences with event mentions).

Hugging Face: BERT model fine-tuned on Wojood^Hadath.

Wojood^Hadath (Corpus only)

Wojood^OutOfDomain (Corpus only)
Shared Task

Coming soon
Publications

Alaa Aljabari, Lina Duaibes, Mustafa Jarrar, Mohammed Khalilia: Event-Arguments Extraction Corpus and Modeling using BERT for Arabic. In Proceedings of the Second Arabic Natural Language Processing Conference (ArabicNLP 2024), Bangkok, Thailand. Association for Computational Linguistics.