Resources
Demo and download our tools and datasets
An Arabic Wordnet with ontologically clean content: a classification of the meanings of Arabic terms (see Article, see FAQ).
Actors | Authenticated user. |
URL schema | https://{domain}/api/OntologyTermSearch/{term}?page={page-no}&limit={pageSize}&apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (list of ontology concepts). |
https://ontology.birzeit.edu/sina/api/OntologyTermSearch/virus/?page=1&limit=5&apikey=sampleKey
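As a minimal sketch of how a client might assemble this request, the Python snippet below fills in the URL schema above using only the standard library. The helper names `build_term_search_url` and `fetch_json` are hypothetical and not part of any official client; `sampleKey` is the placeholder key from the example URL. The snippet follows the schema literally (no slash before `?`) and makes no assumption about the shape of the returned JSON.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Domain and path prefix taken from the example URL above.
BASE = "https://ontology.birzeit.edu/sina/api"


def build_term_search_url(term, page=1, limit=5, apikey="sampleKey"):
    """Fill in the OntologyTermSearch URL schema (hypothetical helper)."""
    # Percent-encode the term so Arabic or multi-word input is URL-safe.
    path = f"{BASE}/OntologyTermSearch/{quote(term, safe='')}"
    query = urlencode({"page": page, "limit": limit, "apikey": apikey})
    return f"{path}?{query}"


def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body.

    Needs network access and a valid API key, so it is not called here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_term_search_url("virus"))
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before it is sent.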
Actors | Authenticated user. |
URL schema | https://{domain}/api/OntologyConcept/{conceptID}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (One concept from the Arabic Ontology). |
https://ontology.birzeit.edu/sina/api/OntologyConcept/293572?apikey=sampleKey
Actors | Authenticated user. |
URL schema | https://{domain}/api/OntologyConceptSubtypes/{superId}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (list of ontology concepts). |
https://ontology.birzeit.edu/sina/api/OntologyConceptSubtypes/293572?apikey=sampleKey
Actors | Authenticated user. |
URL schema | https://{domain}/api/ConceptParts/{partOfID}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (list of ontology concepts). |
https://ontology.birzeit.edu/sina/api/ConceptParts/293121?apikey=sampleKey
Actors | Authenticated user. |
URL schema | https://{domain}/api/ConceptInstances/{instanceOfID}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (list of ontology concepts). |
https://ontology.birzeit.edu/sina/api/ConceptInstances/293121?apikey=sampleKey
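The four ID-based ontology endpoints above (OntologyConcept, OntologyConceptSubtypes, ConceptParts, ConceptInstances) share one URL pattern, so a single helper can cover them. The sketch below is illustrative only: `build_concept_url` and the short labels in `ID_ENDPOINTS` are hypothetical names, and treating IDs as integers is an assumption based on the example URLs.

```python
from urllib.parse import urlencode

# Domain and path prefix taken from the example URLs above.
BASE = "https://ontology.birzeit.edu/sina/api"

# Endpoint names copied from the URL schemas; the keys are illustrative labels.
ID_ENDPOINTS = {
    "concept": "OntologyConcept",
    "subtypes": "OntologyConceptSubtypes",
    "parts": "ConceptParts",
    "instances": "ConceptInstances",
}


def build_concept_url(kind, concept_id, apikey="sampleKey"):
    """Build the URL for one of the ID-based ontology endpoints."""
    if kind not in ID_ENDPOINTS:
        raise ValueError(f"unknown endpoint label: {kind!r}")
    # Assumption: concept IDs are integers, as in the example URLs.
    concept_id = int(concept_id)
    query = urlencode({"apikey": apikey})
    return f"{BASE}/{ID_ENDPOINTS[kind]}/{concept_id}?{query}"


if __name__ == "__main__":
    for label in ID_ENDPOINTS:
        print(build_concept_url(label, 293572))
```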
150 Arabic-multilingual dictionaries were manually digitized, then structured and integrated into one database that includes definitions, synonyms, translations, morphological features, etc. (See Article, See About)
Retrieves lexical concepts from all lexicons that have the search term in their synsets. It allows an authenticated user (application or end user) to search the dictionaries for a term they provide. They can set the results page size and a search filter that restricts the search to definitions, translations, synonyms, or a combination of them. Request API Token.
Actors | Authenticated user. |
URL schema | https://{domain}/api/term/{term}/?type={filter-no}&page={page-no}&limit={pageSize}&apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (list of lexical concepts). |
https://ontology.birzeit.edu/sina/api/term/virus/?type=3&page=1&limit=10&apikey=sampleKey
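The lexicon search adds a `type` filter to the paging parameters. The sketch below builds the request URL for one page and for a run of consecutive pages. The helper names are hypothetical, and because the meaning of each filter number is not listed here, the value is passed through unchanged (the example URL uses `type=3`).

```python
from urllib.parse import quote, urlencode

# Domain and path prefix taken from the example URL above.
BASE = "https://ontology.birzeit.edu/sina/api"


def lexicon_search_url(term, filter_no, page=1, limit=10, apikey="sampleKey"):
    """Fill in the lexicon term-search URL schema (hypothetical helper)."""
    query = urlencode(
        {"type": filter_no, "page": page, "limit": limit, "apikey": apikey}
    )
    # The schema places a slash between the term and the query string.
    return f"{BASE}/term/{quote(term, safe='')}/?{query}"


def lexicon_page_urls(term, filter_no, pages, limit=10, apikey="sampleKey"):
    """Return the URLs for pages 1..pages of the same search."""
    return [
        lexicon_search_url(term, filter_no, page=p, limit=limit, apikey=apikey)
        for p in range(1, pages + 1)
    ]


if __name__ == "__main__":
    for url in lexicon_page_urls("virus", 3, pages=2):
        print(url)
```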
Retrieves a specific lexical concept from a lexicon, given its ID. Request API Token.
Actors | Authenticated user. |
URL schema | https://{domain}/api/lexicalconcept/{id}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object (one lexical concept). |
https://ontology.birzeit.edu/sina/api/lexicalconcept/1520039900?apikey=sampleKey
Four corpora totaling about 1.2 million tokens, collected from different social media platforms. The Yemeni corpus (~1.05M tokens) was collected automatically from Twitter, while the other three dialect corpora (~50K tokens each) were manually collected from Facebook and YouTube. Each word in the four corpora was annotated with different morphological features. (See Article, see About).
Python APIs, command lines, colabs, and online demos.
Modules: Morphology Tagging, Named Entity Recognition, Word Sense Disambiguation, Relation Extraction, Semantic Relatedness, Synonyms, Diacritic-Based Matching, Corpora Processing, Utilities (See Article).
Pipeline: performs several tasks together. Given a sentence as input, it tags all words with lemma, single-word sense, multi-word sense, and NER labels. Sense disambiguation uses the ArabGlossBERT TSV model with our single-word and multi-word sense inventory (see Article). Lemmatization uses Alma, and NER uses Wojood.
ArabGlossBERT dataset: 167K context-gloss pairs, labeled with True/False, to train a TSV model (see Article). The dataset was also augmented with more pairs (See Article).
Salma Corpus: manually sense-annotated corpus (34K tokens) (See Article).
Actors | Authenticated user. |
URL schema | https://{domain}/v2/api/SALMA/{text}?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. The text must be in the HTTP request body. |
API Parameters |
|
Flow of events |
|
Retrieved Data | results JSON object. |
Models: Flat and Nested BERT models.
Wojood Corpus: 550K tokens (MSA and dialect), manually annotated with 21 entity types (nested and flat), covering multiple domains (See Article).
WojoodFineCorpus: Same as Wojood but extended with entity subtypes (51 tags in total) (See Article).
WojoodGazaCorpus: 60K tokens related to the Israeli War on Gaza, covering multiple domains (See Article).
Actors | Authenticated user. |
URL schema | https://{domain}/sina/v2/api/wojood/?apikey={key} |
Pre-conditions | The user has registered and provided their API Key. |
API Parameters |
|
Flow of events |
|
Retrieved Data | Returns the results in the specified format. |
Extract relations between events and their arguments within a sentence (hasLocation, hasDate, hasAgent).
Corpus (WojoodHadath): We extended the Wojood NER corpus with relations.
Method: A novel BERT-based method achieving 95% accuracy, implemented as part of SinaTools (See Article).
The corpus includes about 16K tweets manually labeled as abusive, hate, violence, pornographic, or non-offensive, in addition to Target, Topic, and Phrase annotations. We fine-tuned 8 models (using HeBERT and AlephBERT).
(See Article)
A corpus of 12,000 Facebook posts in five languages (Arabic, Hebrew, English, French, Hindi), with 2,400 posts in each language, manually annotated with Bias and Propaganda. This dataset was collected during the Israeli War on Gaza from October 7, 2023, to January 31, 2024.
(See Article)
A dataset consisting of 1,800 pairs of ChatGPT responses was created to analyze potential biases related to Palestine and Israel. The dataset encompasses the 30 articles of international human rights law, about 60 pairs for each article. Each pair was manually classified into one of three categories (Biased against Palestine, Biased against Israel, No Bias) by 12 well-trained law master’s students.
International Workshop on Nakba Narratives as Language Resources
Extend: Given one or more synonyms, the tool extends the set with more synonyms.
Evaluate: Given a set of synonyms, the tool evaluates how strongly each synonym belongs to the set. The tool is based on a novel algorithm and datasets, treating synonymy as a fuzzy relation. (See Article).
Actors | Authenticated user. |
URL schema | https://{domain}/sina/v2/api/SynonymGenerator/?apikey={key} |
Pre-conditions | The user has registered and provided their API Token. |
API Parameters |
|
Flow of events |
|
Retrieved Data | Returns the candidate synonyms with their fuzzy values. |
The dataset consists of 31,404 queries (MSA and Palestinian dialect). Each query is classified into one of 77 classes (intents), including card arrival, card linking, exchange rate, etc. A set of BERT models was fine-tuned on the ArBanking77 dataset (F1-score: 92% for MSA, 90% for PAL). (See Article)
Details of error messages returned by the APIs.
Error Code | Error Message |
-1 | User blocked, exceeded access limit |
-3 | user is not authenticated |
-4 | Incorrect API parameter value |
-5 | No Data Records Found |
-6 | Incorrect Data Value |
login-error | {"error":"invalid_grant","error_description":"Bad credentials"} |
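One way a client could use the numeric codes in this table is to map them to exceptions. The sketch below is an assumption-laden illustration: `SinaApiError`, `raise_for_error_code`, and `is_empty_result` are hypothetical names, and because this table does not say where in a response the code appears, extracting the code is left to the caller.

```python
# Numeric codes and messages copied from the table above.
API_ERRORS = {
    -1: "User blocked, exceeded access limit",
    -3: "user is not authenticated",
    -4: "Incorrect API parameter value",
    -5: "No Data Records Found",
    -6: "Incorrect Data Value",
}


class SinaApiError(Exception):
    """Hypothetical exception type carrying a documented error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error_code(code):
    """Raise SinaApiError if `code` is a documented error code; otherwise do nothing."""
    if code in API_ERRORS:
        raise SinaApiError(code, API_ERRORS[code])


def is_empty_result(error):
    """True when the error only signals that no records matched (code -5)."""
    return error.code == -5


if __name__ == "__main__":
    try:
        raise_for_error_code(-5)
    except SinaApiError as err:
        print("empty result" if is_empty_result(err) else f"failed: {err}")
```

Separating code -5 from the other codes lets a caller treat an empty search as a normal outcome rather than a failure.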
Dataset | Description | License |
Arabic Ontology | An Arabic Wordnet with ontologically clean content. | CC-BY-4.0 |
Qabas Lexicon | Lexicographic database, 58K lemmas, linked with 110 lexicons and corpora of 2M tokens. | CC-BY-ND-4.0 |
Salma WSD | Arabic sense-annotated corpus, 34K tokens. Multilevel: single-word senses, multi-word senses, and NER. | CC-BY-4.0 |
Synonyms | Synonyms dataset annotated in parallel by 4 linguists, with fuzzy values. | CC-BY-4.0 |
ArabGlossBERT | 167K context-gloss pairs labeled with True/False to train a TSV BERT model for WSD. | CC-BY-4.0 |
Quran | Morphology tagging of the Quran; each word is linked with a lemma in the Qabas lexicon. | CC-BY-4.0 |
Curras | Palestinian dialect corpus, 56K tokens with morphological annotations. | CC-BY-4.0 |
Baladi | Lebanese dialect corpus, 10K tokens with morphological annotations. | CC-BY-4.0 |
Nabra | Syrian dialect corpus, 60K tokens with morphological annotations. | CC-BY-4.0 |
Lisan-Iraqi | Iraqi dialect corpus, 45K tokens with morphological annotations. | CC-BY-4.0 |
Lisan-Libyan | Libyan dialect corpus, 51K tokens with morphological annotations. | CC-BY-4.0 |
Lisan-Sudanese | Sudanese dialect corpus, 52K tokens with morphological annotations. | CC-BY-4.0 |
Lisan-Yemeni | Yemeni dialect corpus, 1.05M tokens with morphological annotations. | CC-BY-4.0 |
Wojood NER | Nested NER corpus, 550K tokens, 21 entity types, multiple domains, MSA and dialects. | CC-BY-4.0 |
WojoodFine | Fine-grained NER corpus, extending Wojood with 31 entity subtypes. | CC-BY-4.0 |
WojoodGaza | NER corpus, 60K tokens about the Israeli War on Gaza, using Wojood guidelines. | CC-BY-4.0 |
WojoodHadath | Event-relation extraction corpus, extending Wojood with relations. | CC-BY-4.0 |
ArBanking77 | Parallel corpora: 15K questions in MSA, Palestinian, Moroccan, Saudi, and Tunisian, each labeled with a banking intent. | CC-BY-SA-4.0 |
Offensive Hebrew | 16K tweets labeled with hate, violence, racism, pornography. | CC-BY-4.0 |
FigNews | 12K FB posts annotated with Bias and Propaganda in Arabic, Hebrew, English, French, and Hindi. | CC-BY-4.0 |