Resources and Downloads

Download and access NLP data, corpora, tools and services

  • Retrieves lexical concepts from all lexicons that have the SearchTerm in their synsets. It allows an authenticated user (application or end-user) to search the dictionaries for a term they provide. The user can set the results page size and choose a search filter that covers definitions, translations, synonyms, or a combination of them. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/term/{term}/?type={filter-no}&page={page-no}&limit={pageSize}&apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • type: search filter value (1: translations only, 2: synonyms only, 3: definitions only, 4: translations and synonyms, 5: translations and definitions, 6: synonyms and definitions, 7: translations, synonyms and definitions).
    • page: the results page number.
    • limit: number of results per page.
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system checks that the page size is between 1 and 1000 and that the search filter is between 1 and 7.
    5. If so, the system performs the required search query.
    6. Otherwise, the system returns (-4) error code.
    7. The system returns the JSON data object.
    Retrieved Data results JSON object (list of lexical concepts).

    Example: virus
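The numeric type filter is not a bit mask (3 means definitions only, while 4 means translations and synonyms), so a lookup table is probably the simplest way to go from the fields you want to the filter number. The sketch below does that and builds the request URL, repeating the page-size check from step 4 on the client side. The function names, the `sina.example.org` domain, and the `KEY` value are placeholders for illustration and are not part of the API.

```python
from urllib.parse import quote, urlencode

# Filter numbers as listed under "API Parameters" above.
FILTERS = {
    frozenset({"translations"}): 1,
    frozenset({"synonyms"}): 2,
    frozenset({"definitions"}): 3,
    frozenset({"translations", "synonyms"}): 4,
    frozenset({"translations", "definitions"}): 5,
    frozenset({"synonyms", "definitions"}): 6,
    frozenset({"translations", "synonyms", "definitions"}): 7,
}


def filter_number(*fields):
    """Map the wanted fields to the documented `type` value (1-7)."""
    try:
        return FILTERS[frozenset(fields)]
    except KeyError:
        raise ValueError(f"unsupported field combination: {fields!r}") from None


def term_search_url(domain, term, fields, page, limit, apikey):
    """Build the term-search URL; domain and apikey are supplied by the caller."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    query = urlencode({
        "type": filter_number(*fields),
        "page": page,
        "limit": limit,
        "apikey": apikey,
    })
    # The term is percent-encoded so that Arabic search terms are safe in the path.
    return f"https://{domain}/api/term/{quote(term, safe='')}/?{query}"


print(term_search_url("sina.example.org", "virus", ["synonyms", "definitions"], 1, 20, "KEY"))
```

An unsupported combination (for example an empty field list) raises `ValueError` before any request is made, which mirrors the (-4) response the service would give for an out-of-range filter.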

    Retrieves a certain lexical concept from a lexicon, given its ID. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/lexicalconcept/{id}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (one lexical concept).

    Example ID: 1520039900

  • Retrieves all concepts from the Arabic Ontology that have the SearchTerm in their synsets. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/OntologyTermSearch/{term}?page={page-no}&limit={pageSize}&apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • page: the results page number.
    • limit: number of results per page.
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system checks that the page size is between 1 and 1000.
    5. If so, the system performs the required search query.
    6. Otherwise, the system returns (-4) error code.
    7. The system returns the JSON data object.
    Retrieved Data results JSON object (list of ontology concepts).

    Example: virus
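Because results are returned one page at a time, a client usually loops over pages. The following is a minimal sketch of such a loop, under two assumptions that the text above does not confirm: page numbering starts at 1, and a successful response decodes to a JSON list. The HTTP call itself is left to a caller-supplied `fetch_page` function (a hypothetical name), so the sketch runs without network access.

```python
def iter_results(fetch_page, limit=100, max_pages=50):
    """Yield items page by page until a page comes back short or empty.

    fetch_page(page, limit) is supplied by the caller; it performs the HTTP
    request and returns the decoded JSON. Assumption: a successful response
    is a list. Anything else (for example an error code such as -5) ends
    the iteration.
    """
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    for page in range(1, max_pages + 1):  # assumption: pages start at 1
        items = fetch_page(page, limit)
        if not isinstance(items, list) or not items:
            return
        yield from items
        if len(items) < limit:
            return  # a short page means there is nothing left to fetch


# Stand-in for the real service: five fake records served in pages.
FAKE_DATA = ["c1", "c2", "c3", "c4", "c5"]


def fake_fetch(page, limit):
    return FAKE_DATA[(page - 1) * limit : page * limit]


print(list(iter_results(fake_fetch, limit=2)))
```

The `max_pages` cap is a safety guard so that an unexpected response shape cannot cause an endless loop and exhaust the access limit.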

    Retrieves basic information about a given concept from the Arabic Ontology. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/OntologyConcept/{conceptID}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (one concept from the Arabic Ontology).

    Example ID: 293572

    Retrieves subtypes of an ontology concept, given its ID. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/OntologyConceptSubtypes/{superId}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (list of ontology concepts).

    Example ID: 293572

    Retrieves all Arabic Ontology concepts that are part of a given ontology concept.

    Actors Authenticated user.
    URL schema https://{domain}/api/ConceptParts/{partOfID}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (list of ontology concepts).

    Example ID: 293121

    Retrieves all Arabic Ontology concepts that are instances of a given ontology concept.

    Actors Authenticated user.
    URL schema https://{domain}/api/ConceptInstances/{instanceOfID}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (list of ontology concepts).

    Example ID: 293121
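The four ID-based ontology services above (concept, subtypes, parts, instances) share one URL shape and differ only in the path segment, so a small table can cover all of them. This is a sketch only: the short relation names (`concept`, `subtypes`, `parts`, `instances`), the function name, and the `sina.example.org` domain are made up for illustration, while the path segments are taken from the URL schemas above.

```python
from urllib.parse import quote, urlencode

# Path segments copied from the URL schemas above; the keys are illustrative.
ONTOLOGY_ENDPOINTS = {
    "concept": "OntologyConcept",
    "subtypes": "OntologyConceptSubtypes",
    "parts": "ConceptParts",
    "instances": "ConceptInstances",
}


def ontology_url(domain, relation, concept_id, apikey):
    """Build the URL for one of the ID-based Arabic Ontology services."""
    if relation not in ONTOLOGY_ENDPOINTS:
        raise ValueError(f"unknown relation: {relation!r}")
    path = ONTOLOGY_ENDPOINTS[relation]
    query = urlencode({"apikey": apikey})
    return f"https://{domain}/api/{path}/{quote(str(concept_id), safe='')}?{query}"


for relation in ONTOLOGY_ENDPOINTS:
    print(ontology_url("sina.example.org", relation, 293572, "KEY"))
```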

  • Palestinian morphologically annotated corpus (Curras) with 56K tokens, and a newly annotated Lebanese corpus (Baladi) with 10K tokens. Each token is annotated with 16 different features.

    The four corpora consist of about 1.2 million tokens that we collected from different social media platforms. The Yemeni corpus (~1.05M tokens) was collected automatically from Twitter, while the other three dialects (~50K tokens each) were manually collected from Facebook and YouTube. Each word in the four corpora was annotated with different morphological features.

    The Nabra corpus consists of about 60K words/tokens collected from social media posts, scripts of movies and series, song lyrics, and local proverbs. Each word in the corpus was annotated with different morphological features, including CODA, Prefixes, Stem, Suffixes, MSA lemma, Dialect Lemma, Gloss, Part-of-Speech, Gender, Number, and Aspect.

  • The dataset consists of 500 synsets from the 10K synsets in Arabic WordNet. For each synset, Arabic candidate synonyms are extracted.
    The total number of candidate synonyms is 3K, each with a fuzziness value.

    Request API Token to access Synonyms Generator web service online


    Actors Authenticated user.
    URL schema https://{domain}/sina/v2/api/SynonymGenerator/?apikey={key}
    Pre-conditions The user has registered and provided their API Token.
    API Parameters
      Synset, lexicons, pos and level are received through the body
    1. Synset: mono/multilingual synset.
    2. Lexicons: Select one or more of these lexicons (AWN, مكنز بيرزيت, Princeton WordNet, ALECSO, Cairo Academy).
    3. POS: part of speech (noun, verb).
    4. Level: Level3 and Level4.
    5. Apikey: A key (provided offline) to access the API.
    Flow of events
    1. The system checks if the API Key (i.e., Token) is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. If the parameters are valid, the system generates the candidate synonyms.
    5. Otherwise, the system returns (-4) error code.
    6. The system returns the results in the specified format.
    Retrieved Data the candidate synonyms with their fuzzy values.
  • Extract named entities from a given Arabic text. 22 types of entities are supported, which can be single or overlapping entities. Different output formats are supported.

    Request API Token to access Wojood web service online


    Actors Authenticated user.
    URL schema https://{domain}/sina/v2/api/wojood/?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
      mode and text are received through the body
    1. mode: output format: (1) JSON IBO format, (2) XML format, or (3) entities and their positions in JSON.
    2. text: Arabic text
    3. apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. If so, the system extracts the entities from the text.
    5. Otherwise, the system returns (-4) error code.
    6. The system returns the results in the specified format.
    Retrieved Data returns the results in the specified format.
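Since the Wojood parameters travel in the request body rather than in the URL, a client has to build a body and attach it to the request. The sketch below prepares such a request without sending it. Several details are assumptions, because the description above does not specify them: that the method is POST, that the body is JSON, that the field names are exactly `mode` and `text`, and that `mode` is sent as an integer. The `sina.example.org` domain and the function name are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Output formats as listed under "API Parameters" above.
WOJOOD_MODES = {
    1: "JSON IBO format",
    2: "XML format",
    3: "entities and their positions in JSON",
}


def wojood_request(domain, apikey, text, mode=3):
    """Prepare (but do not send) a Wojood request.

    Assumptions: POST method, JSON body, fields named "mode" and "text".
    """
    if mode not in WOJOOD_MODES:
        raise ValueError("mode must be 1, 2, or 3")
    if not text.strip():
        raise ValueError("text must not be empty")
    body = json.dumps({"mode": mode, "text": text}, ensure_ascii=False).encode("utf-8")
    url = f"https://{domain}/sina/v2/api/wojood/?{urlencode({'apikey': apikey})}"
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = wojood_request("sina.example.org", "KEY", "ذهب محمد إلى القدس", mode=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would send it; checking `mode` locally avoids spending a logged request on a call that would only come back with the (-4) error code.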
  • A relatively large dataset of context-gloss pairs, labeled with True/False, was developed for fine-tuning BERT for WSD. Read the article to learn more about this dataset.

    Request API Token to access WSD SALMA web service online


    Actors Authenticated user.
    URL schema https://{domain}/v2/api/SALMA/{text}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    The text must be in the HTTP request body.
    API Parameters
    1. text: Arabic text
    2. apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. If so, the system semantically analyzes the sentence.
    5. Otherwise, the system returns (-4) error code.
    6. The system returns the JSON data object.
    Retrieved Data results JSON object.
  • ArBanking77 consists of 31,404 queries (MSA and Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), including card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on the ArBanking77 dataset (F1-score 92% for MSA, 90% for PAL).

  • The corpus includes about 16K tweets manually labeled as abusive, hate, violence, pornographic, or non-offensive, in addition to Target, Topic, and Phrase. We fine-tuned 8 models (using HeBERT and AlphaBERT). The corpus and all models are open source.

  • Lemmatizes every token in a given sentence. The lemma and POS of every token are retrieved.

    Request API Token to access our ALMA lemmatizer web service online


    Actors Authenticated user.
    URL schema https://{domain}/v2/api/ALMADB/{text}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    The text must be in the HTTP request body.
    API Parameters
    1. text: Arabic text
    2. apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. If so, the system lemmatizes the text.
    5. Otherwise, the system returns (-4) error code.
    6. The system returns the JSON data object.
    Retrieved Data results JSON object.
  • This web service determines whether two Arabic words are the same regardless of how they are diacritized, and returns “Same” or “Different”. The output also contains the implication direction, the distance, the number of conflicting diacritics, and other values. The direction (1, 2, or 3) specifies which word implies the other. Read more in this article. Request API Token.

    Actors Any user.
    URL schema https://{domain}/api/Implication/{word1}/{word2}
    Pre-conditions None.
    API Parameters
    1. word1: Arabic word
    2. word2: Arabic word
    Flow of events The system returns the JSON data object.
    Retrieved Data results JSON object.

    Example: compare (فَعل) and (فَعَلَ)
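Both words are part of the URL path, so Arabic letters and diacritics need to be percent-encoded as UTF-8 before the request is sent. A minimal sketch of that step follows; the function name and the `sina.example.org` domain are placeholders, and the example words are the ones given above.

```python
from urllib.parse import quote, unquote


def implication_url(domain, word1, word2):
    """Build the Implication URL with both words percent-encoded (UTF-8)."""
    # safe='' also encodes '/', so a word can never add an extra path segment.
    return (
        f"https://{domain}/api/Implication/"
        f"{quote(word1, safe='')}/{quote(word2, safe='')}"
    )


url = implication_url("sina.example.org", "فَعل", "فَعَلَ")
print(url)
# Decoding the last two path segments gives back the original words.
print([unquote(part) for part in url.rsplit("/", 2)[1:]])
```

Diacritics are separate Unicode code points, so each one is encoded on its own; that keeps the distinction between the two words intact on the wire.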

    Takes a set of delimited Arabic words and removes duplicates regardless of how they are diacritized, based on the selected parameters.

    Actors Any user.
    URL schema https://{domain}/api/DuplicateCleaner/{words}/{separator}/{parameters}
    Pre-conditions None.
    API Parameters
    1. words: Set of words
    2. separator: The character that separates the words
    3. parameters: Four variables that must be true or false: (1) Ignore diacritics on the last letter, (2) Ignore Shadda on any letter, (3) Ignore Hamza (ء) on the first letter, (4) Ignore the definite article (ال التعريف) at the beginning of the word
    Flow of events The system returns the JSON data object.
    Retrieved Data results JSON object.

    Example: (فعل | فعلَ)

    Takes two sets of words and outputs the union, intersection and similarity measure between them. The service is smart and can tolerate the same words with different diacritics based on the selected parameters.

    Actors Any user.
    URL schema https://{domain}/sina/v2/api/jaccard/
    Pre-conditions None.
    API Parameters
    1. string1: set of Arabic words
    2. string2: set of Arabic words
    3. ignoreAllDiacriticsButNotShadda: flag to ignore all diacritics except Shadda
    4. ignoreShaddaDiacritic: flag to ignore diacritics and Shadda
    5. delimiter: The character that separates the words
    6. selection: (i) jaccardAll, (ii) intersection, (iii) union, (iv) jaccardSimilarity
    Flow of events The system returns the JSON data object.
    Retrieved Data results JSON object.
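To make the three outputs concrete, here is a small local illustration of the union, the intersection, and the Jaccard similarity (intersection size divided by union size) of two delimited word lists. It is not the service's implementation: it compares words by exact string equality and does not reproduce the diacritic-tolerant matching controlled by the flags above. The function name and the default delimiter are chosen only for this example.

```python
def jaccard(string1, string2, delimiter="|"):
    """Union, intersection and Jaccard similarity of two delimited word lists.

    Words are compared by exact string equality; diacritic handling is
    deliberately left out of this sketch.
    """
    set1 = {w.strip() for w in string1.split(delimiter) if w.strip()}
    set2 = {w.strip() for w in string2.split(delimiter) if w.strip()}
    union = set1 | set2
    intersection = set1 & set2
    # Two empty inputs have no defined ratio; report 0.0 instead of dividing by zero.
    similarity = len(intersection) / len(union) if union else 0.0
    return {
        "union": union,
        "intersection": intersection,
        "jaccardSimilarity": similarity,
    }


result = jaccard("كتب|قلم|باب", "قلم|باب|بيت")
print(sorted(result["intersection"]), len(result["union"]), result["jaccardSimilarity"])
```

Here two of the four distinct words are shared, so the similarity is 2/4 = 0.5.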
  • Retrieves lemmas that match the SearchTerm, along with their linguistic features, from our lemma index. It allows an authenticated user (application or end-user) to search the lemma index. No filters can be applied in this service. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/LemmaSearch/{term}?apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (a list of morphological results).

    Example: اخذ

    Retrieves basic morphological analysis for the SearchTerm. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/sina-morphizer/{term}?lang={lan}&apikey={key}
    Pre-conditions The user has registered and provided their API Token.
    API Parameters
    • lang: a flag that takes exactly one of the values dialect, MSA, or all, indicating whether to search dialect words (dialect), MSA words (MSA), or both (all).
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, and the access limit is not exceeded (if exceeded returns -1 in JSON format), then the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data results JSON object (a list of morphological results).
  • Retrieves the terms (lexicon entries) that begin with a given string of characters. Request API Token.

    Actors Authenticated user.
    URL schema https://{domain}/api/Autocomplete/{term}?limit={number}&apikey={key}
    Pre-conditions The user has registered and provided their API Key.
    API Parameters
    • limit: an integer count of how many terms the autocomplete should return.
    • apikey: a key (provided offline) to access the API.
    Flow of events
    1. The system checks if the user is authenticated or not.
    2. If not authenticated, the system returns (-3) error code in JSON format.
    3. If authenticated, the system logs the request.
    4. The system performs the required search query.
    5. The system returns the JSON data object.
    Retrieved Data Results JSON object.

    Example: time

  • Details of error messages returned by the APIs.


    Error Code     Error Message
    -1             User blocked, exceeded access limit
    -3             User is not authenticated
    -4             Incorrect API parameter value
    -5             No Data Records Found
    -6             Incorrect Data Value
    login-error    {"error":"invalid_grant","error_description":"Bad credentials"}
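A client can turn these numeric codes into exceptions so that failures do not pass silently as data. The sketch below assumes that an error arrives as a bare JSON integer (for example `-3`); the text above only says the code is returned "in JSON format", so the exact response shape may differ. The `ApiError` class and the `check` function are hypothetical names used for illustration.

```python
# Messages as listed in the table above.
ERROR_MESSAGES = {
    -1: "User blocked, exceeded access limit",
    -3: "User is not authenticated",
    -4: "Incorrect API parameter value",
    -5: "No Data Records Found",
    -6: "Incorrect Data Value",
}


class ApiError(Exception):
    """Raised when the decoded response is one of the documented error codes."""

    def __init__(self, code):
        self.code = code
        super().__init__(f"{code}: {ERROR_MESSAGES.get(code, 'unknown error code')}")


def check(payload):
    """Return the payload unchanged, or raise ApiError for a documented code.

    Assumption: an error response decodes to a bare integer.
    """
    is_int = isinstance(payload, int) and not isinstance(payload, bool)
    if is_int and payload in ERROR_MESSAGES:
        raise ApiError(payload)
    return payload


try:
    check(-1)
except ApiError as err:
    print(err)
print(check([{"id": 1}]))
```

Code -5 may deserve separate treatment in practice, since an empty result is often not a failure from the caller's point of view.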