ArBanking77

A dataset and source code for ArBanking77
Version: 1.0 (updated on 1/9/2023)

ArBanking77 consists of 31,404 queries (in MSA and Palestinian dialect) that were manually Arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries. Each query is classified into one of 77 classes (intents), such as card arrival, card linking, exchange rate, and automatic top-up. A neural model based on AraBERT was fine-tuned on ArBanking77, achieving an F1-score of 92% on MSA and 90% on Palestinian dialect (PAL). The online service accepts several sentences at once, separated by newlines or by any of the characters ؟, ?, !, and the period (.).
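The service returns one result per sentence, so multi-sentence input is first split at the delimiters listed above. A minimal Python sketch of such a splitter follows; it assumes the delimiters are simply discarded and surrounding whitespace trimmed, since the service's actual splitting rules are not documented here. The function name `split_sentences` is hypothetical.

```python
import re

# Hypothetical splitter using the delimiters listed for the online service:
# newline, Arabic question mark (U+061F), '?', '!' and '.'.
_DELIMITERS = re.compile(r"[\n\u061F?!.]")


def split_sentences(text: str) -> list[str]:
    """Split text at the delimiters, trim whitespace, and drop empty pieces."""
    return [part.strip() for part in _DELIMITERS.split(text) if part.strip()]
```

For example, `split_sentences("متى تصل بطاقتي؟ ما هو سعر الصرف?\nشكرا!")` yields three sentences. A plain character split like this also breaks decimal numbers such as 1.5, so the real service may treat such cases differently.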

  • ArBanking77 is available for download upon request, for both academic and commercial use.
    Request to download ArBanking77 (whole dataset: 31,404 queries; MSA: 15,537 queries; Palestinian dialect: 15,867 queries)
    GitHub (download the BERT training source code and sample data of about 1K queries)
    Hugging Face (download the fine-tuned BERT model, ready to use)

  • Request an API token to access the ArBanking77 web service online

    Actors: Authenticated user.
    URL schema: https://{domain}/sina/v2/api/BankIntent/?apikey={key}
    Pre-conditions: The user has registered and provided their API token.
    API parameters
      lang and text are sent in the request body.
    1. lang: the language of the text.
    2. text: Arabic or English text.
    Flow of events
    1. The system checks whether the API key (i.e., token) is authenticated.
    2. If it is not authenticated, the system returns error code -3 in JSON format.
    3. If it is authenticated and the access limit has not been exceeded (if it has, the system returns error code -1 in JSON format), the system logs the request.
    4. If so, the system detects the intent of each sentence in the text.
    5. Otherwise, the system returns error code -4.
    6. The system returns the results in the specified format.
    Retrieved data: For each sentence, the system returns the ID, probability value, and corresponding intent in JSON format.
  • Mustafa Jarrar, Ahmet Birim, Mohammed Khalilia, Mustafa Erden, Sana Ghanem: ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic. In Proceedings of the Arabic Natural Language Processing Conference (ArabicNLP 2023), Singapore, 2023.

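The ArBanking77 web service can be exercised with a small client. The following is a minimal Python sketch using only the standard library. It follows the documented URL schema, the parameter names (apikey, lang, text), and the error codes -3, -1, and -4; everything else is an assumption: that the request is an HTTP POST with a JSON body, that an error response decodes to the bare integer code, and that each per-sentence result is an object with the hypothetical keys "id", "prob", and "intent". The accepted lang values are not specified, so "ar" below is only a placeholder, and the names build_request, parse_results, classify, and SentenceIntent are illustrative.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Error codes documented in the flow of events.
ERROR_CODES = {
    -3: "API key (token) is not authenticated",
    -1: "access limit exceeded",
    -4: "error returned in step 5 of the flow of events",
}


@dataclass
class SentenceIntent:
    """One per-sentence result: ID, probability value, and intent."""
    sentence_id: int
    probability: float
    intent: str


def build_request(domain: str, api_key: str, lang: str, text: str) -> urllib.request.Request:
    """Build (without sending) a request that follows the documented URL schema.

    Assumption: lang and text are sent as a JSON body via HTTP POST.
    """
    query = urllib.parse.urlencode({"apikey": api_key})
    url = f"https://{domain}/sina/v2/api/BankIntent/?{query}"
    body = json.dumps({"lang": lang, "text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_results(payload) -> list[SentenceIntent]:
    """Turn a decoded JSON payload into SentenceIntent objects.

    Assumptions: an error response decodes to the bare integer code, and a
    successful one to a list of objects with the hypothetical keys
    "id", "prob", and "intent".
    """
    if isinstance(payload, int) and payload in ERROR_CODES:
        raise RuntimeError(f"ArBanking77 error {payload}: {ERROR_CODES[payload]}")
    return [
        SentenceIntent(int(item["id"]), float(item["prob"]), str(item["intent"]))
        for item in payload
    ]


def classify(domain: str, api_key: str, lang: str, text: str,
             timeout: float = 30.0) -> list[SentenceIntent]:
    """Send the request and parse the JSON response (performs network I/O)."""
    request = build_request(domain, api_key, lang, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return parse_results(payload)
```

Building the request separately from sending it keeps the URL and body construction testable offline; classify is the only function that touches the network, and it only works if the assumptions above match the real service.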