sinatools.morphology.morph_analyzer

sinatools.morphology.morph_analyzer.analyze(text, language='MSA', task='full', flag='1')

This method analyzes the input text and returns a morphological analysis for each token, according to the specified language, task, and flag. The task determines which fields each morphological solution includes, as listed below. You can try the demo online.

  • If the task is lemmatization, the morphological solution includes only the lemma_id, lemma, token, and token frequency.
  • If the task is pos, the morphological solution includes only the part-of-speech, token, and token frequency.
  • If the task is root, the morphological solution includes only the root, token, and token frequency.
  • If the task is full, the morphological solution includes the lemma_id, lemma, part-of-speech, root, token, and token frequency.
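The task-to-field mapping above can be sketched as a small lookup table. This is an illustration only, not SinaTools code: `TASK_FIELDS` and `project_solution` are hypothetical names, and the key names are assumed to match those shown in the example output on this page.

```python
# Hypothetical illustration of which keys each task keeps; not part of SinaTools.
TASK_FIELDS = {
    "lemmatization": ("token", "lemma", "lemma_id", "frequency"),
    "pos": ("token", "pos", "frequency"),
    "root": ("token", "root", "frequency"),
    "full": ("token", "lemma", "lemma_id", "root", "pos", "frequency"),
}


def project_solution(solution, task="full"):
    """Keep only the keys that the given task is documented to return."""
    if task not in TASK_FIELDS:
        raise ValueError(f"unknown task: {task!r}")
    return {key: solution[key] for key in TASK_FIELDS[task] if key in solution}


# A full solution copied from the example output on this page.
full_solution = {
    "token": "ذهب",
    "lemma": "ذَهَبَ",
    "lemma_id": "202001617",
    "root": "ذ ه ب",
    "pos": "فعل ماضي",
    "frequency": "82202",
}

root_only = project_solution(full_solution, task="root")
print(root_only)
```

In practice the library applies the task for you; the sketch only makes the field sets explicit so they are easy to compare.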
Parameters
  • text (str) – The Arabic text to be morphologically analyzed.

  • language (str) – The type of the input text. Currently, only Modern Standard Arabic (MSA) is supported.

  • task (str) – The task that determines which fields are returned. Options are [lemmatization, pos, root, full]. If not specified, the default is full.

  • flag (str) – Controls how many solutions are returned for each token. If the flag is `1`, only the solution with the highest frequency is returned. If the flag is `*`, all solutions are returned in descending order of frequency, with the highest-frequency solution first. If not specified, the default is `1`.
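The flag semantics can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `select_solutions` is a hypothetical helper, the candidate records use made-up placeholder values, and frequencies are assumed to be strings as in the example output on this page.

```python
# Hypothetical sketch of the documented flag behaviour; not SinaTools code.
def select_solutions(candidates, flag="1"):
    """Order candidates by frequency (highest first) and apply the flag."""
    # Frequencies appear as strings in the example output, so compare as ints.
    ranked = sorted(candidates, key=lambda s: int(s["frequency"]), reverse=True)
    if flag == "1":
        return ranked[:1]
    if flag == "*":
        return ranked
    raise ValueError(f"unsupported flag: {flag!r}")


# Placeholder candidates with made-up values, for illustration only.
candidates = [
    {"token": "tok", "lemma": "candidate_a", "frequency": "10"},
    {"token": "tok", "lemma": "candidate_b", "frequency": "30"},
    {"token": "tok", "lemma": "candidate_c", "frequency": "20"},
]

best = select_solutions(candidates, flag="1")
everything = select_solutions(candidates, flag="*")
print([s["lemma"] for s in best])
print([s["lemma"] for s in everything])
```

Comparing the frequencies as integers matters here: sorted as strings, "9" would rank above "10".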

Returns

A list of JSON objects, where each object may contain:
  • token: The token from the original text.
  • lemma: The lemma of the token.
  • lemma_id: The id of the lemma.
  • pos: The part-of-speech of the token.
  • root: The root of the token.
  • frequency: The frequency of the token.

Return type

list

Example:

from sinatools.morphology import morph_analyzer

# Return the morphological solution for each token in this text
# Example: task = full
morph_analyzer.analyze('ذهب الولد الى المدرسة')

[{ 
    "token": "ذهب",
    "lemma": "ذَهَبَ",
    "lemma_id": "202001617",
    "root": "ذ ه ب",
    "pos": "فعل ماضي",
    "frequency": "82202"
  },{ 
    "token": "الولد",
    "lemma": "وَلَدٌ",
    "lemma_id": "202003092",
    "root": "و ل د",
    "pos": "اسم",
    "frequency": "19066"
  },{ 
    "token": "إلى",
    "lemma": "إِلَى",
    "lemma_id": "202000856",
    "root": "إ ل ى",
    "pos": "حرف جر",
    "frequency": "7367507"
  },{ 
    "token": "المدرسة",
    "lemma": "مَدْرَسَةٌ",
    "lemma_id": "202002620",
    "root": "د ر س",
    "pos": "اسم",
    "frequency": "145285"
}]
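A short sketch of how the returned list might be consumed. To stay self-contained it does not call the library; `results` is an abridged copy of the example output above (token, lemma, and frequency only), and the variable names are illustrative.

```python
# Abridged copy of the example output above (token, lemma, frequency only).
results = [
    {"token": "ذهب", "lemma": "ذَهَبَ", "frequency": "82202"},
    {"token": "الولد", "lemma": "وَلَدٌ", "frequency": "19066"},
    {"token": "إلى", "lemma": "إِلَى", "frequency": "7367507"},
    {"token": "المدرسة", "lemma": "مَدْرَسَةٌ", "frequency": "145285"},
]

# Map each surface token to its lemma.
lemma_by_token = {item["token"]: item["lemma"] for item in results}

# Frequencies are strings in the example, so convert before comparing.
most_frequent = max(results, key=lambda item: int(item["frequency"]))

print(lemma_by_token[results[3]["token"]])
print(most_frequent["token"])
```

With the default flag each token yields one object, so a plain dictionary lookup is enough; with flag `*` a token can appear several times and the lookup would need to hold a list per token.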