sinatools.morphology.morph_analyzer
sinatools.morphology.morph_analyzer.analyze(text, language='MSA', task='full', flag='1')

This method processes an input text and returns a morphological analysis for each token in the text, based on the specified language, task, and flag. You can try the demo online. The fields included in each morphological solution depend on the task:
- If the task is lemmatization, the morphological solution includes only the lemma_id, lemma, token, and token frequency.
- If the task is pos, the morphological solution includes only the part-of-speech, token, and token frequency.
- If the task is root, the morphological solution includes only the root, token, and token frequency.
- If the task is full, the morphological solution includes the lemma_id, lemma, part-of-speech, root, token, and token frequency.
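The task-to-fields relationship described above can be sketched as a simple lookup. This is only an illustration, not the library's implementation: `TASK_FIELDS` and `fields_for_task` are hypothetical names, and the dictionary keys are assumed to match the field names listed under Returns.

```python
# Hypothetical mapping from each task to the fields the documentation lists
# for it. The key names are assumed to match those in the Returns section.
TASK_FIELDS = {
    "lemmatization": ["lemma_id", "lemma", "token", "frequency"],
    "pos": ["pos", "token", "frequency"],
    "root": ["root", "token", "frequency"],
    "full": ["lemma_id", "lemma", "pos", "root", "token", "frequency"],
}


def fields_for_task(solution, task="full"):
    """Keep only the fields that the given task is documented to return."""
    if task not in TASK_FIELDS:
        raise ValueError(f"unknown task: {task!r}")
    return {key: solution[key] for key in TASK_FIELDS[task] if key in solution}


# One solution taken from the documented example output (task = full).
full_solution = {
    "token": "ذهب",
    "lemma": "ذَهَبَ",
    "lemma_id": "202001617",
    "root": "ذ ه ب",
    "pos": "فعل ماضي",
    "frequency": "82202",
}

# With task="root", only the root, token, and frequency remain.
print(fields_for_task(full_solution, task="root"))
```

In practice the filtering is done by `analyze` itself through its `task` argument; the sketch only makes the documented field sets explicit.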
- Parameters
- text (str) – The Arabic text to be morphologically analyzed.
- language (str) – The type of the input text. Currently, only Modern Standard Arabic (MSA) is supported.
- task (str) – The task to filter the results by. Options are lemmatization, pos, root, and full. The default is full.
- flag (str) – The flag to filter the returned results. If the flag is `1`, only the solution with the highest frequency is returned. If the flag is `*`, all solutions are returned in descending order of frequency, with the highest-frequency solution first. The default is `1`.
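The effect of the flag can be illustrated with a small stand-alone sketch. It assumes each candidate solution carries a `frequency` stored as a numeric string, as in the example output on this page. The helper `select_solutions` and the candidate data are hypothetical and made up for illustration; they are not part of SinaTools.

```python
def select_solutions(candidates, flag="1"):
    """Mimic the documented flag behaviour on a list of candidate solutions."""
    # Frequencies appear as strings in the documented output, so convert
    # them to integers before ranking from most to least frequent.
    ranked = sorted(candidates, key=lambda s: int(s["frequency"]), reverse=True)
    if flag == "1":
        return ranked[:1]   # only the highest-frequency solution
    if flag == "*":
        return ranked       # all solutions, highest frequency first
    raise ValueError(f"unsupported flag: {flag!r}")


# Made-up candidate solutions for a single token (placeholder values only).
candidates = [
    {"token": "tok", "lemma": "lemma_a", "frequency": "120"},
    {"token": "tok", "lemma": "lemma_b", "frequency": "4500"},
    {"token": "tok", "lemma": "lemma_c", "frequency": "37"},
]

print(select_solutions(candidates, flag="1"))
print([s["lemma"] for s in select_solutions(candidates, flag="*")])
```

Note that comparing the frequencies as strings would rank "37" above "120", which is why the sketch converts them to integers first.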
- Returns
- A list of JSON objects, where each object may contain:
- token: The token from the original text.
- lemma: The lemma of the token.
- lemma_id: The id of the lemma.
- pos: The part-of-speech of the token.
- root: The root of the token.
- frequency: The frequency of the token.
- Return type
list
Example:
    from sinatools.morphology import morph_analyzer

    # Return the morphological solution for each token in this text.
    # Example: task = full
    morph_analyzer.analyze('ذهب الولد الى المدرسة')

    [{
        "token": "ذهب",
        "lemma": "ذَهَبَ",
        "lemma_id": "202001617",
        "root": "ذ ه ب",
        "pos": "فعل ماضي",
        "frequency": "82202"
    },{
        "token": "الولد",
        "lemma": "وَلَدٌ",
        "lemma_id": "202003092",
        "root": "و ل د",
        "pos": "اسم",
        "frequency": "19066"
    },{
        "token": "إلى",
        "lemma": "إِلَى",
        "lemma_id": "202000856",
        "root": "إ ل ى",
        "pos": "حرف جر",
        "frequency": "7367507"
    },{
        "token": "المدرسة",
        "lemma": "مَدْرَسَةٌ",
        "lemma_id": "202002620",
        "root": "د ر س",
        "pos": "اسم",
        "frequency": "145285"
    }]
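Once the result is available, it can be processed like any other list. The sketch below assumes the returned JSON objects behave as Python dictionaries. To stay runnable without SinaTools installed, it hard-codes a trimmed copy of the example output above instead of calling `analyze`; the variable names are illustrative.

```python
# Trimmed copy of the example output above (task = full), used as a stand-in
# for the value returned by morph_analyzer.analyze(...).
result = [
    {"token": "ذهب", "lemma": "ذَهَبَ", "pos": "فعل ماضي", "frequency": "82202"},
    {"token": "الولد", "lemma": "وَلَدٌ", "pos": "اسم", "frequency": "19066"},
    {"token": "إلى", "lemma": "إِلَى", "pos": "حرف جر", "frequency": "7367507"},
    {"token": "المدرسة", "lemma": "مَدْرَسَةٌ", "pos": "اسم", "frequency": "145285"},
]

# Collect the lemma of every token, in text order.
lemmas = [entry["lemma"] for entry in result]

# Pair each token with its part-of-speech tag.
tagged = [(entry["token"], entry["pos"]) for entry in result]

# Frequencies are strings in the output, so convert before comparing.
most_frequent = max(result, key=lambda entry: int(entry["frequency"]))

print(lemmas)
print(tagged)
print(most_frequent["token"])
```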