CLI.utils.jaccard_intersection

About:

The sina_intersection tool computes the intersection of two lists of strings after normalizing them, optionally ignoring specific Arabic diacritics. It can ignore all diacritics except shadda, ignore only shadda, or remove all diacritics. This makes it useful for comparing lists of Arabic words whose diacritization varies.
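The Python sketch below illustrates the idea of a diacritic-insensitive intersection. It is not the tool's implementation. The diacritic range (U+064B to U+0652, fathatan through sukun, with shadda at U+0651), the hypothetical function names `normalize` and `intersection`, and the choice that setting both flags strips every diacritic are all assumptions for illustration. The tool's own normalization may do more than strip diacritics.

```python
import re

# Assumption: "Arabic diacritics" means the harakat U+064B (fathatan) .. U+0652 (sukun).
# Shadda is U+0651. The real tool may use a different set or extra normalization.
SHADDA = "\u0651"
DIACRITICS_EXCEPT_SHADDA = re.compile("[\u064B-\u0650\u0652]")
ALL_DIACRITICS = re.compile("[\u064B-\u0652]")


def normalize(word: str, ignore_all_but_shadda: bool, ignore_shadda: bool) -> str:
    """Hypothetical normalizer: strips the diacritics selected by the two flags."""
    if ignore_all_but_shadda and ignore_shadda:
        # Assumption: both flags together remove every diacritic, shadda included.
        return ALL_DIACRITICS.sub("", word)
    if ignore_all_but_shadda:
        return DIACRITICS_EXCEPT_SHADDA.sub("", word)
    if ignore_shadda:
        return word.replace(SHADDA, "")
    return word


def intersection(list1, list2, ignore_all_but_shadda=False, ignore_shadda=False):
    """Hypothetical helper: normalized forms from list1 that also occur in list2.

    Results keep list1's first-seen order and contain no duplicates.
    """
    keys2 = {normalize(w, ignore_all_but_shadda, ignore_shadda) for w in list2}
    seen = set()
    result = []
    for w in list1:
        key = normalize(w, ignore_all_but_shadda, ignore_shadda)
        if key in keys2 and key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

With `ignore_all_but_shadda=True`, a fully vowelled word such as كَتَبَ matches the bare form كتب, while مُدَرِّس still matches مدرّس but not مدرس, because its shadda is kept. Comparison remains exact otherwise, so it is case-sensitive for Latin strings.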

Usage:

The following usage information is printed by running sina_intersection --help.

Usage:

sina_intersection --list1 WORD1 WORD2 … --list2 WORD1 WORD2 … [OPTIONS]

Options:
--list1 WORD1 WORD2 …

First list of strings (space-separated).

--list2 WORD1 WORD2 …

Second list of strings (space-separated).

--ignore_all_diacritics_but_not_shadda

Ignore all Arabic diacritics except shadda.

--ignore_shadda_diacritic

Ignore the shadda diacritic.
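The options above map naturally onto Python's standard argparse module. The sketch below is a hypothetical parser that mirrors the documented flags, assuming argparse is used; it is not the tool's actual source, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the tool's source)."""
    parser = argparse.ArgumentParser(
        prog="sina_intersection",
        description="Intersect two lists of strings, optionally ignoring Arabic diacritics.",
    )
    # nargs="+" collects one or more space-separated words after each flag.
    parser.add_argument("--list1", nargs="+", required=True,
                        help="First list of strings (space-separated).")
    parser.add_argument("--list2", nargs="+", required=True,
                        help="Second list of strings (space-separated).")
    # store_true: each flag is False unless it appears on the command line.
    parser.add_argument("--ignore_all_diacritics_but_not_shadda", action="store_true",
                        help="Ignore all Arabic diacritics except shadda.")
    parser.add_argument("--ignore_shadda_diacritic", action="store_true",
                        help="Ignore the shadda diacritic.")
    return parser
```

Calling `build_parser().parse_args(["--list1", "word1", "word2", "--list2", "word1"])` yields `list1 == ["word1", "word2"]` and both flags set to `False`. With `nargs="+"`, argparse expects the space-separated form; a `--list1=WORD1 WORD2` form would bind only WORD1 to the option.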

Example:

sina_intersection --list1 word1 word2 word3 --list2 word1 word4 word5

Note:

  • String comparison is case-sensitive.

  • "Diacritics" refers to the Arabic diacritics (such as fatha, damma, and kasra) and shadda.
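To make the note concrete, the snippet below lists the Unicode code points and official names of the Arabic harakat. Treating U+064B through U+0652 as "the Arabic diacritics" is an assumption for illustration; the tool's own list may differ (for example, it may also cover marks outside this range).

```python
import unicodedata

# Assumption: the diacritics meant here are U+064B (fathatan) .. U+0652 (sukun).
DIACRITIC_NAMES = {
    f"U+{cp:04X}": unicodedata.name(chr(cp)) for cp in range(0x064B, 0x0653)
}

for code, name in DIACRITIC_NAMES.items():
    print(code, name)  # e.g. "U+064E ARABIC FATHA", "U+0651 ARABIC SHADDA"
```

The eight entries run from ARABIC FATHATAN to ARABIC SUKUN, with ARABIC SHADDA at U+0651, which is the mark the two options treat separately.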