sinatools.utils.jaccard
-
sinatools.utils.jaccard.get_intersection(list1, list2, ignore_all_diacratics_but_not_shadda=False, ignore_shadda_diacritic=False)
Gets the intersection of two lists after normalization, ignoring diacritics according to the input flags. You can try the demo online.
- Parameters
- Returns
The intersection of the two lists after normalization and ignoring diacritics.
- Return type
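The effect of the two flags can be illustrated with a minimal sketch that does not call sinatools. It assumes that "diacritics" means the basic Arabic marks U+064B to U+0652 and that the Shadda is U+0651. The names `strip_diacritics` and `intersection_sketch` are hypothetical, and the library's own normalization may well differ.

```python
import re

SHADDA = "\u0651"
# Fathatan through kasra (U+064B-U+0650) plus sukun (U+0652):
# the basic Arabic diacritics other than shadda (an assumption of this sketch).
NON_SHADDA_MARKS = re.compile("[\u064B-\u0650\u0652]")


def strip_diacritics(word, ignore_all_but_shadda=False, ignore_shadda=False):
    """Remove the diacritics selected by the two flags (hypothetical helper)."""
    if ignore_all_but_shadda:
        word = NON_SHADDA_MARKS.sub("", word)
    if ignore_shadda:
        word = word.replace(SHADDA, "")
    return word


def intersection_sketch(list1, list2, ignore_all_but_shadda=False, ignore_shadda=False):
    """Return the normalized words of list1 that also occur in list2."""
    keys2 = {strip_diacritics(w, ignore_all_but_shadda, ignore_shadda) for w in list2}
    result = []
    for w in list1:
        key = strip_diacritics(w, ignore_all_but_shadda, ignore_shadda)
        if key in keys2 and key not in result:
            result.append(key)
    return result


KTB = "\u0643\u062A\u0628"                       # kaf, ta, ba without diacritics
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # the same letters, each followed by a fatha

print(intersection_sketch([KATABA], [KTB]))                                       # prints []
print(intersection_sketch([KATABA], [KTB], ignore_all_but_shadda=True) == [KTB])  # prints True
```

With both flags left at False the two spellings are treated as different words; once the non-Shadda marks are ignored they collapse to the same key.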
-
sinatools.utils.jaccard.get_non_preferred_word(word1, word2)
Returns the non-preferred word of the two input words.
-
sinatools.utils.jaccard.get_preferred_word(word1, word2)
Returns the preferred word of the two given words, based on their implication.
-
sinatools.utils.jaccard.get_union(list1, list2, ignore_all_diacratics_but_not_shadda, ignore_shadda_diacritic)
Finds the union of two lists by removing duplicates and normalizing words.
- Parameters
- Returns
The union of the two lists after removing duplicates and normalizing words.
- Return type
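One way to picture "removing duplicates and normalizing words" is an order-preserving union keyed on a normalized form. The sketch below is an illustration only, not the library's code: `strip_marks` and `union_sketch` are hypothetical names, the two diacritic flags are not modelled, and every Unicode combining mark (Shadda included) is dropped before comparison.

```python
import unicodedata


def strip_marks(word):
    """Drop all Unicode combining marks, which include the Arabic diacritics."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


def union_sketch(list1, list2):
    """Order-preserving union that keeps the first spelling seen for each normalized form."""
    first_seen = {}
    for word in list(list1) + list(list2):
        # setdefault stores a word only if its normalized key is new.
        first_seen.setdefault(strip_marks(word), word)
    return list(first_seen.values())


KTB = "\u0643\u062A\u0628"                       # kaf, ta, ba without diacritics
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # the same letters with fathas
QLM = "\u0642\u0644\u0645"                       # qaf, lam, meem

print(union_sketch([KATABA, QLM], [KTB]) == [KATABA, QLM])  # prints True
```

Keeping the first spelling seen is a design choice of this sketch; the library may instead return the normalized or the preferred form.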
-
sinatools.utils.jaccard.jaccard(delimiter, str1, str2, selection, ignoreAlldiacraticsButNotShadda=True, ignoreShaddaDiacritic=True)
Compute the Jaccard similarity, union, or intersection of two sets of strings.
- Parameters
delimiter (str) – The delimiter used to split the input strings.
str1 (str) – The first input string to compare.
str2 (str) – The second input string to compare.
selection (str) – The operation to perform on the two sets of strings. Must be one of intersection, union, jaccardSimilarity, or jaccardAll.
ignoreAlldiacraticsButNotShadda (bool) – If True, ignore all diacritics except the Shadda. (Default is True.)
ignoreShaddaDiacritic (bool) – If True, ignore the Shadda diacritic. (Default is True.)
- Returns
The Jaccard similarity, union, or intersection of the two sets of strings, depending on the value of the selection argument.
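The way delimiter and selection interact can be sketched in plain Python. This is a simplified stand-in rather than the sinatools implementation: `jaccard_sketch` is a hypothetical name, diacritic handling is left out, results are sorted only to make the output deterministic, and the tuple returned for jaccardAll is an assumption, since the return shape for that option is not stated above.

```python
def jaccard_sketch(delimiter, str1, str2, selection):
    """Split both strings on the delimiter and apply the selected set operation."""
    set1 = {token.strip() for token in str1.split(delimiter) if token.strip()}
    set2 = {token.strip() for token in str2.split(delimiter) if token.strip()}

    intersection = sorted(set1 & set2)
    union = sorted(set1 | set2)
    # Assumption: two empty inputs give a similarity of 0.0.
    similarity = len(intersection) / len(union) if union else 0.0

    if selection == "intersection":
        return intersection
    if selection == "union":
        return union
    if selection == "jaccardSimilarity":
        return similarity
    if selection == "jaccardAll":
        # Assumed shape: all three results together.
        return intersection, union, similarity
    raise ValueError(f"unknown selection: {selection!r}")


print(jaccard_sketch("|", "a|b|c", "b|c|d", "intersection"))       # prints ['b', 'c']
print(jaccard_sketch("|", "a|b|c", "b|c|d", "jaccardSimilarity"))  # prints 0.5
```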
-
sinatools.utils.jaccard.jaccard_similarity(list1: list, list2: list, ignore_all_diacratics_but_not_shadda: bool, ignore_shadda_diacritic: bool)
Calculates the Jaccard similarity coefficient between two lists.
- Parameters
- Returns
The Jaccard similarity coefficient between the two lists.
- Return type
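The Jaccard similarity coefficient is the size of the intersection divided by the size of the union, so repeated items in a list do not change the result. A minimal sketch of the coefficient itself follows; `jaccard_coefficient` is a hypothetical name, the diacritic flags are omitted, and returning 0.0 for two empty lists is an assumption.

```python
def jaccard_coefficient(list1, list2):
    """Return |A ∩ B| / |A ∪ B| for the sets built from the two lists."""
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        return 0.0  # assumed convention for two empty inputs
    return len(set1 & set2) / len(union)


# {a, b, c} and {b, c, d} share 2 of 4 distinct items.
print(jaccard_coefficient(["a", "b", "b", "c"], ["b", "c", "d"]))  # prints 0.5
```

Identical lists give 1.0 and lists with no items in common give 0.0.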