Word-Duplicate Cleaner


This form takes a set of delimited Arabic words and removes duplicates regardless of how the words are diacritized. Full details are given in the article cited under Publication below.
Each line in the uploaded text file has the form [SetID#SetOfWords], where SetID and SetOfWords are delimited by the # sign. Words in SetOfWords are separated by any character the user provides in the form below.
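
As an illustration of the input format described above, a line could be split as in the following minimal Python sketch. The function name parse_line and the choice to trim whitespace and drop empty entries are assumptions made for this example; they are not taken from the tool's source code.

```python
def parse_line(line, sep):
    """Split one 'SetID#SetOfWords' line into (set_id, list_of_words).

    `sep` is the user-supplied separator character between words.
    Hypothetical helper: whitespace trimming and skipping of empty
    entries are assumptions of this sketch.
    """
    # Split only on the first '#', so the word list itself is left intact.
    set_id, _, words_part = line.rstrip("\n").partition("#")
    words = [w.strip() for w in words_part.split(sep) if w.strip()]
    return set_id.strip(), words


if __name__ == "__main__":
    print(parse_line("12#كتب,قلم,كتب", ","))
```

Using partition rather than split means a line without a # sign yields an empty word list instead of raising an error.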

Input: Output:

Words separated by

Advanced Options:
Ignore diacritics on last letter
Ignore Shadda ( ّ ) on any letter
Ignore Hamza (ء) on the first letter
Ignore the definite article (ال التعريف) at the beginning of the word
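
The following Python sketch shows one plausible reading of how these options might affect matching. It is not the tool's actual algorithm. It assumes that two words count as duplicates when their letters are identical and no letter carries conflicting diacritics, so a missing diacritic is compatible with any diacritic. All names (split_letters, normalize, same_word, dedupe) and the keep-first policy are hypothetical.

```python
# Assumption: only the marks U+064B..U+0652 (tanween, short vowels,
# shadda, sukun) are treated as diacritics in this sketch.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}
SHADDA = "\u0651"
ALEF, LAM = "\u0627", "\u0644"
HAMZA_ON_ALEF = {"\u0623", "\u0625"}  # alef with hamza above / below


def split_letters(word):
    """Return a list of (letter, frozenset_of_diacritics) pairs."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks | {ch})
        else:
            pairs.append((ch, frozenset()))
    return pairs


def normalize(word, ignore_last=False, ignore_shadda=False,
              ignore_hamza=False, ignore_al=False):
    """Apply the four advanced options (as interpreted here)."""
    pairs = split_letters(word)
    if ignore_hamza and pairs and pairs[0][0] in HAMZA_ON_ALEF:
        pairs[0] = (ALEF, pairs[0][1])
    if (ignore_al and len(pairs) > 2
            and pairs[0][0] == ALEF and pairs[1][0] == LAM):
        pairs = pairs[2:]  # drop the definite article
    if ignore_shadda:
        pairs = [(l, m - {SHADDA}) for l, m in pairs]
    if ignore_last and pairs:
        pairs[-1] = (pairs[-1][0], frozenset())
    return pairs


def same_word(a, b, **opts):
    """True if letters agree and no letter has conflicting diacritics."""
    pa, pb = normalize(a, **opts), normalize(b, **opts)
    if len(pa) != len(pb):
        return False
    return all(la == lb and (not ma or not mb or ma == mb)
               for (la, ma), (lb, mb) in zip(pa, pb))


def dedupe(words, **opts):
    """Keep the first occurrence of each word (an assumed policy)."""
    kept = []
    for w in words:
        if not any(same_word(w, k, **opts) for k in kept):
            kept.append(w)
    return kept
```

Under this compatibility rule, an undiacritized word can match two fully diacritized words that conflict with each other, so the result of the greedy dedupe above depends on input order. The published article should be consulted for how the actual tool resolves such cases.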

Download: Source Code
Publication: Mustafa Jarrar, Fadi Zaraket, Rami Asia, Hamzeh Amayreh: Diacritic-Based Matching of Arabic Words. ACM Transactions on Asian and Low-Resource Language Information Processing. Volume 18, No. 2, Pages 10:1-10:21, ACM, December 2018. ISSN 2375-4699.
Other Projects: Implication Function, Arabic Ontology, Curras, Zinnar