As part of the Wojood project, Ali Salhi presented the paper “Enhancement Tools for Arabic Web Search: A Statistical Approach” at Innovations’11, “The Seventh International Conference in Innovation in Information Technology”, held in Abu Dhabi, United Arab Emirates, on April 25-27, 2011.
Paper Abstract: Arabic web content is growing rapidly, and the need to manage it efficiently is gaining importance. The morphological complexity of Arabic raises many challenges in this regard. This paper reports on some of our work aimed at designing text mining and query pre-processing tools that can efficiently process and search large quantities of Arabic web data. In our research we try to address the challenges Arabic poses for natural language processing (NLP) and information retrieval, including root extraction, language detection, and Arabic query correction, suggestion, and expansion. While not reported in detail here, we are also developing tools for automatic Arabic document categorization. Throughout, we employ a statistical, corpus-based approach that draws on data obtained from a variety of sources. Based on corpus statistics, we constructed databases of words and their frequencies as single, double, and triple expressions, and used them as the infrastructure for well-structured search aid tools that can handle the sophisticated nature of Arabic and can be integrated into existing web search engines and document processing systems. We also use context analysis and spellchecking of user queries to enable a more complete and efficient search. While the results reported here are promising, they must be viewed as work in progress, still in need of testing, refinement, integration, and deployment in real-life settings.
Index Terms: Natural language processing, information retrieval, root extraction, language detection, Arabic query correction.
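To make the abstract's "single, double and triple expressions" more concrete, the following is a minimal sketch of how word-frequency tables of that kind might be built and used for query suggestion. It is not the paper's implementation: the toy corpus, the whitespace tokenization, and the names ngram_counts and suggest_next are all assumptions made for illustration, and a real system would presumably add Arabic-specific normalization and root extraction.

```python
from collections import Counter

# Toy corpus (an assumption for illustration, not data from the paper).
corpus = [
    "جامعة الإمارات العربية المتحدة",
    "دولة الإمارات العربية المتحدة",
    "جامعة الإمارات العربية",
    "الإمارات اليوم",
]

def ngram_counts(sentences, n):
    """Count every run of n consecutive whitespace-separated tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Frequency tables for single, double and triple expressions.
tables = {n: ngram_counts(corpus, n) for n in (1, 2, 3)}

def suggest_next(query, tables, k=3):
    """Suggest up to k next words, trying trigram context before bigram."""
    tokens = query.split()
    for n in (3, 2):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # query too short for this n-gram order
        candidates = Counter({
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == context
        })
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []

if __name__ == "__main__":
    print(suggest_next("جامعة الإمارات", tables))
    print(suggest_next("الإمارات", tables))
```

Falling back from the triple table to the double table is one simple way to handle short or unseen contexts; the paper may combine the tables differently.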
For full paper access please check: