Authors:
Khaled Nagi and Dalia Halim
Affiliation:
Alexandria University, Egypt
Keyword(s):
Faceted Search, Arabic Content on the Internet, Indexing Arabic Content.
Related Ontology Subjects/Areas/Topics:
Applications; Artificial Intelligence; Context; Domain Analysis and Modeling; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Natural Language Processing; Paradigm Trends; Pattern Recognition; Software Engineering; Symbolic Systems
Abstract:
Faceted search is becoming the standard search method on modern websites. Implementing a faceted search system requires a well-defined metadata structure for the searched items. Unfortunately, online text documents are usually plain text without any metadata describing their content. A variety of methods that take advantage of external lexical hierarchies to extract plain and hierarchical facets from textual content have recently been introduced. Meanwhile, the volume of Arabic documents accessible online is increasing every day. However, the Arabic language is not as well established on the web as English. In our work, we introduce a faceted search system for unstructured Arabic text. Since Arabic processing tools are not as mature as their English counterparts, we try two methods for building the facet hierarchy for Arabic terms. We then combine these methods into a hybrid one to get the best of both approaches. We assess the three methods using our prototype by searching real-life articles extracted from two sources: the BBC Arabic edition website and the Arab Sciencepedia website.