Authors:
Fabio Clarizia; Francesco Colace; Massimo De Santo and Paolo Napoletano
Affiliation:
University of Salerno, Italy
Keyword(s):
Semantic index, Information Retrieval, Web Search Engine, Latent Dirichlet Allocation.
Related Ontology Subjects/Areas/Topics:
Biomedical Engineering; Cloud Computing; Data Engineering; Enterprise Information Systems; Health Information Systems; Human-Computer Interaction; Information Systems Analysis and Specification; Knowledge Management; Machine Perception: Vision, Speech, Other; Ontologies and the Semantic Web; Semantic Web Technologies; Services Science; Society, e-Business and e-Government; Software Agents and Internet Computing; Web Information Systems and Technologies
Abstract:
In this paper we address the problem of modeling large collections of data, namely web pages, by jointly exploiting traditional information retrieval techniques and probabilistic ones in order to find semantic descriptions of the collections. This novel technique is embedded in a real Web search engine in order to provide semantic functionality, such as the prediction of words related to a single-term query. Experiments on different small domains (web repositories) are presented and discussed.