FREQUENCY OF SENTENTIAL CONTEXTS VS. FREQUENCY
OF QUERY TERMS IN OPINION RETRIEVAL
Sylvester Olubolu Orimaye, Saadat M. Alhashmi
School of Information Technology, Monash University, Jalan Lagoon Selatan, Bandar Sunway, Malaysia
Siew Eu-Gene
School of Business, Monash University, Jalan Lagoon Selatan, Bandar Sunway, Malaysia
Keywords: Sentential, Frequency, Context, Query terms, Grammar-based, Opinion retrieval.
Abstract: Many opinion retrieval techniques use frequency of query terms as a measurement for detecting documents
that contain opinion. However, using frequency of query terms leads to bias in context-dependent opinion
retrieval such that all documents containing query terms are retrieved, regardless of contextual relevance to
the intent of the human seeking the opinion. This can be described as non-contextual relevance problem in
opinion retrieval systems such as Google Blogs Search and Technorati Blog Directory. Sentence-level
contextual understanding and grammatical dependencies need be considered to ensure documents retrieved
contain large proportion of textual contents that have the same underlying meaning with the given query
instead of using frequency of individual query terms. Thus, we present specific challenges with state-of-the-
art opinion retrieval techniques that rely on frequency of query terms and we propose a grammar-based
technique for efficient context-dependent opinion retrieval. We believe our proposed technique can solve
the non-contextual relevance problem common to opinion retrieval systems, and can be used for context-
dependent retrieval such as expert search systems, faceted-opinion retrieval, opinion trend analytic, and
personalized web search.
1 INTRODUCTION
Understanding and retrieving human’s contextual
opinion from subjective contributions (e.g. relevant
or non-relevant) is a complex process, especially
when it involves human contributors with diversified
styles of making opinionated contributions. By
contextual opinion we mean, opinions given by
some humans closely match the intent of another
human seeking such opinions, and by opinionated
contributions we mean, textual contents expressed as
a result of unique human’s perception (opinion)
about a particular topic. In this paper, the term
opinionated will be used quite broadly to mean
information that contains opinion.
Many opinion retrieval systems avoid
computational models that treat opinion as a process
of cognitive language understanding (Krahmer,
2010). Particularly, they rely on frequency of query
terms or probabilistic measures to detect opinion
(Hannah et al, 2007). However, little has been done
to show grammatical and contextual understanding
of each opinionated contribution (Pang and Lee,
2008), such that frequency of relevant sentence is
used to identify opinionated documents instead of
frequency of query terms. For example, opinions
given in blogs are very dynamic (Liu, 2010), and
each blog document may discuss different
opinionated topics. Therefore, the use of frequency
of query terms to identify opinion without
considering the context at which the query terms had
occurred may lead to bias in overall opinion score.
Given a particular document, individual query
terms may occur at different contexts, yet meet the
frequency threshold for a certain opinion target, thus
explicitly creates bias in the overall opinion
retrieved. We argue that frequency of query terms
can not imply subjectivity alone without knowing
the context at which query terms must be frequent
(Pang and Lee, 2008). For example, these two
sentences, “the fight for academic success” and “I
607
Olubolu Orimaye S., M. Alhashmi S. and Eu-Gene S..
FREQUENCY OF SENTENTIAL CONTEXTS VS. FREQUENCY OF QUERY TERMS IN OPINION RETRIEVAL.
DOI: 10.5220/0003401206070610
In Proceedings of the 7th International Conference on Web Information Systems and Technologies (WEBIST-2011), pages 607-610
ISBN: 978-989-8425-51-5
Copyright
c
2011 SCITEPRESS (Science and Technology Publications, Lda.)
will fight you to finish”, have regular occurrence of
the word “fight”, which may imply violence as an
opinion target after a certain frequency threshold,
whereas, the word “fight” has appeared in two
different contexts respectively. That is, the sentence
“the fight for academic success” may imply “passion
for academic excellence”, and the sentence “I will
fight you to finish” may imply “violence”.
Thus, we propose a grammar-based approach for
sentence-level contextual opinion retrieval using
Natural Language Processing (NLP) techniques such
as Categorial Combinatory Grammar (CCG)
(Baldridge and Kruijff, 2004). For example, CCG
analyzes each given sentence to show the underlying
grammatical dependencies, and it has a high
predictive power for understanding linguistic
meaning and interpretation.
2 PROBLEM DISCOURSE
For the purpose of this paper, we are interested in
showing specific challenges with many state-of-the-
art opinion retrieval techniques that rely on
frequency of query terms, and then present an
effective grammar-based technique that can
understand the context at which opinionated
information is needed and to be retrieved. We
present different instances whereby the use of
frequency of query terms can actually harm the
nature of opinionated documents to be retrieved. We
argue that opinions are context-dependant (Pang and
Lee, 2008) and we believe context-dependent
opinion cannot be achieved by using frequency of
query terms only.
2.1 Frequency of Terms
in Opinion Polarity Detection
Opinion polarity detection techniques have recorded
some level of success (
Siersdorfer et al, 2010).
However, some limitations call for more reliable
techniques for effective opinion retrieval. In opinion
polarity detection, specific keywords within a
document are labelled with a particular polarity (e.g.
positive or negative). However, determining an
effective way to differentiate between what is
positive and what is negative is still an unsolved
problem. For example in
Sarmento et al (2009), the
presence of ironical phrases and inverted polarity in
opinionated documents led to lower precision for
positive opinions with just 77% accuracy. As a result
of this, the choice of individual words for polarity
detection in opinionated documents is still a big
challenge.
Figure 1: Instance of ironical phrase “has his finger on the
pulse of reality” in an opinionated document.
2.2 Frequency of Terms
in Subjectivity Detection
Subjectivity detection shows if a sentence contains
opinion or not (Pang and Lee, 2008). For example,
in (Wei and Clement, 2006), Wikipedia knowledge-
base was used to identify subjective and objective
sentences within a document by considering
individual query terms that match Wikipedia
concepts. However, this technique encountered
multiple concepts problem as each query term may
return more than one concepts from Wikipedia. We
also believe this approach can be computationally
expensive as there will be need to iterate through all
concepts and articles relevant to each query term.
Moreover, the approach assume each query term
would always have a single concept, whereas, the
given query may sometimes include multi-concepts.
2.3 Frequency of Terms
in Lexicon-based Opinion Detection
Lexicon-based approaches consider domain-specific
evidences to form lexicons for opinion retrieval
(Ding et al, 2008). For generating lexicons,
individual opinionated keywords are selected from
each sentence in a document. However, we believe
opinionated words alone cannot completely and
independently express the overall opinion contained
in a document, without taking into consideration the
grammatical dependencies between words. We
argue that the lexicon-based approach is never
context-dependent as individual keywords in the
lexicon might have been selected from varying
grammatical contexts.
2.4 Frequency of Terms
in Probabilistic Opinion Detection
Some probabilistic model use general opinion
lexicon and proximity density information to
calculate probabilistic opinion score for individual
query terms (Gerani et al, 2010). We argue that
proximity of words to some of the query terms may
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies
608
not necessarily reflect the context at which opinion
is required. In fact, such works assume a single
focus opinionated document whereby, opinionated
content in a document would explicitly describe the
opinion targets without diverging to other possible
opinion targets. We argue that this may not usually
be true, as most opinionated documents have
sentences that express different opinions different
opinions even within the same paragraph (Pang and
Lee, 2008).
2.5 Frequency of Terms in
Language-Model Opinion Detection
The language model combines prediction of
occurrence for natural language words and then
shows a probabilistic interpretation of such
occurrences (Zhai, 2009). For example in Lv and
Zhai (2010), proximity heuristic that determine the
occurrence of query terms at a distance close to each
other was defined, hence determine the document
paragraph that has the highest occurrence of such
proximity information. Often, this technique requires
smoothing procedures and appropriate probability
density function. We argue that words in proximity
may not determine the context at which human seeks
opinion. In fact, it is not yet clear whether such
technique can be applied to faceted-opinion retrieval
whereby a single document is expected to generate
different opinions.
3 EFFECTIVE OPINION
RETRIEVAL
Effective opinion retrieval technique can solve the
problems identified above. Although, it will be
practically challenging to aim at solving all opinion
retrieval related problems in a single technique,
however, we believe an ideal retrieval system should
give sufficient relevance to human’s opinionated
information need (Pang and Lee, 2008). What makes
effective retrieval system is the ability to retrieve
opinionated information relevant to the context of
the query given, and at a lesser computational cost.
Relevant documents retrieved for the opinion target
must be relevant to human’s intent at a reasonable
degree. It should be noted that effective opinion
retrieval may not denote a perfect retrieval system.
For effective contextual relevance purpose, opinion
retrieval systems should be able to consider
underlying meaning of sentences within opinionated
documents. Thus, we argue that a basic retrieval
process must aim at reflecting the grammatical
context of opinionated information need and not the
frequency of query terms.
4 TOWARDS SENTENCE-LEVEL
CONTEXTUAL OPINION
RETRIEVAL
Towards providing effective solution to the
problems identified above, we propose a grammar-
based approach for sentence-level contextual
opinion retrieval. We believe sentences form the base
of the overall opinion being expressed in a document,
and opinionated information is better represented in
sentences than individual query words. (Pang and
Lee, 2008). Therefore, we consider effective opinion
retrieval technique that can retrieve sufficient
documents relevant to human’s opinionated
information need. From the series of problems
highlighted above, we could observe that the success
of any opinion retrieval technique would specifically
depend on the degree of relevance to human’s intent.
5 GRAMMAR-BASED
CONTEXTUAL OPINION
RETRIEVAL
We propose to understand the underlying meaning
of the given query and each sentence in a given
document. For this process, we propose to use CCG
which is a context-sensitive grammar and NLP
technique (Baldridge and Kruijff, 2004). With CCG,
we are able to know the underlying meaning and
dependencies between words within the given query
or the sentences within opinionated documents. The
accumulation of sentences that have the same
underlying meaning with the given query would
determine the contextual relevance of the document
to the intent of the human seeking the opinionated
documents. By this process, frequency of contextual
relevant sentences is used to determine the relevance
of opinionated documents instead of frequency of
individual query words.
6 LIMITING FACTORS
Knowing the fact that opinions must be detected to
match the intent of the human seeking the opinion is
only an initiation of idea that must be backed up
with practical implementations. However, there are
FREQUENCY OF SENTENTIAL CONTEXTS VS. FREQUENCY OF QUERY TERMS IN OPINION RETRIEVAL
609
few limiting factors that may pose major challenges
towards the implementation of the context-
dependent opinion retrieval approach. By identifying
these factors, more research opportunities are
created for opinion retrieval. Existing approaches
can be improved such that effective opinion retrieval
techniques can be achieved in the long-run.
6.1 Dependency Among
Opinionated Sentences
Each sentence in an opinionated document may on
its own represent certain degree of opinion.
However, some sentences depend on prior (i.e.
sentence before) or latter (i.e. sentence after)
sentences in order to capture adequate opinion being
expressed. Detecting context-dependent opinion at
sentence-level could as well include looking into
dependencies between sentences
(Bermingham and
Smeaton, 2009). Therefore, opinion retrieval
techniques could consider multi-dependency in
opinionated sentences.
6.2 Multi-lingual Analysis
Opinionated documents appear in different types of
languages (Bruce et al, 2009). Unfortunately, many
research works avoid non-English opinionated
documents (Kim et al, 2010), while few research
works performed bi-lingual analysis of opinions. For
example in blogs, contributors have different lingual
backgrounds (Bruce et al, 2009), which is why
detecting collective opinion without lingual
differences is still a general challenge. Therefore,
future research works should be aware of this
limitation and its significance to the overall opinion
retrieval task.
7 SUMMARY & FUTURE WORK
Challenges in state-of-the-art opinion retrieval
techniques were reviewed. The cause of major
challenges can be summarized as ineffective way of
detecting context-dependent opinion at a lesser
computational cost. Opinions are context-dependant
and a grammar-based opinion retrieval technique is
can solve the above mentioned problems. In this
paper, we proposed grammar-based approach for
detecting context-dependent opinion by using CCG.
This approach can be quite useful for faceted-
opinion retrieval and personalized web search. In
our future work, we plan to implement our sentence-
level contextual model for opinion retrieval task.
REFERENCES
Bermingham, A. and Smeaton, A.F. (2009). A study of
inter-annotator agreement for opinion retrieval. In
SIGIR'09: Proceedings of the 32nd international
conference on Research and development in
information retrieval, pages 784-785. ACM.
Bruce, E. et al., (2009) Mapping the Arabic blogosphere:
politics, culture and dissent. Berkman Center for
Internet and Society at Harvard University.
Ding, X., Liu, B. and Yu, P.S. (2008). A holistic lexicon-
based approach to opinion mining. In Proceedings of
the international conference on Web search and web
data mining. Pages 231-240.ACM.
Hannah, D., Macdonald, C., Peng, J., He, B., Ounis, I.
(2007). University of Glasgow at TREC 2007:
Experiments in Blog and Enterprise Tracks with
Terrier. In TREC, 2007.
Baldridge , J.M., Kruijiff, G-J.M. (2004). Course Notes on
Combinatory Categorial Grammar.
Kim, J., Li, J-J. and Lee, J-H. (2010). Evaluating
multilanguage-comparability of subjectivity analysis
systems. In ACL'10: Proceedings of the 48th Annual
Meeting of the Association for Computational
Linguistics, pages 595-603.
Krahmer, E. (2010) What Computational Linguists Can
Learn from Psychologists. Association for
Computational Linguistics, 36(2): 285-294.
Liu, B. (2010) Sentiment Analysis and Subjectivity.
Handbook of Natural Language Processing, Second
Edition.
Sarmento, L., Carvalho, P., Silva, J.M., Oliveira, E. (2009)
Automatic creation of a reference corpus for political
opinion mining in user-generated content. In CIKM
'09: Proceeding of the 1st international CIKM
workshop on Topic-sentiment analysis for mass
opinion, pages 29-36. ACM.
Lv, Y. and Zhai, C. (2009). Positional language models
for information retrieval. In SIGIR'09: Proceedings of
the 32nd international conference on Research and
development in information retrieval, pages 299-306.
ACM.
Pang, B. and Lee, L. (2008). Opinion Mining and
Sentiment Analysis. In Found. Trends Inf. Retr.,2(1-
2): 1-135.
Zhai,C. (Statistical Language Models for Information
Retrieval: A Critical Review. In Found. Trends Inf.
Retr., 2(3):137–213.
Gerani, S., Carman, M.J. Crestani, F. (2010). Proximity-
Based Opinion Retrieval. In SIGIR, page 978. ACM.
Siersdorfer, S., Chelaru, S. and Pedro, J.S. (2010). How
Useful are Your Comments? Analyzing and Predicting
YouTube Comments and Comment Ratings. In
International World Wide Web Conference, pages
891-900.
Wei, Z. and Clement, Y. (2006). UIC at TREC 2006 Blog
Track, In TREC, 2006.
WEBIST 2011 - 7th International Conference on Web Information Systems and Technologies
610