A WEB LIKELY-WORD INSTANT ORGANIZER (WEBLIO)

Dynamic Hints During Knowledge Collectors Move Mouse Over a Sentence

Po-Hsun Cheng

Software Engineering Department, National Kaohsiung Normal University

62, Shenjhong Rd., Kaohsiung, 82444, Taiwan

Ying-Pei Chen, Mei-Ju Su

Graduate Institute of Electronic Engineering, National Taiwan University

1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan

Keywords:

Data management, Information extraction, Intelligent data, Semantic web.

Abstract:

As web resources grow more complex, more sophisticated web browsing technologies need to be developed. This paper presents a concept for extracting the semantic content of a web page and automatically following the cursor location to organize the likely words in a sentence for data intelligence. The concept can be implemented with a few cross-browser techniques. We believe it will become popular, in one form or another, in future browsers. Moreover, it represents another important step for human-computer interaction, especially for minimizing the time users spend and for maintaining a library of likely keywords during later web surfing.

1 INTRODUCTION

After the Mosaic browser was announced in 1993, web resources grew explosively in popularity. A statistics web site that publishes Internet usage and population figures reports that worldwide Internet usage grew by 205.5% from 2000 to 2008 (InternetWorldStats.com, 2008). These figures suggest that users navigate a vast sea of web pages while trying to collect practical knowledge for later use. However, we believe most users lose the direction of their knowledge search after several waves of browsing. The problem, then, is how to instantly organize knowledge-searching routes from a specific sentence of interest while the user is browsing.

Likewise, in 2004, D. Dougherty proposed the term Web 2.0 to describe an apparent change in how people and businesses were using the web and building web-based applications. In other words, the amount of web information continues to explode. Researchers have since observed that service-centric systems are a prominent multidisciplinary paradigm concerned with software constructed as compositions of autonomous services (Nano and Zisman, 2007). Vitvar et al. note that when no single service can satisfy an entire goal, the composition task tries to create a plan for achieving that goal (Vitvar et al., 2007). We might therefore follow Ding et al. in helping users and software agents find relevant knowledge on the semantic web by examining the ontologies and facts encoded in semantic web documents (Ding et al., 2005).

Although the obstacles to supplying such a service may be considerable, Goth argues that the difficulty of integrating management data across various boundaries represents a mother lode of opportunity (Goth, 2007). Meanwhile, Oren et al. recommend manipulating resources according to each source's data access permissions and capabilities (Oren et al., 2007). We believe these suggestions increase the chance of solving the related problems and strengthen our confidence in this approach.

Moreover, some researchers believe that extending service-oriented architectures with semantics can help create service-centric information systems that adapt better to changes throughout a software system's lifetime, and that emerging applications exploit the capabilities of a new breed of semantic technologies (Vitvar et al., 2007; Hendler, 2008). Hendler also predicted that emerging Web 3.0 companies are combining web data resources, standard languages, ever-better tools, and ontologies into applications that take advantage of the power of these new semantic technologies (Hendler, 2008). For instance, Missikoff et al. built a software environment that supports the construction and assessment of a domain ontology for intelligent information integration within a virtual user community (Missikoff et al., 2002).

Cheng P., Chen Y. and Su M.

A WEB LIKELY-WORD INSTANT ORGANIZER (WEBLIO) - Dynamic Hints During Knowledge Collectors Move Mouse Over a Sentence.

DOI: 10.5220/0001843604350438

In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), page

ISBN: 978-989-8111-81-4

In short, the more complex web resources become, the more sophisticated web browsing technologies need to be. Based on the problems listed above, this paper describes a web likely-word instant organizer (WebLio) for capturing and probing the user's intentions. WebLio composes related web search keywords instantly in order to reduce search time for Internet users, particularly students and other knowledge collectors. The proposed methodology builds an intelligent keyword list for the current cursor location: it extracts the corresponding sentence from a web page, parses potential keywords, and wraps them in Google search commands.

2 METHODOLOGY

The following subsections describe the WebLio methodology for instantly constructing an intelligent keyword list from an active web page, and Fig. 1 shows its successive steps. The methodology comprises at least five principal processes: determining the cursor location in a web page, obtaining the sentence at the cursor location, parsing the sentence and composing keywords, mapping the keywords to Google search syntax, and displaying tips on demand at the cursor location.
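To make the data flow between the five processes concrete, the following Java sketch chains them as functions. Every stage is a deliberately naive stand-in (splitting sentences on periods, keeping words longer than three letters), the cursor location is assumed to be already resolved to a character offset, and the class and method names are hypothetical rather than taken from WebLio.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/**
 * Hypothetical skeleton of the five WebLio processes. Each stage is a trivial
 * stand-in so the data flow can be seen end to end; none of it is the authors' code.
 */
public class WebLioPipeline {

    // Stage 1 (cursor location) is assumed to be resolved already: the caller passes
    // the page text and the character offset under the mouse.

    // Stage 2: obtain the sentence around the offset (naive split on '.').
    static String obtainSentence(String pageText, int offset) {
        int start = pageText.lastIndexOf('.', offset - 1) + 1;
        int end = pageText.indexOf('.', offset);
        return pageText.substring(start, end < 0 ? pageText.length() : end).trim();
    }

    // Stage 3: parse the sentence into keyword candidates (words longer than three letters).
    static final Function<String, List<String>> parseKeywords = sentence ->
            Arrays.stream(sentence.split("\\W+"))
                  .filter(word -> word.length() > 3)
                  .map(String::toLowerCase)
                  .collect(Collectors.toList());

    // Stage 4: map each keyword to a Google search URL (words contain only [A-Za-z0-9_] here).
    static final Function<List<String>, List<String>> mapToGoogle = keywords ->
            keywords.stream()
                    .map(word -> "http://www.google.com/search?q=" + word)
                    .collect(Collectors.toList());

    // Stage 5: render the hint text shown at the cursor, one URL per line.
    static final Function<List<String>, String> displayHint = urls -> String.join("\n", urls);

    static String run(String pageText, int cursorOffset) {
        return parseKeywords.andThen(mapToGoogle).andThen(displayHint)
                .apply(obtainSentence(pageText, cursorOffset));
    }

    public static void main(String[] args) {
        String page = "WebLio reads text. Users hover over a sentence. Hints appear.";
        System.out.println(run(page, page.indexOf("hover")));
    }
}
```

Running main prints four search URLs, for users, hover, over, and sentence, since these are the words of the hovered sentence longer than three letters.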

WebLio is built on the Google Web Toolkit (GWT), which we use to implement the related functionality so as to avoid the error-prone process of writing web applications by hand. GWT also reduces the difficulty of building, reusing, and maintaining large JavaScript code bases and AJAX components.

2.1 Get Cursor Location

First, our methodology obtains the cursor location in an active web page. The cursor location may change through mouse movement, pointer movement, or keyboard typing; we consider only the user's mouse movement while browsing web pages. Essentially, the web browser reports the mouse location as a pair of coordinates.

Figure 1: The web likely-word instant organizer (WebLio) construction methodology.

The Simple API for XML (SAX) is a widely used event-based standard, and we adopt the streaming, pull-based StAX API to obtain the mouse position. The Streaming API for XML (StAX) is an API for reading and writing XML documents in Java. StAX is a parser-independent, pure Java API based on interfaces that can be implemented by multiple parsers. StAX was introduced into the standard library in Java 6 and is often regarded as an improvement over SAX and the Document Object Model (DOM) for streaming XML processing.
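The text above states only that the browser reports the mouse location as a pair of coordinates. As a minimal sketch of how such a pair might be resolved to a sentence, the Java class below performs a simple hit test against sentence bounding boxes. The SentenceBox model, its pixel layout, and the sentence ids are all hypothetical, and the boxes are assumed to have been measured from the rendered page beforehand.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical hit test from a mouse (x, y) pair to the sentence box under it. */
public class CursorHitTest {

    /** Axis-aligned bounding box of one rendered sentence (assumed model). */
    static final class SentenceBox {
        final String id;
        final int left, top, width, height;

        SentenceBox(String id, int left, int top, int width, int height) {
            this.id = id;
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
        }

        // Half-open ranges so a point on a shared edge belongs to exactly one box.
        boolean contains(int x, int y) {
            return x >= left && x < left + width && y >= top && y < top + height;
        }
    }

    /** Returns the id of the first box containing the point, or empty if none does. */
    static Optional<String> sentenceAt(List<SentenceBox> boxes, int x, int y) {
        return boxes.stream().filter(box -> box.contains(x, y)).map(box -> box.id).findFirst();
    }

    public static void main(String[] args) {
        List<SentenceBox> boxes = List.of(
                new SentenceBox("s1", 0, 0, 400, 20),
                new SentenceBox("s2", 0, 20, 400, 20));
        System.out.println(sentenceAt(boxes, 120, 25).orElse("none")); // s2
        System.out.println(sentenceAt(boxes, 500, 25).orElse("none")); // none
    }
}
```

Boxes are tested in list order, so when boxes overlap (for example, nested elements) the first match wins; a real page would probably prefer the innermost element.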

2.2 Obtain Web Sentence

With StAX, we can invoke methods such as getName() and getText() on an XMLStreamReader to retrieve information about the item at which the parser's cursor is currently positioned. The XMLStreamReader interface represents a cursor that moves across an XML document from beginning to end. The cursor always moves forward, usually one item at a time.
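The forward-only StAX cursor can be illustrated with the standard javax.xml.stream API. The sketch below assumes the page is available as well-formed XHTML (many real HTML pages are not); it advances an XMLStreamReader one item at a time and uses getName() and getText() to report each non-blank text item together with its enclosing element. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Minimal StAX sketch: list "element: text" items of a well-formed XHTML fragment. */
public class StaxSentenceReader {

    static List<String> readTextItems(String xhtml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character chunks so each text node arrives as one CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xhtml));
            try {
                return collect(reader);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    private static List<String> collect(XMLStreamReader reader) throws XMLStreamException {
        List<String> items = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();   // stack of currently open element names
        while (reader.hasNext()) {
            int event = reader.next();              // the cursor only moves forward, one item per call
            if (event == XMLStreamConstants.START_ELEMENT) {
                open.push(reader.getName().getLocalPart());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                open.pop();
            } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                items.add(open.peek() + ": " + reader.getText().trim());
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String page = "<html><body><p>WebLio reads text.</p>"
                + "<p>Users hover over a <b>sentence</b>.</p></body></html>";
        readTextItems(page).forEach(System.out::println);
    }
}
```

A stack of open element names is kept so that text following an inline element, such as the final period after the b element, is attributed to its real parent p.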

In addition, the tree-based DOM technology from the W3C gives access to all the elements of a web page, so we can combine the cursor location with the DOM to locate and obtain the specific sentence the cursor points to in a given web page. WebLio finds and highlights the sentence element using getElementById() and the className property.
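The browser-side calls named above have close counterparts in the standard org.w3c.dom API for Java. As a minimal sketch, assuming the page has been parsed as XHTML and the sentence's container carries an id, the code below finds the element with getElementById(), appends a highlight class to its class attribute, and then uses java.text.BreakIterator to pick the sentence containing a given character offset. The highlight class name and the offset-based interface are hypothetical, and BreakIterator is a swapped-in technique for sentence splitting rather than something the text specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Sketch: locate an element by id, highlight it, and extract the sentence at an offset. */
public class DomSentenceLocator {

    static final String HIGHLIGHT_CLASS = "weblio-highlight";  // hypothetical class name

    /** Parses XHTML and flags every "id" attribute as an ID so getElementById can find it. */
    static Document parse(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute("id")) {
                    element.setIdAttribute("id", true);  // no DTD, so "id" is not an ID by default
                }
            }
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse page", e);
        }
    }

    /** Returns the sentence of text that contains the character at offset. */
    static String sentenceAt(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        int end = sentences.following(offset);  // first boundary after the offset
        int start = sentences.previous();       // boundary that opens the same sentence
        return text.substring(start, end).trim();
    }

    /** Finds an element by id, appends the highlight class, and returns the pointed-at sentence. */
    static String highlightSentence(Document doc, String id, int offset) {
        Element element = doc.getElementById(id);
        if (element == null) {
            throw new IllegalArgumentException("No element with id " + id);
        }
        String classes = element.getAttribute("class");  // "" when the attribute is absent
        element.setAttribute("class", classes.isEmpty() ? HIGHLIGHT_CLASS : classes + " " + HIGHLIGHT_CLASS);
        return sentenceAt(element.getTextContent(), offset);
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p id=\"para1\" class=\"body\">"
                + "WebLio reads text. Users hover over a sentence. Hints appear.</p></body></html>");
        String text = doc.getElementById("para1").getTextContent();
        System.out.println(highlightSentence(doc, "para1", text.indexOf("hover")));
        System.out.println(doc.getElementById("para1").getAttribute("class"));
    }
}
```

Because the parser sees no DTD, id attributes are not treated as IDs by default; the setIdAttribute() loop in parse() is what allows getElementById() to return the element.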

WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies


2.3 Parse Specific Sentence

After obtaining the specific sentence from an active web page, we use a self-defined parser to split it into an array of tokens. At first, most tokens in this array are single words; we then concatenate preceding and following tokens into keyword phrases and append these phrases to the end of the array.
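The token array described above can be sketched as follows, assuming simple splitting on non-word characters in place of the authors' self-defined parser. Single words come first, and phrases formed by joining neighbouring tokens are appended at the end of the same list; the maxPhraseLength parameter is a hypothetical knob, with 2 giving word pairs only.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the token array: single words first, then neighbouring-token phrases appended. */
public class KeywordTokenizer {

    static List<String> tokensWithPhrases(String sentence, int maxPhraseLength) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.split("\\W+")) {
            if (!token.isEmpty()) {          // a leading delimiter yields one empty token
                words.add(token);
            }
        }
        List<String> result = new ArrayList<>(words);  // single words at the front
        for (int length = 2; length <= maxPhraseLength; length++) {
            for (int i = 0; i + length <= words.size(); i++) {
                // Join `length` consecutive tokens and append the phrase at the end.
                result.add(String.join(" ", words.subList(i, i + length)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // [Users, hover, over, a, sentence, Users hover, hover over, over a, a sentence]
        System.out.println(tokensWithPhrases("Users hover over a sentence.", 2));
    }
}
```

With maxPhraseLength set to 3, the same sentence also yields three-word phrases such as "over a sentence", appended after the pairs.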

WebLio adopts the link grammar parser application programming interface (API) (Sleater and Temperley, 1993) to process the sentence. This processing identifies the verbs and nouns in an English sentence and stores these tokens in a small temporary database for the next processing step. Natural language processing is undoubtedly a difficult task, and whether a sentence can be parsed completely depends on the parsing algorithm. We nevertheless chose this popular open-source grammar parser as the basis for sentence parsing. An advantage of this design is that we can develop another interface to connect dynamically with different grammar parsing APIs as other grammar parsers become available. This also makes switching between languages more efficient.
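The interface idea above, connecting dynamically to different grammar parsing APIs, is essentially a strategy pattern. The sketch below assumes a hypothetical GrammarParser interface that returns tokens tagged as nouns, verbs, or other, and a registry that selects the parser by language tag at run time. LexiconParser is a toy dictionary-lookup stand-in for illustration only; it does not reproduce link grammar parsing, which a real adapter would wrap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch of a pluggable parser layer keyed by language tag (all names hypothetical). */
public class ParserRegistry {

    enum Tag { NOUN, VERB, OTHER }

    static final class TaggedToken {
        final String word;
        final Tag tag;

        TaggedToken(String word, Tag tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Adapter interface that any grammar parser would implement. */
    interface GrammarParser {
        List<TaggedToken> parse(String sentence);
    }

    /** Toy parser: tags lower-cased words by dictionary lookup, everything else as OTHER. */
    static final class LexiconParser implements GrammarParser {
        private final Map<String, Tag> lexicon;

        LexiconParser(Map<String, Tag> lexicon) {
            this.lexicon = lexicon;
        }

        @Override
        public List<TaggedToken> parse(String sentence) {
            List<TaggedToken> tokens = new ArrayList<>();
            for (String word : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    tokens.add(new TaggedToken(word, lexicon.getOrDefault(word, Tag.OTHER)));
                }
            }
            return tokens;
        }
    }

    private final Map<String, GrammarParser> parsersByLanguage = new HashMap<>();

    void register(String languageTag, GrammarParser parser) {
        parsersByLanguage.put(languageTag, parser);  // replaces any earlier parser for the tag
    }

    /** Keeps the nouns and verbs reported by the parser registered for the language. */
    List<String> nounsAndVerbs(String languageTag, String sentence) {
        GrammarParser parser = parsersByLanguage.get(languageTag);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + languageTag);
        }
        List<String> keywords = new ArrayList<>();
        for (TaggedToken token : parser.parse(sentence)) {
            if (token.tag != Tag.OTHER) {
                keywords.add(token.word);
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register("en", new LexiconParser(
                Map.of("users", Tag.NOUN, "hover", Tag.VERB, "sentence", Tag.NOUN)));
        System.out.println(registry.nounsAndVerbs("en", "Users hover over a sentence."));
    }
}
```

Switching languages then amounts to registering another adapter under a different tag, without changing the code that consumes the keywords.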

To store the keywords temporarily while the array is processed, we use the lightweight open-source database SQLite v3.6.4. SQLite is an in-process software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. It is one of the most widely deployed SQL database engines in the world, and its source code is in the public domain. With all features enabled, the library can be smaller than 250 KB, depending on compiler optimization settings.

2.4 Likely-word Mapping

Next, we use the keyword array created in the preceding step to perform keyword mapping; that is, we map each keyword, one by one, to a Google search command. For instance, for the keyword 'human', we create the uniform resource locator (URL) http://www.google.com/search?q=human, which invokes a Google search. We can also configure more detailed Google search commands using the locale settings of Microsoft Internet Explorer and obtain the same or similar results from Google search engines.
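The mapping from a keyword to a Google search URL can be sketched with the standard java.net.URLEncoder, so that multi-word phrases and punctuation are encoded safely. The class name is illustrative, and the optional language hint uses Google's hl query parameter as an assumed stand-in for the locale configuration mentioned above; the text does not specify which parameters were used.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: map keywords to Google search URLs (requires Java 10+ for the Charset overload). */
public class GoogleQueryMapper {

    private static final String SEARCH_BASE = "http://www.google.com/search?q=";

    /** Builds a search URL; URLEncoder turns spaces into '+' and percent-encodes unsafe characters. */
    static String toSearchUrl(String keyword) {
        return SEARCH_BASE + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    /** Adds an interface-language hint; relying on Google's "hl" parameter is an assumption. */
    static String toSearchUrl(String keyword, String languageTag) {
        return toSearchUrl(keyword) + "&hl=" + URLEncoder.encode(languageTag, StandardCharsets.UTF_8);
    }

    /** Maps every keyword in the array, one by one, preserving order. */
    static List<String> mapAll(List<String> keywords) {
        return keywords.stream().map(keyword -> toSearchUrl(keyword)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toSearchUrl("human"));                 // http://www.google.com/search?q=human
        System.out.println(toSearchUrl("semantic web", "zh-TW")); // http://www.google.com/search?q=semantic+web&hl=zh-TW
    }
}
```

URLEncoder performs form encoding, turning spaces into + and, for example, C++ into C%2B%2B, so each phrase from the keyword array survives as a single query value.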

2.5 Hints On-demand

Finally, we use a notation container, triggered by a right mouse click, to display the related instant keyword list. If knowledge collectors want to capture all possible likely words instantly while the mouse moves over a sentence, the system can be set to an instant mode rather than the host mode. In the instant mode, the system detects mouse movement and captures the candidate sentence once the mouse has rested over one sentence for one second.
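The instant mode's one-second rule can be expressed as a small dwell detector. In this minimal sketch, which assumes the caller already knows which sentence (if any) is under the cursor and supplies a timestamp with each mouse sample, a capture fires once the cursor has stayed on the same sentence for at least the threshold, and fires only once per visit. The class and its API are hypothetical; in a GWT client the samples would presumably come from mouse-move events and a timer.

```java
/** Hypothetical dwell detector for the instant mode, driven by explicit timestamps. */
public class DwellDetector {

    private final long dwellMillis;
    private String currentSentenceId;   // sentence under the cursor, or null
    private long enteredAtMillis;
    private boolean fired;

    DwellDetector(long dwellMillis) {
        this.dwellMillis = dwellMillis;
    }

    /**
     * Reports one mouse sample. Returns the sentence id when its dwell threshold is
     * crossed for the first time during this visit, otherwise null. A null id means
     * the cursor is not over any sentence.
     */
    String onMouseSample(String sentenceId, long nowMillis) {
        if (sentenceId == null || !sentenceId.equals(currentSentenceId)) {
            currentSentenceId = sentenceId;  // moved to another sentence or off text: restart the clock
            enteredAtMillis = nowMillis;
            fired = false;
            return null;
        }
        if (!fired && nowMillis - enteredAtMillis >= dwellMillis) {
            fired = true;                    // capture only once per visit
            return sentenceId;
        }
        return null;
    }

    public static void main(String[] args) {
        DwellDetector detector = new DwellDetector(1000);  // one second, as in the instant mode
        long[] times = {0, 400, 1000, 1500};
        for (long t : times) {
            String hit = detector.onMouseSample("s1", t);
            System.out.println(t + " ms -> " + (hit == null ? "waiting" : "capture " + hit));
        }
    }
}
```

Keeping time as a parameter rather than reading a clock inside the class makes the rule easy to test and lets any event source drive it.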

In general, the host mode achieves higher performance than the instant mode. In either mode, any keyword in the instant list can immediately open another web browser window with its related URL. Users may also want to open every keyword in the list, so we also provide a function that opens the searches for all the keywords at once as a group.

3 RESULTS AND DISCUSSION

This paper has described a concept for extracting the semantic content of a web page and automatically following the cursor position to organize the keywords in a sentence. This web browsing concept can be implemented with a few cross-browser techniques. We expect it to become popular, in one form or another, in future browsers. Moreover, it is another significant step for fundamental human-computer interaction, especially for reducing the time spent surfing the web.

3.1 Not a Keyword Tool External

Some users might consider our methodology similar to Keyword Tool External (KTE) tools, such as the SEOTools from SEOBook Co., because it also generates a list of keywords. KTE is a keyword suggestion tool that shows specific numbers for search terms instead of just color bars. Our instant web keyword generator, however, is not a KTE tool. It is a different kind of keyword generator that lets Internet users smoothly shift their focus to another search keyword, parsed from the sentence located under the cursor pointer.


3.2 Security Consideration

We do not see a major privacy concern in this design. The keywords are not especially private information that users must keep confidential from other Internet users, even though we use the SQLite database to store them on the client side. We believe this use of SQLite should be satisfactory for most Internet users.

3.3 Performance Consideration

The more capable client hardware becomes, the better web browsing performance will be. It follows that web browsing operations should not be an inconvenience in future web browsers running on high-speed client hardware. Because our methodology processes only text, it uses fewer resources than multimedia systems. For that reason, our methodology is unlikely to become a performance bottleneck during web browsing.

On the other hand, some might argue that GWT generates bloated code with poor performance and hidden details. We nevertheless use GWT as our platform so that we can develop the system quickly. Every toolkit, and even every programming language, has its own disadvantages; on balance, we chose GWT to construct our system and to avoid potential known bugs.

3.4 Dictionary Binding

It would be more convenient for Internet users if our method were bound to open-source dictionary gadgets. We have not yet undertaken this task, but we plan to incorporate such functionality into our methodology. Popular dictionary-oriented functions could also be added without harm.

4 CONCLUSIONS

The more intricate web resources become, the more sophisticated web browsing technologies need to be developed. This paper has presented a concept for extracting the semantic content of a web page and automatically following the cursor position to organize the likely words in a sentence for data intelligence. The concept can be implemented with a few cross-browser techniques. We expect it to become popular, in one form or another, in future browsers. Moreover, it is another important step for human-computer interaction, particularly for minimizing time spent and retaining likely keywords during web surfing.

ACKNOWLEDGEMENTS

The author would like to thank all of the participating students, Chi-Heng Chung, Jun-Shen Chen, Wen-Chen Jian, et al., of the Software Engineering Department, National Kaohsiung Normal University, for their collaborative suggestions from September 2008 to January 2009.

REFERENCES

Ding, L., Finin, T., Joshi, A., Peng, Y., Pan, R., and Reddivari, P. (2005). Search on the semantic web. In Computer, vol. 38, no. 10, pp. 62-69. IEEE.

Goth, G. (2007). Will the semantic web quietly revolutionize software engineering? In IEEE Software, vol. 24, no. 4, pp. 100-105. IEEE.

Hendler, J. (2008). Web 3.0: Chicken farms on the semantic web. In Computer, vol. 41, no. 1, pp. 17-19. IEEE.

InternetWorldStats.com (2008). The Internet World Stats: Usage and Population Statistics. Miniwatts Marketing Group, http://www.internetworldstats.com/stats.htm.

Missikoff, M., Navigli, R., and Velardi, P. (2002). Integrated approach to web ontology learning and engineering. In Computer, vol. 35, no. 11, pp. 60-63. IEEE.

Nano, O. and Zisman, A. (2007). Realizing service-centric software systems. In IEEE Software, vol. 24, no. 6, pp. 28-30. IEEE.

Oren, E., Haller, A., Hauswirth, M., Heitmann, B., Decker, S., and Mesnage, C. (2007). A flexible integration framework for semantic web 2.0 applications. In IEEE Software, vol. 24, no. 5, pp. 64-71. IEEE.

Sleater, D. D. and Temperley, D. (1993). Parsing English with a link grammar. In Proc. of the Third International Workshop on Parsing Technologies, pp. 1-14.

Vitvar, T., Zaremba, M., Moran, M., Zaremba, M., and Fensel, D. (2007). SESA: Emerging technology for service-centric environments. In IEEE Software, vol. 24, no. 6, pp. 56-67. IEEE.
