validation agent is to remove any dead links, i.e., 
Jans that no longer exist, and to ensure that the 
database remains in a consistent and complete state. 
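The pruning step just described can be sketched as a small routine. The names `validate`, `is_alive`, and the in-memory map of Jans below are illustrative assumptions, not the actual Vijjana implementation:

```python
# Sketch of a validation agent that prunes dead Jans.
# All names here are hypothetical, not the paper's actual code.
from typing import Callable, Dict


def validate(jans: Dict[str, str],
             is_alive: Callable[[str], bool]) -> Dict[str, str]:
    """Return only the Jans whose URLs still resolve.

    `jans` maps a Jan id to its URL; `is_alive` probes the URL
    (e.g. with an HTTP HEAD request) and reports liveness.
    """
    return {jid: url for jid, url in jans.items() if is_alive(url)}


def http_is_alive(url: str, timeout: float = 5.0) -> bool:
    """One possible liveness probe: an HTTP HEAD request."""
    import urllib.request
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False
```

Separating the liveness probe from the pruning logic lets the agent be exercised with a stub probe in tests and deployed with a real HTTP check.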
The remainder of this paper is organized as follows. 
Section 2 gives background and illustrates our 
motivation for developing the Vijjana system. 
Section 3 briefly introduces the framework of our 
system and highlights the Markup and Validation 
agents, which together play a central role in 
maintaining information consistency. Section 4 
details the agents' working process and principles: it 
first explains a popular key-phrase extraction 
algorithm and its application in Vijjana, and then 
presents the overall workflow. Section 5 offers a 
concise conclusion and outlines future work. 
2  MOTIVATION AND RELATED WORK 
 The main difficulty of using the Web as a 
knowledge source lies in the fact that the Web is 
nothing more than a list of hyper-linked pages where 
the links have no associated semantics. Research on 
the semantic web aims to mitigate this difficulty. 
Tiwana et al. (2001) and Knoblock et al. (1997) 
discuss a uniform way to represent web resources 
and suggest models for automatic integration.  Work 
at IBM on the SHER project (Dolby et al., 2007; 
Fokoue et al., 2007) focuses on simplifying 
ontologies and scalable semantic retrieval through 
summarization and refinement. There has also been 
considerable research in the Artificial Intelligence 
community on formalizing knowledge 
representation (Sowa, 2000; Minsky, 1968; Sowa 
and Majumdar, 2003) which is being adopted by the 
researchers in the semantic web community 
(http://www.semanticweb.org). All of these efforts 
rely in one form or other on the ability to discover 
semantic links automatically by analyzing the 
contents of web pages, which poses considerable 
difficulties due to the ad hoc nature of web pages. 
While automatically converting the current web into 
a fully linked semantic web may be a solution, such 
an outcome is unlikely in the near future. 
Meanwhile, a number of organizations such as 
del.icio.us (http://del.icio.us), Webbrain.com 
(http://webbrain.com), digg.com (http://digg.com) 
are busy creating what are called social networking 
sites where a person searching the web may come 
across an interesting link that is then “marked” with 
a set of tags (keywords) which are stored in the site 
owner’s server. A recent start-up, Radar Networks, 
has developed a system called Twine 
(http://www.twine.com), which claims to create a 
semantic network automatically. While this may be 
an advantage for casual social networking, it will be 
unsuitable for enterprise-wide knowledge networks 
as there are well-established relationships between 
document types specific to that organization or 
domain which cannot be derived automatically. 
The information created via these sites may be 
kept private, or it may be combined with similar lists 
created by other people – thus the name social 
network. In due course these lists may grow so large 
that a search engine is needed, bringing us back to 
the original problem: how to cope with a large 
number of links that cannot 
be visualized in their semantic context. Current 
social bookmarking sites do not have any semantic 
linking of web pages. For a knowledge network to 
be useful for a large community of users working in 
a well-defined domain (e.g. Computer Science 
Teaching), the semantic web should be buildable 
co-operatively using a predefined taxonomy and link 
semantics. 
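A predefined taxonomy and link semantics of this kind could be represented, for instance, as below; the category names, link types, and field names are purely illustrative assumptions, not a schema from the paper:

```python
# Minimal sketch of a Vijjana-style knowledge network with a fixed
# taxonomy and fixed link semantics (all names illustrative).
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical predefined taxonomy and relational semantics.
TAXONOMY = {"CS", "CS/Algorithms", "CS/Operating-Systems"}
LINK_TYPES = {"prerequisite-of", "example-of", "syllabus-for"}


@dataclass
class Jan:
    url: str
    category: str                 # must come from the predefined TAXONOMY
    links: List[Tuple[str, str]] = field(default_factory=list)  # (type, target id)

    def __post_init__(self) -> None:
        if self.category not in TAXONOMY:
            raise ValueError(f"unknown category: {self.category}")


def add_link(jans: Dict[str, Jan], src: str, link_type: str, dst: str) -> None:
    """Interlink two Jans, enforcing the predefined link semantics."""
    if link_type not in LINK_TYPES:
        raise ValueError(f"unknown link type: {link_type}")
    if dst not in jans:
        raise KeyError(f"unknown target Jan: {dst}")
    jans[src].links.append((link_type, dst))
```

Because both the taxonomy and the link types are fixed in advance, contributors can only interlink Jans in ways the domain community has agreed on, which is what keeps the resulting semantic links visualizable.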
With this motivation, we propose a model we 
call Vijjana (a Sanskrit word that represents 
collective knowledge created through classification 
and analysis) which can help in organizing 
individually discovered web pages drawn from a 
narrowly bounded domain into a knowledge 
network. This can be visualized as a hyper tree 
(http://InXight.com), or a radial graph 
(http://iv.slis.indiana.edu/sw/), thus making the 
semantic relations visible. The visibility of semantic 
relationships is the key to comprehending what is 
actually inside the knowledge network.  It can be 
perused and also searched by anybody who wants to 
“discover” knowledge in that domain.  Let us 
consider a simple example in which two professors, 
Smith and Bradley, among others, can contribute 
useful links to web pages (we call them Jans, a term 
with roughly the same meaning as the word knol 
popularized by Google to represent units of 
knowledge) such as syllabi, homework problems, 
etc. to an evolving Computer Science specific 
knowledge network, say Vijjana-CS. These Jans are 
then classified and interlinked using a pre-defined 
taxonomy and relational semantics. This Vijjana-CS 
will grow organically as contributions continue. We 
can also define a number of agents associated with 
this model, which can keep the knowledge network 
complete and consistent by removing missing Jans 
and associated links. In addition, we can create a 
WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies