Linguistic Alerts in Information Filtering Systems
Towards Technical Implementations of Cognitive Semantics
Radoslaw P. Katarzyniak¹, Wojciech A. Lorkiewicz¹ and Ondrej Krejcar²
¹ Faculty of Computer Science and Management, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
² Center for Basic and Applied Research, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, 500 03 Hradec Kralove, Czech Republic
Keywords:
Information Filtering, Linguistic Alerts, Computational Semiotics, Epistemic Modalities, Cognitive Semantics.
Abstract:
An original model of natural language alert production is proposed. The alerts are produced by an information filtering system and stated in a quasi-natural language, which may be written or vocalized. The alerts are chosen with respect to a collection of uncertain decision rules, and thus inherit various levels of epistemic uncertainty. The quasi-natural language statements include linguistic operators of epistemic modality as their necessary parts. The proposed model implements, in a technical context, an adequate cognitive semantics captured by an original theory of epistemic modality grounding defined elsewhere.
1 INTRODUCTION
Selective dissemination of information to users and the related task of information filtering (IF for short) are important challenges for modern information systems (Hanani et al., 2001). They seem particularly crucial for management executives, who are interested in and strongly dependent on up-to-date information related to their everyday business activities (Xu et al., 2011). The way in which the selected (filtered) information is presented needs to be designed with substantial regard to the real environments in which the executives work, including the now frequent mobility of their daily work. In such circumstances, easily comprehensible presentation modes, for instance quasi-natural written and sometimes even vocalized languages, have become a very important theoretical and practical problem for the computer science community.
Unfortunately, in actual settings it is often the case that on-line indexing of documents incoming to executives’ knowledge repositories is practically impossible due to the inherent characteristics of these documents. For instance, a typical document can consist of extensive multimedia elements and therefore require advanced and time-consuming processing to elaborate a semantic description of its content. Fortunately, at least in some practical contexts, an approximate (yet still effective) solution is to base executive-oriented filtering solely on those attributes of incoming documents whose values can be easily determined. Such attributes may include origin, author(s), affiliated institutions, attached general keywords, etc. However, a rather obvious inconvenience of the approximate solution is that filtering decisions may be uncertain to some extent. This uncertainty stems from the underlying soft classification rules, in which preconditions are defined by means of easy-to-determine attributes and post-conditions are built from the subjects (topics) the executives are interested in. Another inconvenience might be that such IF systems need to be based on processes of classification rule management (namely, effective extraction, storage, retrieval and update of rules).
In this paper we provide a theoretical background for solving the highlighted problem for a particular class of IF systems. Namely, we discuss a theoretical foundation for producing alerts about incoming documents on the basis of uncertain classification rules. An important functional assumption is that alerts are to be stated in quasi-natural languages with linguistic markers for communicating levels of epistemic uncertainty. Such linguistic markers exist in perhaps all languages, usually in the form of well-known and widely used basic modal operators of knowledge (I know that ...), belief (I believe that ...) and possibility (I find it possible/it is possible that ...), as well as their possible extensions, e.g. I strongly believe that ... (Nuyts, 2001). The way in which natural language operators of epistemic modality should be chosen and used as components of information filtering alerts is the original contribution of this work, as compared to other works, which usually deal with other classes of language vagueness, e.g. (Herrera-Viedma et al., 2004).

Katarzyniak, R., Lorkiewicz, W. and Krejcar, O. Linguistic Alerts in Information Filtering Systems - Towards Technical Implementations of Cognitive Semantics. In Proceedings of the 18th International Conference on Enterprise Information Systems (ICEIS 2016) - Volume 1, pages 512-519. ISBN: 978-989-758-187-8. Copyright © 2016 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved.
The presentation is organized as follows. In section 2, we present our original model of the knowledge base and of the basic knowledge management processes underlying the extraction and application of classification rules to alert generation. In that section the concept of mental language holons is introduced as a key knowledge structure participating in the choice and extraction of adequate alerts. According to their definition, the holons cover complementary, language-oriented summarizations of experience. In section 3, the so-called theory of modality grounding, originally proposed elsewhere (Katarzyniak, 2005), is applied to define strong and logically consistent support for choosing an adequate epistemic modality for alerts. The theory is based on a technical model of so-called cognitive semantics for natural language statements, limited to a certain scope of quasi-natural language statements that are modal in the epistemic sense. The section also contains a brief note on the novelty of our approach with respect to technical applications of cognitive linguistics. In section 4 a computational example is presented. Finally, section 5 summarizes the presented results and points out possible future extensions.
2 MODEL DEFINITION
2.1 Profile of User Information Needs and Filtering Task
The system’s user¹ is focused on a given aspect of the information stored within the system that is important to him or her. This user-relevant context strictly depends on the user’s current preferences and is inherently individual. The user’s focus represents a set of topics of interest (document subjects or themes) that are of special importance or significance to the user.

The user’s information needs are represented by a set of subjects (also called themes) $S = \{sub_1, sub_2, \ldots, sub_M\}$ of potential interest to him or her. Moreover, we further assume that the information needs represent the sole information that the system captures about the user. Consequently, the user’s information needs constitute the user’s profile stored within the system.

¹ Due to strict editorial limitations, we focus solely on the single-user case. However, in a more practical realisation the proposed approach can be easily extended to the case of multiple users.
We further assume that all incoming documents are processed (filtered and indexed) in order to determine whether they cover any of the highlighted topics of interest (the user’s information needs). The sole purpose of the filtering stage is to identify documents that are significant to the end user. In particular, the goal is to identify those documents $d$ for which complete knowledge about them (obtained, for instance, through thorough manual examination by an expert) would lead to the formulation of basic statements “$d$ is about $sub_j$” or “$d$ is about $sub_j$ and $sub_k$” (where $j$ is different from $k$). Recognizing such documents should further lead to the generation of adequate alerts by the system, informing the end user about the appearance of important documents.
The introduced form of user profile may seem extremely trivial compared to the apparently more complex models of user profiles studied and applied in the field of IF systems, e.g. (Brown and Jones, 2001; Shapira et al., 1999; Xu et al., 2011). However, as it turns out, in the context of the presented approach to the generation of linguistic alerts even such an oversimplified profile poses significant problems. In particular, the linguistic approach requires the application of unconventional models of linguistic semantics that provide a solid and adequate theoretical background for technical implementations of cognitive semantics for such linguistic alerts.
Importantly, we should highlight that the aforementioned task of document filtering is highly complex. In automatic approaches it requires a complex and computationally expensive procedure that is able to analyse the content of a document and determine its relevance against the set of identified subjects. Moreover, in some settings a fully automatic approach might not even be available, so a set of semi-automatic or even manual methods must be utilised. Furthermore, in systems with strict processing time restrictions the filtering process poses significant technical problems, both methodologically and computationally.
2.2 The Repository Databases
The repository consists of two classes of documents: already stored documents $D = \{d_1, d_2, \ldots, d_K\}$ with complete descriptions (including a description of their semantic content) stored in a regular database of the repository, and new documents (new arrivals) $D^{new} = \{d^{new}_1, d^{new}_2, \ldots, d^{new}_{K^{new}}\}$ with incomplete descriptions of their thematic content, thus awaiting off-line semantic analysis.
Formally, the repository sub-databases can be described as an information system in the sense of Pawlak (Pawlak and Skowron, 2007), tailored to our practical context. Let $Rep = (D \cup D^{new}, A, V, \rho)$ be further considered, where $D$ and $D^{new}$ are the sets of stored documents and new arrivals, respectively, $A = \{w_1, w_2, \ldots, w_L, sub_1, sub_2, \ldots, sub_M\}$ is a set of attributes, $V = \bigcup_{a \in A} V_a$, where $V_{w_i} = W_i$ and $V_{sub_i} = \{\varepsilon, 0, 1\}$, is a set of attribute values, and $\rho : (D \cup D^{new}) \times A \to V$ is a partial information function.
The partiality of function $\rho$ reflects the extent to which documents are described with regard to their thematic content (their semantics). Namely, it is assumed that $W = \{w_1, w_2, \ldots, w_L\}$ consists of multivalued attributes, called conditional attributes. Values of conditional attributes are usually delivered at a document’s arrival, as the attributes represent a set of easily computed parameters/characteristics of the document (computed on-line). In contrast, $S = \{sub_1, sub_2, \ldots, sub_M\}$ consists of attributes, called thematic attributes, representing the content of documents (with respect to a given profile of information needs). Determining the values of thematic attributes requires intensive (both methodologically and computationally) off-line semantic analysis of the document.
For the sake of clarity and ease of presentation, some additional symbols are further introduced. Namely, for each document $d \in D \cup D^{new}$, $\rho_{d|W} : W \to \bigcup_{i=1}^{L} W_i$ is a conditional-part information function related to document $d$, such that for each attribute $x \in W$, $\rho_{d|W}(x) \in W_x$ holds, where $W_x$ consists of all possible values of $x$.
Similarly, for each document $d \in D \cup D^{new}$, $\rho_{d|S} : S \to \{\varepsilon, 0, 1\}$ is a thematic-part information function related to document $d$. However, in this case the rules for assigning attribute values differ for $d \in D$ and $d \in D^{new}$. Namely, for each attribute $x \in S$ and each document $d \in D$, $\rho_{d|S}(x) = 1$ if and only if document $d$ is indexed as being about subject $x$; otherwise $\rho_{d|S}(x) = 0$. At the same time, for each attribute $x \in S$ and $d \in D^{new}$, the value of $x$ is treated as unknown, which is formally represented by $\rho_{d|S}(x) = \varepsilon$.
2.3 Mental Language Holons as Representation of Subject Distribution
As mentioned above, the introduced IF system is dedicated to analysing incoming documents with regard to individual subjects $sub \in S$ and/or their conjunctions $sub_x \wedge sub_y$, where $sub_x \in S$, $sub_y \in S$, $sub_x \neq sub_y$. The results of this analysis may be uncertain predictions, communicated by means of natural language operators of epistemic modality.
Below, an adequate model of the database meta-descriptions used in the filtering process is proposed. Its purpose is to enable an effective and semantically valid realization of the assumed functional goal of the IF system. The model is fully compatible with an original theory of epistemic modality grounding, partially presented in (Katarzyniak, 2005; Katarzyniak, 2006b; Katarzyniak, 2006a). The main assumption of the theory is that linguistic alerts are inseparably connected to (in a sense, grounded in) so-called mental language holons. Language holons represent embedded summarizations of empirical episodic experiences, i.e., experiences strictly related to particular subjects or their binary conjunctions. In many ways language holons are similar to mental models, known from cognitive linguistics and psychology (Johnson-Laird, 1985). For the sake of completeness it is worth mentioning that, at the technical level, mental language holons can be treated as complexes of complementary classification rules.
In order to formally capture the latter, the following three retrieval languages are introduced:

$$
\begin{aligned}
KS &= \{sub_1, sub_2, \ldots, sub_M\},\\
KB &= \{sub_x \wedge sub_y \mid sub_x, sub_y \in S \wedge x < y\},\\
KL &= \Big\{\bigwedge_{i=1}^{L} (w_i = x_i) \;\Big|\; w_i \in W,\ x_i \in W_i,\ i = 1..L\Big\}.
\end{aligned}
\tag{1}
$$
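The semantics of the retrieval languages is given by the following functions:

$$
\begin{aligned}
\delta_{|KS} &: KS \to 2^{D},\\
\delta_{|KB} &: KB \to 2^{D},\\
\delta_{|KL} &: KL \to 2^{D \cup D^{new}},
\end{aligned}
\tag{2}
$$

where:

$$
\begin{aligned}
\delta_{|KS}(sub) &= \{d \in D \mid \rho_{d|S}(sub) = 1\},\\
\delta_{|KB}(sub_x \wedge sub_y) &= \delta_{|KS}(sub_x) \cap \delta_{|KS}(sub_y),\\
\delta_{|KL}\Big(\bigwedge_{i=1}^{L}(w_i = x_i)\Big) &= \Big\{d \in D \cup D^{new} \;\Big|\; \bigwedge_{i=1}^{L}(\rho_{d|W}(w_i) = x_i)\Big\}.
\end{aligned}
\tag{3}
$$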
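The three semantic functions can be sketched in Python as plain set operations. The toy repository `REP`, the document identifiers and the attribute values below are hypothetical; the sketch also assumes that $\delta_{|KL}$ ranges over both processed and new documents, as its codomain in Eq. (2) and its use in Eq. (4) suggest.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy repository: id -> (conditional values, thematic values).
# None encodes the unknown value ε carried by new arrivals.
Row = Tuple[Dict[str, str], Dict[str, Optional[int]]]
REP: Dict[str, Row] = {
    "d1": ({"w1": "A", "w2": "B"}, {"sub1": 1, "sub2": 1}),
    "d2": ({"w1": "A", "w2": "B"}, {"sub1": 1, "sub2": 0}),
    "d3": ({"w1": "C", "w2": "B"}, {"sub1": 0, "sub2": 1}),
    "n1": ({"w1": "A", "w2": "B"}, {"sub1": None, "sub2": None}),
}
D: Set[str] = {"d1", "d2", "d3"}  # processed documents; "n1" is a new arrival


def delta_KS(sub: str) -> Set[str]:
    """Processed documents indexed as being about `sub`."""
    return {d for d in D if REP[d][1][sub] == 1}


def delta_KB(sub_x: str, sub_y: str) -> Set[str]:
    """Processed documents indexed as being about both subjects."""
    return delta_KS(sub_x) & delta_KS(sub_y)


def delta_KL(condition: Dict[str, str]) -> Set[str]:
    """All documents (processed or new) whose conditional values match."""
    return {d for d, (cond, _) in REP.items()
            if all(cond.get(w) == x for w, x in condition.items())}


print(sorted(delta_KS("sub1")), sorted(delta_KB("sub1", "sub2")))
print(sorted(delta_KL({"w1": "A", "w2": "B"})))
```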
Mental language holons are defined for simple subjects in $KS$ and conjunctive subjects in $KB$, with respect to particular conditions from the retrieval language $KL^{+} \subseteq KL$, where the subset $KL^{+}$ (of non-empty conditions) is defined as $KL^{+} = \{k \in KL \mid \delta_{|KL}(k) \cap D \neq \emptyset\}$.
Having defined $KL^{+}$, we can introduce two auxiliary symbols: class $K_i$ and class extension $EXT(K_i)$. In particular, a class $K_i$ is the set of already processed documents that are indistinguishable with respect to the conditional attributes (they all satisfy $\kappa_i$), whereas the class extension $EXT(K_i)$ is the set of new arrivals that satisfy the same condition $\kappa_i$. Namely, if (and only if) $|KL^{+}| = Q \geq 1$ and $KL^{+} = \{\kappa_1, \kappa_2, \ldots, \kappa_Q\}$, then for $i = 1..Q$,

$$
\begin{aligned}
K_i &= \delta_{|KL}(\kappa_i) \cap D,\\
EXT(K_i) &= \delta_{|KL}(\kappa_i) \cap D^{new}.
\end{aligned}
\tag{4}
$$
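Under the assumption that a condition $\kappa_i$ is encoded as the tuple of a document's conditional attribute values, the classes and their extensions of Eq. (4) can be obtained by simple grouping. The function name `build_classes` and the sample data are hypothetical; the third returned set, collecting new arrivals that match no class, is an addition of this sketch and not part of the model.

```python
from typing import Dict, Set, Tuple

Condition = Tuple[str, ...]  # values of (w_1, ..., w_L), i.e. one condition kappa


def build_classes(processed: Dict[str, Condition],
                  new_arrivals: Dict[str, Condition]):
    """Return (K, EXT, unmatched) following Eq. (4)."""
    classes: Dict[Condition, Set[str]] = {}
    for doc, kappa in processed.items():
        classes.setdefault(kappa, set()).add(doc)   # K_i = delta_KL(kappa_i) ∩ D
    extensions: Dict[Condition, Set[str]] = {kappa: set() for kappa in classes}
    unmatched: Set[str] = set()
    for doc, kappa in new_arrivals.items():
        if kappa in extensions:                     # kappa belongs to KL+
            extensions[kappa].add(doc)              # EXT(K_i) = delta_KL(kappa_i) ∩ D^new
        else:
            unmatched.add(doc)                      # no processed document shares kappa
    return classes, extensions, unmatched


processed = {"d1": ("A", "B"), "d2": ("A", "B"), "d3": ("C", "B")}
new_arrivals = {"n1": ("A", "B"), "n2": ("C", "C")}
K, EXT, unmatched = build_classes(processed, new_arrivals)
print(K, EXT, unmatched)
```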
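For each $sub \in S$ and $\kappa_i \in KL^{+}$, the (simple subject) mental language holon is given as a vector simholon:

$$
simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)], \tag{5}
$$

where

$$
\begin{aligned}
\lambda^{+}_{A}(sub) &= \frac{|\delta_{|KS}(sub) \cap K_i|}{|K_i|},\\
\lambda^{-}_{A}(sub) &= \frac{|(D \setminus \delta_{|KS}(sub)) \cap K_i|}{|K_i|}.
\end{aligned}
\tag{6}
$$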
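Eq. (6) amounts to two relative frequencies within a class. The following sketch computes them from sets of document identifiers; the function name and the sample identifiers are hypothetical. Because $K_i \subseteq D$, the set $(D \setminus \delta_{|KS}(sub)) \cap K_i$ is computed as a plain set difference.

```python
from typing import Set, Tuple


def simholon_strengths(K_i: Set[str], about_sub: Set[str]) -> Tuple[float, float]:
    """Return (lambda_A^+, lambda_A^-) for one class K_i and one subject.

    `about_sub` plays the role of delta_KS(sub).
    """
    if not K_i:
        raise ValueError("K_i must be non-empty (kappa_i is taken from KL+)")
    lam_pos = len(K_i & about_sub) / len(K_i)
    lam_neg = len(K_i - about_sub) / len(K_i)
    return lam_pos, lam_neg


# One of four documents in the class is indexed as being about the subject.
print(simholon_strengths({"d1", "d2", "d3", "d4"}, {"d1", "d9"}))
```

Since every processed document is indexed as either about or not about a subject, the two strengths always sum to 1.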
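For each conjunctive subject $(sub_x \wedge sub_y) = sub_{xy} \in KB$ and $\kappa_i \in KL^{+}$, the (conjunctive subject) mental language holon is given as a vector conholon:

$$
conholon[\kappa_i, sub_{xy}, \lambda^{++}_{C}(sub_{xy}), \lambda^{+-}_{C}(sub_{xy}), \lambda^{-+}_{C}(sub_{xy}), \lambda^{--}_{C}(sub_{xy})], \tag{7}
$$

where

$$
\begin{aligned}
\lambda^{++}_{C}(sub_{xy}) &= \frac{|\delta_{|KS}(sub_x) \cap \delta_{|KS}(sub_y) \cap K_i|}{|K_i|},\\
\lambda^{+-}_{C}(sub_{xy}) &= \frac{|\delta_{|KS}(sub_x) \cap (D \setminus \delta_{|KS}(sub_y)) \cap K_i|}{|K_i|},\\
\lambda^{-+}_{C}(sub_{xy}) &= \frac{|(D \setminus \delta_{|KS}(sub_x)) \cap \delta_{|KS}(sub_y) \cap K_i|}{|K_i|},\\
\lambda^{--}_{C}(sub_{xy}) &= \frac{|(D \setminus \delta_{|KS}(sub_x)) \cap (D \setminus \delta_{|KS}(sub_y)) \cap K_i|}{|K_i|}.
\end{aligned}
\tag{8}
$$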
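The four components of Eq. (8) partition a class $K_i$ according to the membership of its documents in the two subject sets. A minimal sketch, with a hypothetical function name and sample identifiers, could look as follows.

```python
from typing import Set, Tuple


def conholon_strengths(K_i: Set[str], about_x: Set[str],
                       about_y: Set[str]) -> Tuple[float, float, float, float]:
    """Return (lambda_C^++, lambda_C^+-, lambda_C^-+, lambda_C^--) for one class.

    `about_x` and `about_y` play the roles of delta_KS(sub_x) and delta_KS(sub_y).
    """
    n = len(K_i)
    if n == 0:
        raise ValueError("K_i must be non-empty (kappa_i is taken from KL+)")
    lam_pp = len(K_i & about_x & about_y) / n    # about x and about y
    lam_pm = len((K_i & about_x) - about_y) / n  # about x, not about y
    lam_mp = len((K_i & about_y) - about_x) / n  # not about x, about y
    lam_mm = len(K_i - about_x - about_y) / n    # about neither
    return lam_pp, lam_pm, lam_mp, lam_mm


strengths = conholon_strengths({"a", "b", "c", "d"}, {"a", "b"}, {"b", "c"})
print(strengths, sum(strengths))
```

The four strengths sum to 1, and the first two add up to the simple-subject strength $\lambda^{+}_{A}(sub_x)$ of the same class, which is a convenient sanity check when holons are stored.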
From the pragmatic point of view, mental language holons are higher-level summarizations (semantic generalizations) of the relative shares of complementary bodies of experience related to particular subjects (or their conjunctions). The whole repository of language holons available to the processes of the IF system and, in particular, to the alert production procedures, is given as follows:

$$
\begin{aligned}
HOLONS &= SIMHOLONS \cup CONHOLONS,\\
SIMHOLONS &= \{simholon[\kappa, x, \lambda^{+}_{A}(x), \lambda^{-}_{A}(x)] \mid \kappa \in KL^{+},\ x \in KS\},\\
CONHOLONS &= \{conholon[\kappa, x, \lambda^{++}_{C}(x), \lambda^{+-}_{C}(x), \lambda^{-+}_{C}(x), \lambda^{--}_{C}(x)] \mid \kappa \in KL^{+},\ x \in KB\}.
\end{aligned}
\tag{9}
$$
3 ALERTS PRODUCTION
3.1 Alerts and their Semantic Proto-forms
Examples of the possible structure and content of alerts considered in our research are given as follows:

IF SYSTEM ALERT: There is a new [document: $x$]. I believe it is about [subject: $sub$]. You may be interested in reading it!

IF SYSTEM ALERT: Documents [documents: $x_1, \ldots, x_k$] are new. It is possible that they are about [subjects: $sub_x$ and $sub_y$]. Should I put them on your pending list?

IF SYSTEM ALERT: There is a new [document: $x$] worth looking at. I believe it is about [subject: $sub_x$], but not about [subject: $sub_y$]. According to what I know about your interests, the first issue may be of interest to you. Should I put the document in your working box? Please, answer [YES/NO]!

IF SYSTEM ALERT: Among others, the following documents: $x_1, \ldots, x_k$ have been received from [source: $source$], too. I believe they are not about [subject: $sub$], which you pointed at as your main issue. Shall I put them on your pending list despite this? Please, answer [YES/NO]!

IF SYSTEM ALERT: It is possible that the following [incomings: $x_1, \ldots, x_L$] deal with [subject: $sub$], which is on your list of interests. Are you interested in reading them before turning them over to our central document base? Please, answer [YES/NO]!
The structure of alerts fully depends on the designer’s choice and, obviously, should reflect the favoured interaction modes and preferences of a particular user (or group of users). In our case the alerts are communicated in a quasi-natural language, i.e., a partially controlled version of an actual natural language. In advanced multimedia systems the alerts can be vocalized, too.
The common feature of the above examples is their underlying sense. Namely, regardless of their form (individual document vs. group of documents, simple subject vs. conjunctive subject), they are all founded on the same propositional aspect: being about or not being about a particular simple subject (or a conjunction of simple subjects). Moreover, for $x \in D \cup D^{new}$ and $sub \in KS \cup KB$, each example is originally created as an instantiation (concretization) of one of the following basic linguistic proto-forms:

knowing([document(s): x] is about [subject(s): sub])
believing([document(s): x] is about [subject(s): sub])
possible([document(s): x] is about [subject(s): sub])

or another proto-form, complementary to the ones enumerated above.
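The relation between a proto-form and its instantiation as an alert can be sketched as a small data structure with a rendering function. Everything in the block is an illustrative assumption: the `ProtoForm` class, the `render_alert` function and the English phrases chosen for the three operators are not prescribed by the model, and a real system would use its own controlled language and alert templates.

```python
from dataclasses import dataclass

# Hypothetical surface phrases for the three epistemic operators.
OPERATOR_PHRASES = {
    "knowing": "I know",
    "believing": "I believe",
    "possible": "It is possible that",
}


@dataclass(frozen=True)
class ProtoForm:
    operator: str          # "knowing", "believing" or "possible"
    document: str
    subject: str
    negated: bool = False  # True for the complementary "is not about" form

    def __str__(self) -> str:
        verb = "is not about" if self.negated else "is about"
        return f"{self.operator}([{self.document}] {verb} [{self.subject}])"


def render_alert(pf: ProtoForm) -> str:
    """Instantiate a proto-form as a single-document alert."""
    verb = "is not about" if pf.negated else "is about"
    return (f"IF SYSTEM ALERT: There is a new [document: {pf.document}]. "
            f"{OPERATOR_PHRASES[pf.operator]} it {verb} [subject: {pf.subject}].")


pf = ProtoForm("believing", "x", "sub")
print(pf)
print(render_alert(pf))
```

Separating the proto-form from its rendering keeps the choice of modality independent of the presentation mode (written or vocalized).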
It is worth mentioning that for a fixed document $x$ and a fixed subject $sub$ (a simple subject or a binary conjunction of simple subjects) one and only one proto-form should be instantiated as the proper representation of the epistemic state. This constraint follows from a common sense rule of natural language pragmatics, which says that knowing, believing and finding something only possible (in the epistemic sense) are mutually exclusive, different states of the same mental epistemic attitude. Thus, in our research an adequate extraction of natural language alerts from the IF system’s knowledge base (or, more strictly, the proper and adequate choice and further instantiation of a proto-form) becomes a fundamental issue to be elaborated on both the technical and theoretical levels.
In conclusion, as with other natural language statements, three aspects of alerts need to be taken into account: the propositional element, modality, and the temporal frame. As mentioned above, the propositional element is given by predication, which on the written (or vocalized) level is referred to by elements of the sets $KS$ and $KB$. The temporal dimension of the alerts is quite apparent: they are stated in the present tense. A more problematic issue is the choice of the alerts’ modality, which in our case should reflect the kind of epistemic uncertainty of the IF system itself. An important question, of both theoretical and technical nature, is how to properly choose adequate modality markers in order to extend the written (or vocalized) representation of predication (applied to incoming documents). This question is addressed by an original theory of grounding of modal epistemic statements, briefly presented below.
3.2 Applying the Theory of Epistemic Modality Grounding to Alerts’ Production
The decision rules for the proper choice of an adequate modal proto-form and its instantiation (and further presentation to an end user in a written and/or vocalized form) follow from an original theory of grounding, presented elsewhere. Namely, for the case of simple subject-based predication the introductory theoretical results can be found in (Katarzyniak, 2005), and for binary conjunctive subject-based predication in (Katarzyniak, 2006b; Katarzyniak, 2006a).

It is assumed in the theory (following multiple models of language production (Evans and Green, 2006; Stachowiak, 2013; Wlodarczyk, 2013)) that particular epistemic operators of modality are related to summarized empirical experience supporting the related language proto-forms. However, these proto-forms are never stored and processed as separate entities, for they are conceptually (mentally) related to their complementary counterparts. In particular, such complexes of complementary proto-forms constitute linguistic holons, which in our technical approach are strongly related to the concept of mental language holons defined in the previous sections. In consequence, each linguistic proto-form is always related to one and only one part of a relevant mental language holon and is assigned a certain intensity of summarized (embodied) experience of a subject (or binary conjunctive subject). In the theory of grounding this intensity is numerically represented by the relative grounding strength.
According to the theory of simple modalities grounding, the proper choice of an adequate linguistic proto-form is possible if and only if a proper system of so-called modality thresholds is applied (and technically realized in a system). In our case the system needs to consist of two interrelated sub-systems of thresholds, $\{\lambda^{KS}_{Know}, \lambda^{KS}_{maxBel}, \lambda^{KS}_{minBel}, \lambda^{KS}_{maxPos}, \lambda^{KS}_{minPos}\}$ and $\{\lambda_{Know}, \lambda_{maxBel}, \lambda_{minBel}, \lambda_{maxPos}, \lambda_{minPos}\}$, for effective control of the instantiation of simple-subject predication and conjunctive-subject predication, respectively.
An interesting result of the theory of grounding, perhaps the most important one in practice, is that the system of modality thresholds cannot be freely chosen. Namely, in order to guarantee common sense consistency of (written and verbal) language behaviour, the system of modality thresholds has to fulfil a predefined set of requirements, accepted in the theory of grounding as a reflection of the common sense pragmatics applied in actual contexts to the natural language operators of knowledge, belief, and possibility. The fact that written and/or verbal behaviour produced by a technical system based on the theory of grounding is actually consistent from the semiotic and pragmatic point of view can be analytically proved and verified².

Moreover, within the numerical scope permitted by the theory of grounding, the values of the thresholds can be chosen in an arbitrary manner (Katarzyniak, 2005). However, for the case of populations of artificial agents it is possible to obtain them from computationally realized processes of artificial language semiosis (Lorkiewicz et al., 2011).

² Some of the results can be found in (Katarzyniak, 2005; Katarzyniak, 2006b; Katarzyniak, 2006a).
In order to omit a deeper discussion of the theory of grounding (which is outside the scope of this work), we further present an original application of the theory to the definition of basic rules for the acceptability and adequacy of modal alerts. The fundamental assumption is that a given modal alert can be produced (by the IF system) if and only if its underlying linguistic proto-form is well-grounded in the IF system’s knowledge base. This also means that, in this practical context, for a certain alert being well-grounded is equivalent to adequately describing the related state of the IF system’s knowledge about the possibility that a certain document $d \in D \cup D^{new}$ deals with a certain subject $sub \in KS \cup KB$. In particular, for any document $d \in D^{new}$, $d \in EXT(K_i)$, and $sub \in KS$, the following set of so-called grounding relations constitutes the theoretical foundation of the IF alerting processes:
$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ possible([$d$] is about [$sub$])
holds if and only if $\lambda^{KS}_{minPos} \leq \lambda^{+}_{A}(sub) < \lambda^{KS}_{maxPos}$.

$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ believing([$d$] is about [$sub$])
holds if and only if $\lambda^{KS}_{minBel} \leq \lambda^{+}_{A}(sub) < \lambda^{KS}_{maxBel}$.

$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ knowing([$d$] is about [$sub$])
holds if and only if $\lambda^{+}_{A}(sub) = \lambda^{KS}_{Know} = 1$.
Rather obviously, complementary alerts on a document $d \in D^{new}$ not being about a particular subject $sub \in KS$ are produced with respect to the next three definitions:

$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ possible([$d$] is not about [$sub$])
holds if and only if $\lambda^{KS}_{minPos} \leq \lambda^{-}_{A}(sub) < \lambda^{KS}_{maxPos}$.

$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ believing([$d$] is not about [$sub$])
holds if and only if $\lambda^{KS}_{minBel} \leq \lambda^{-}_{A}(sub) < \lambda^{KS}_{maxBel}$.

$simholon[\kappa_i, sub, \lambda^{+}_{A}(sub), \lambda^{-}_{A}(sub)] \models_G$ knowing([$d$] is not about [$sub$])
holds if and only if $\lambda^{-}_{A}(sub) = \lambda^{KS}_{Know} = 1$.
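The six grounding relations for simple subjects reduce to comparing a grounding strength against half-open threshold intervals. The sketch below is one possible reading of these relations; the names `Thresholds` and `modality` are hypothetical, and the numerical values are those used in the computational example of section 4.

```python
from typing import NamedTuple, Optional


class Thresholds(NamedTuple):
    min_pos: float
    max_pos: float
    min_bel: float
    max_bel: float
    know: float = 1.0


def modality(strength: float, t: Thresholds) -> Optional[str]:
    """Return the operator grounded by `strength`, or None if none is grounded."""
    if strength == t.know:
        return "knowing"
    if t.min_bel <= strength < t.max_bel:
        return "believing"
    if t.min_pos <= strength < t.max_pos:
        return "possible"
    return None  # too little supporting experience for any proto-form


T = Thresholds(min_pos=0.20, max_pos=0.60, min_bel=0.60, max_bel=1.00)

# lambda_A^+ selects the "is about" form, lambda_A^- the "is not about" form.
lam_pos, lam_neg = 0.25, 0.75
print(modality(lam_pos, T), modality(lam_neg, T))
```

Because the belief and possibility intervals do not overlap in this setting, at most one operator is returned for a given strength, in line with the requirement that the three epistemic states are mutually exclusive.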
Obviously, a similar set of definitions for $d \in D^{new}$, $d \in EXT(K_i)$, and $(sub_x \wedge sub_y) = sub_{xy} \in KB$ can also be formulated and used, if needed. However, in such a case other mental language holons must be referred to:

$conholon[\kappa_i, sub_{xy}, \lambda^{++}_{C}(sub_{xy}), \lambda^{+-}_{C}(sub_{xy}), \lambda^{-+}_{C}(sub_{xy}), \lambda^{--}_{C}(sub_{xy})] \models_G$ possible([$d$] is about [$sub_x$] and [$sub_y$])
holds if and only if $\lambda_{minPos} \leq \lambda^{++}_{C}(sub_{xy}) < \lambda_{maxPos}$.

$conholon[\kappa_i, sub_{xy}, \lambda^{++}_{C}(sub_{xy}), \lambda^{+-}_{C}(sub_{xy}), \lambda^{-+}_{C}(sub_{xy}), \lambda^{--}_{C}(sub_{xy})] \models_G$ believing([$d$] is about [$sub_x$] and [$sub_y$])
holds if and only if $\lambda_{minBel} \leq \lambda^{++}_{C}(sub_{xy}) < \lambda_{maxBel}$.

$conholon[\kappa_i, sub_{xy}, \lambda^{++}_{C}(sub_{xy}), \lambda^{+-}_{C}(sub_{xy}), \lambda^{-+}_{C}(sub_{xy}), \lambda^{--}_{C}(sub_{xy})] \models_G$ knowing([$d$] is about [$sub_x$] and [$sub_y$])
holds if and only if $\lambda^{++}_{C}(sub_{xy}) = \lambda_{Know} = 1$.
For purely editorial reasons, we do not deal with the complementary conjunctive alerts, i.e., alerts on new documents being about [$sub_x$ and not $sub_y$], [not $sub_x$ and $sub_y$], [not $sub_x$ and not $sub_y$]. Clearly, they have to be verified in a similar way, but against the values of $\lambda^{+-}_{C}(sub_{xy})$, $\lambda^{-+}_{C}(sub_{xy})$, and $\lambda^{--}_{C}(sub_{xy})$, respectively.
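A sketch of how all four conjunctive cases could be verified in one pass is given below. The threshold values for the conjunctive sub-system are purely hypothetical (the text does not fix them), as are the function name and the wording of the bracketed labels.

```python
from typing import List, Optional, Sequence

# Hypothetical conjunctive thresholds: half-open (min, max) ranges per operator.
RANGES = {"possible": (0.20, 0.60), "believing": (0.60, 1.00)}
KNOW = 1.0


def conjunctive_protoforms(d: str, x: str, y: str,
                           strengths: Sequence[float]) -> List[str]:
    """Map (lambda^++, lambda^+-, lambda^-+, lambda^--) to grounded proto-forms."""
    labels = (f"[{x}] and [{y}]", f"[{x} and not {y}]",
              f"[not {x} and {y}]", f"[not {x} and not {y}]")
    forms = []
    for lam, label in zip(strengths, labels):
        op: Optional[str]
        if lam == KNOW:
            op = "knowing"
        else:
            op = next((name for name, (lo, hi) in RANGES.items()
                       if lo <= lam < hi), None)
        if op is not None:
            forms.append(f"{op}([{d}] is about {label})")
    return forms


for form in conjunctive_protoforms("d", "sub_x", "sub_y", (0.25, 0.75, 0.0, 0.0)):
    print(form)
```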
3.3 A Brief Note on Cognitive Semantics
The novelty of our approach to the generation of quasi-natural language alerts lies outside the scope of previous linguistic models. Namely, it is an original proposal consistent with the paradigms of cognitive linguistics (Evans and Green, 2006) and interactive linguistics (Wlodarczyk, 2013). Both of them relate our work to the concept of cognitive semantics (Talmy, 2000), which describes the way a particular natural language sentence embraces the pre-linguistic knowledge corpora accessible to the mind of a communicating agent. Obviously, in our R&D context the communicating subjects are IF systems.

Cognitive semantics is always characterised by high specificity, because in each case it reflects the pragmatics and meaning of a very narrow class of linguistic phenomena. In our model this specificity is clearly visible in the internally related and complex structure of mental language holons. The proposal of how to realize the cognitive semantics of alerts in our IF system should be treated as the most original contribution of the model.
4 COMPUTATIONAL EXAMPLE
In this section we introduce a basic example that illustrates the entire process of generating linguistic alerts in IF systems. For the sake of simplicity, let us assume an elementary information system comprising a document repository that consists of 10 processed documents $D = \{d_1, d_2, \ldots, d_{10}\}$ and 3 new documents $D^{new} = \{d_{11}, d_{12}, d_{13}\}$, evaluated on the basis of a set of 4 conditional attributes $W = \{w_1, w_2, w_3, w_4\}$. Further, let us assume that the user’s information needs are limited to two subjects $S = \{sub_1, sub_2\}$. Consequently, the set of all attributes available in the system is defined as $A = \{w_1, w_2, w_3, w_4, sub_1, sub_2\}$. Furthermore, let the domains of the introduced attributes be given as $W_1 = W_2 = W_3 = W_4 = \{A, B, C\}$.
Documents stored in the document repository are processed. In particular, each document is analysed by a set of indexing mechanisms (or other processing mechanisms) that are able to assign values to each of the conditional attributes on the basis of the document’s content and structure. Further, information about each document’s subject is determined and stored. In this way the information function of the repository, i.e., the attribute-value mapping, is determined, as given in Table 1.
Focusing on three conditions $\kappa_1$, $\kappa_2$, and $\kappa_3$, given as $\kappa_1 = \{(w_1, B), (w_2, A), (w_3, A), (w_4, A)\}$, $\kappa_2 = \{(w_1, C), (w_2, C), (w_3, A), (w_4, B)\}$, and $\kappa_3 = \{(w_1, B), (w_2, B), (w_3, C), (w_4, A)\}$, we can determine three non-empty classes of documents $K_1 = \{d_1, d_2, d_3, d_6\}$, $K_2 = \{d_4, d_5, d_7, d_{10}\}$, $K_3 = \{d_8, d_9\}$ and their extensions $EXT(K_1) = \{d_{11}\}$, $EXT(K_2) = \{d_{12}\}$, $EXT(K_3) = \emptyset$. It must be mentioned that one of the newly received documents, namely $d_{13}$, does not belong to any of these sets. This fact will be commented on in the final remarks section.
The resulting summarization of data is represented by the following set of holons $HOLONS = SIMHOLONS \cup CONHOLONS$:

$$
\begin{aligned}
SIMHOLONS = \{\,
& simholon[\kappa_1, sub_1, 0.25, 0.75],\\
& simholon[\kappa_1, sub_2, 1.00, 0.00],\\
& simholon[\kappa_2, sub_1, 1.00, 0.00],\\
& simholon[\kappa_2, sub_2, 0.25, 0.75],\\
& simholon[\kappa_3, sub_1, 0.50, 0.50],\\
& simholon[\kappa_3, sub_2, 0.00, 1.00]\,\}.
\end{aligned}
\tag{10}
$$

$$
\begin{aligned}
CONHOLONS = \{\,
& conholon[\kappa_1, sub_1 \wedge sub_2, 0.25, 0.00, 0.75, 0.00],\\
& conholon[\kappa_2, sub_1 \wedge sub_2, 0.25, 0.75, 0.00, 0.00],\\
& conholon[\kappa_3, sub_1 \wedge sub_2, 0.00, 0.50, 0.00, 0.50]\,\}.
\end{aligned}
\tag{11}
$$
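The holon values can be recomputed directly from the processed rows of Table 1 by applying Eqs. (6) and (8) to each class. The sketch below is only an illustration; the data layout and the function names are assumptions, while the rows themselves are copied from Table 1.

```python
from collections import defaultdict

# Processed rows of Table 1: id -> (w1, w2, w3, w4, sub1, sub2).
TABLE_1 = {
    "d1": ("B", "A", "A", "A", 1, 1),
    "d2": ("B", "A", "A", "A", 0, 1),
    "d3": ("B", "A", "A", "A", 0, 1),
    "d4": ("C", "C", "A", "B", 1, 0),
    "d5": ("C", "C", "A", "B", 1, 0),
    "d6": ("B", "A", "A", "A", 0, 1),
    "d7": ("C", "C", "A", "B", 1, 0),
    "d8": ("B", "B", "C", "A", 0, 0),
    "d9": ("B", "B", "C", "A", 1, 0),
    "d10": ("C", "C", "A", "B", 1, 1),
}

# Group the thematic values (sub1, sub2) by condition kappa = (w1, ..., w4).
classes = defaultdict(list)
for row in TABLE_1.values():
    classes[row[:4]].append(row[4:])


def simholon(rows, j):
    """(lambda_A^+, lambda_A^-) for subject index j (0 for sub1, 1 for sub2)."""
    n = len(rows)
    return (sum(r[j] for r in rows) / n, sum(1 - r[j] for r in rows) / n)


def conholon(rows):
    """(lambda_C^++, lambda_C^+-, lambda_C^-+, lambda_C^--) for sub1 and sub2."""
    n = len(rows)
    patterns = ((1, 1), (1, 0), (0, 1), (0, 0))
    return tuple(sum(r == p for r in rows) / n for p in patterns)


for kappa, rows in classes.items():
    print(kappa, simholon(rows, 0), simholon(rows, 1), conholon(rows))
```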
Having computed the relative grounding strengths and stored them in each holon, we can now determine all proto-forms for the new arrivals from the non-empty extensions $EXT(K_1)$ and $EXT(K_2)$.
Table 1: Processed repository of documents.

          w_1  w_2  w_3  w_4  sub_1  sub_2
d_1       B    A    A    A    1      1
d_2       B    A    A    A    0      1
d_3       B    A    A    A    0      1
d_4       C    C    A    B    1      0
d_5       C    C    A    B    1      0
d_6       B    A    A    A    0      1
d_7       C    C    A    B    1      0
d_8       B    B    C    A    0      0
d_9       B    B    C    A    1      0
d_10      C    C    A    B    1      1
d_11      B    A    A    A    ε      ε
d_12      C    C    A    B    ε      ε
d_13      B    B    C    B    ε      ε

To give an example, simple subjects will be considered. Let the modality thresholds be set to the following values: $\lambda^{KS}_{Know} = \lambda^{KS}_{maxBel} = 1$, $\lambda^{KS}_{minBel} = \lambda^{KS}_{maxPos} = 0.60$, and $\lambda^{KS}_{minPos} = 0.20$. These values are not accidental. Namely, they have been chosen taking into account theorems from the theory of grounding simple modalities (Katarzyniak, 2005), according to which the threshold values should preserve the consistency of sets of grounded proto-forms with their common sense interpretation. Below we provide examples of well-grounded proto-forms:
possible([d
11
] is about [sub
1
]) AND possible([d
11
] is
not about [sub
1
])
believing([d
11
] is about [sub
2
]), BUT STILL
possible([d
11
] is not about [sub
2
])
knowing([d
12
] is about [sub
2
])
It is worth mentioning that these proto-forms are logically consistent, which is ensured by the proper choice of modality thresholds. A possible natural language alert founded on the established proto-forms is:

IF SYSTEM ALERT: There is a new [document: $doc_{12}$] available. I believe it is about [subject: $sub_2$]. You may be interested in reading it!
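Turning a grounded proto-form into such an alert amounts to filling a sentence template. The sketch below is only illustrative: the template reproduces the alert quoted above, whereas the phrases for the knowing and possible operators, as well as the names `PHRASES` and `render_alert`, are hypothetical choices that do not appear in the text.

```python
# Phrase used for each operator. Only "I believe" appears in the text;
# the other two phrases are hypothetical.
PHRASES = {
    "knowing": "I know",
    "believing": "I believe",
    "possible": "It is possible that",
}


def render_alert(document, subject, operator):
    """Render a quasi-natural language alert for one grounded proto-form."""
    phrase = PHRASES[operator]
    return (
        f"IF SYSTEM ALERT: There is a new [document: {document}] available. "
        f"{phrase} it is about [subject: {subject}]. "
        "You may be interested in reading it!"
    )


if __name__ == "__main__":
    print(render_alert("doc12", "sub2", "believing"))
```

Keeping the operator phrases in a single table means the wording can be changed, or handed to a speech synthesizer for vocalized alerts, without touching the selection logic.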
5 FINAL REMARKS
The theoretical foundation for designing and imple-
menting interactive IF systems is proposed in this
paper. The desirable common sense consistency of
quasi-natural language alerts is ensured by the appli-
cation of a theory of epistemic modality grounding,
introduced elsewhere. The proposal substantially dif-
fers from previous models of similar alerts generation.
The proposed model of linguistic alerts choice and
production is supported by a simple computational
methodology and a naive model of uncertain clas-
sification rules. Alternative and more sophisticated
approaches are possible (and required), e.g., for the way the sets $K_i$ and $EXT(K_i)$ are determined. Obviously,
a complete final implementation also needs to cover the missing case of document $d_{13}$.
The introduced model supports the effective design and implementation of modern interactive and mobile tools for alerting end users about newly received objects of potential interest, in both written and vocalized modal natural language.
ACKNOWLEDGEMENTS
This work was realized under research cooperation between Wrocław University of Technology (Faculty of Computer Science and Management, Internal Grant No. S50198 K0803) and Hradec Králové University (Center for Basic and Applied Research, Faculty of Informatics and Management, SP-FIM-2016 - Smart Solutions for Ubiquitous Computing Environments).
REFERENCES
Brown, P. J. and Jones, G. J. F. (2001). Context-aware re-
trieval: Exploring a new environment for information
retrieval and information filtering. Personal and Ubiq-
uitous Computing, 5(4):253–263.
Evans, V. and Green, M. (2006). Cognitive linguistics: An
introduction. Edinburgh University Press.
Hanani, U., Shapira, B., and Shoval, P. (2001). Informa-
tion filtering: Overview of issues, research and sys-
tems. User Modelling and User-Adapted Interaction,
11(3):203–259.
Herrera-Viedma, E., Herrera, F., Martínez, L., Herrera, J. C., and López, A. G. (2004). Incorporating filtering techniques in a fuzzy linguistic multi-agent model for information gathering on the web. Fuzzy Sets and Systems, 148(1):61–83.
Johnson-Laird, P. N. (1985). Towards a Cognitive Science
of Language, Inference, and Consciousness. Cam-
bridge University Press, Cambridge.
Katarzyniak, R. (2005). On some properties of grounding
simple modalities. Systems Science, 31(3):59–86.
Katarzyniak, R. (2006a). On some properties of ground-
ing nonuniform sets of modal conjunctions. Int. Jour-
nal of Applied Mathematics and Computer Science,
16(3):399–412.
Katarzyniak, R. (2006b). On some properties of grounding
uniform sets of modal conjunctions. Journal of Intel-
ligent and Fuzzy Systems, 17(3):209–218.
Lorkiewicz, W., Popek, G., Katarzyniak, R., and Kowal-
czyk, R. (2011). Aligning Simple Modalities in Multi-
agent System. In Proc. ICCCI 2011, volume 6923,
pages 70–79. LNAI.
Nuyts, J. (2001). Epistemic Modality, Language, and Con-
ceptualization: A Cognitive-pragmatic Perspective.
Pawlak, Z. and Skowron, A. (2007). Rudiments of rough
sets. Information Sciences, 177(1):3–27.
Shapira, B., Shoval, P., and Hanani, U. (1999). Experimen-
tation with an information filtering system that com-
bines cognitive and sociological filtering integrated
with user stereotypes. Decision Support Systems,
27:5–24.
Stachowiak, F. J. (2013). Tracing the role of memory and attention for the meta-informative validation of utterances. In Włodarczyk, A. and Włodarczyk, H., editors, Meta-informative Centering in Utterances: Between Semantics and Pragmatics, pages 121–142. John Benjamins Publishing Co., Amsterdam.
Talmy, L. (2000). Toward a Cognitive Semantics. MIT
Press, Cambridge, MA.
Włodarczyk, A. (2013). Grounding of the meta-informative status of utterances. In Włodarczyk, A. and Włodarczyk, H., editors, Meta-informative Centering in Utterances: Between Semantics and Pragmatics, pages 41–58. John Benjamins Publishing Co., Amsterdam.
Xu, M., Ong, V., Duan, Y., and Mathews, B. (2011).
Intelligent agent systems for executive information
scanning, filtering and interpretation: Perceptions and
challenges. Information Processing and Management,
47(2):186–201.