Formal Concept Analysis Applied to

Professional Social Networks Analysis

Paula R. C. Silva

, S´ergio M. Dias

1,2

, Wladmir C. Brand˜ao

, Mark A. Song

and Luis E. Z´arate

Informatics Institute, Pontiﬁcal Catholic University of Minas Gerais, Belo Horizonte, Brazil

Federal Service of Data Processing, Belo Horizonte, Brazil

Keywords:

Formal Concept Analysis, Proper Implications, Social Networks.

Abstract:

From the recent proliferation of online social networks, a set of speciﬁc type of social network is attracting

more and more interest from people all around the world. It is professional social networks, where the users’

interest is oriented to business. The behavior analysis of this type of user can generate knowledge about

competences that people have been developed in their professional career. In this scenario, and considering

the available amount of information in professional social networks, it has been fundamental the adoption of

effective computational methods to analyze these networks. The formal concept analysis (FCA) has been a

effective technique to social network analysis (SNA), because it allows identify conceptual structures in data

sets, through conceptual lattice and implication rules. Particularly, a speciﬁc set of implications rules, know

as proper implications, can represent the minimum set of conditions to reach a speciﬁc goal. In this work,

we proposed a FCA-based approach to identify relations among professional competences through proper

implications. The experimental results, with professional proﬁles from LinkedIn and proper implications

extracted from PropIm algorithm, shows the minimum sets of skills that is necessary to reach job positions.

1 INTRODUCTION

In increasingly interconnected world, the social net-

works attend different people’s interests and address

the communication and information needs of several

user groups (Russell, 2013). In particular, there are

professional social networks focused on a speciﬁc

group of users interest is oriented to business. One of

the largest and most popular professional social net-

work is LinkedIn, which has more than 400 millions

users distributed in more than 200 countries and terri-

tories (LinkedIn, 2016).

LinkedIn users create professional proﬁles, they

available their professional skills, competences and

experience. Thus, LinkedIn provides a source of pro-

fessional information that can be exploited by enter-

prise managers in different ways, e.g., to ﬁnd people

with appropriate competences to fulﬁll speciﬁc po-

sitions. In addition, the size and diversity of user-

generated content create an opportunity to identify

behavioral trends and user communities. In this sce-

nario, the formal concept analysis (FCA) presents it-

self as a mathematical theory that can be used for this

purpose.

FCA presents a mathematical formulation for data

analysis, which identify conceptual structures from a

data set (Ganter et al., 2005). It also presents an in-

teresting uniﬁed framework to identify dependencies

among data, by understanding and computing them in

a formal way (Codocedo et al., 2016). It is a branch

of lattice theory motivated by the need for a clear for-

malization of the notions of concept hierarchy.

There are two ways to extract and represent

knowledge from FCA: conceptual lattice and impli-

cation rules. In this work, we applied a particularly

type of implication rules, know as proper implica-

tions (Taouil and Bastide, 2001). We say that a propo-

sition P logically implies a proposition Q (P → Q), if

Q is true whenever P is true. The set of proper im-

plications have a minimal left-hand side and only one

item in right-hand side (Taouil and Bastide, 2001). It

has been used when the need is to ﬁnd the minimum

conditionsto lead a goal. In this article, theproper im-

plications represent the set of minimum professional’s

skills (conditions) for achieve a job position (goal).

For example, the proper implication {statistic, ma-

chine learning, databases} → {data scientist} rep-

resent a minimum set of skills which are necessary to

be a data scientist.

Several authors have been applied FCA to address

Silva, P., Dias, S., Brandão, W., Song, M. and Zárate, L.

Formal Concept Analysis Applied to Professional Social Networks Analysis.

DOI: 10.5220/0006333401230134

In Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS 2017) - Volume 1, pages 123-134

ISBN: 978-989-758-247-9

123

research problems related to social networks analysis

(SNA). Note that, there are other methods to retrieve

knowledge from social networks. They are usually

based on graph theory, clustering and frequent item-

sets, which provide an approach to represent a social

network through a formal way. Additionally, there are

several FCA to SNA applications, such as ontology-

based technique (Kontopoulos et al., 2013), social

communities (Ali et al., 2014), network representa-

tion through concept lattice (Cuvelier and Aufaure,

2011), contextual pre-ﬁltering process and identifying

user behavior through implication rules (Neto et al.,

2015a).

In this article, we proposed a FCA-based ap-

proach to identify professionalbehaviors through data

scraped from LinkedIn. First, we conceptually model

a LinkedIn user proﬁle according to the model of

competences proposed by Durand (1998), because

this model has more acceptance in industry and

academia (Brand˜ao and Guimar˜aes, 2001). Second,

as an input data set, our approach needs an incidence

table (formal context) and a subset of proper implica-

tions are expected as an output. Lastly, a proper im-

plication represent careers trajectories, by minimum

professional’s skills (conditions) for achieve a posi-

tion (goal). These implications was extracted using

our proposed algorithm named PropIm. The PropIm

algorithm was proposed to extract proper implications

with support greater than zero, using a scalable ap-

proach.

The main contributions of this paper are: a profes-

sionals data set scraped from LinkedIn, a FCA-based

approach, and experiments set for apply FCA to pro-

fessional career analysis.

The structure of the article follows as: In Section

2, we present the preliminary deﬁnitions related to

FCA. In Section 3, we report related work that applied

FCA to social networks analysis. In Section 4, we

present our FCA-based approach to SNA. In Section

5, we report experiments and results. Finally, in Sec-

tion 6, we present the conclusions and future work.

2 FORMAL CONCEPT ANALYSIS

In this section, we introduce concepts related of

formal concept analysis (FCA) reported on litera-

ture (Ganter and Wille, 2012).

2.1 Formal Context

Formally, a formal context is a triple (G,M,I), where

G is a set of objects (rows), M is a set of attributes

(columns) and I are incidences. It is deﬁned as

I ⊆ G × M. If an object g ∈ G and an attribute m ∈ M

have a relationship I, their representation is (g,m) ∈ I

or gIm, which could be read as “object g has attribute

m”. When an object has an attribute, the incidence is

identiﬁed and represented by “x”. In the formal con-

text shown in Table 1, rows are objects representing

users, and the columns are professional skills and po-

sitions.

Table 1: Example context of an user’s LinkedIn skills. The

attributes are: a: networks, b: mobile application, c: soft-

ware engineering, d: data bases, e: graphic processing, f:

computer architecture, g: operational systems.

a b c d e f g

17 x x x x x

18 x x

19 x x x

20 x x x x

21 x x x

22 x x

23 x x x x x

24 x x

Given a subset of objects A ⊆ G of formal context

(G,M,I), there is an attribute subset of M common

to all objects of A, even if empty. Likewise, given

a set B ⊆ M, there is an object subset that shares the

attributes of B, even if empty. These relationships are

deﬁned by derivation operations:

′

:= {m ∈ M|gIm∀g ∈ A} (1)

′

:= {g ∈ G|gIm∀m ∈ B} (2)

A formal context (G, M, I) is a clariﬁed context,

when ∀ g, h ∈ G, from g

′

= h

′

it always follows that

and, correspondingly m

′

= n

′

implies m = n ∀ m,n

∈ M. The clariﬁcation process consists in maintain-

ing one element (objects and attributes) from a set of

equal elements eliminating the others. In this process,

the number of objects and attributes can be reduced

while retaining lattice form (Ganter et al., 2005).

2.2 Formal Concept

From formal contexts we can obtain formal concepts,

deﬁned as pairs (A,B), where A ⊆ G is called exten-

sion and B ⊆ M and is called intention, and they must

follow conditions A = B

′

and B = A

′

(equations 1 and

2)(Ganter et al., 2005).

Based on the formal context from Table 1, we can

obtain the formal concept ({18, 21, 24}, {software

engineering, data bases}), where elements of subset

B are {software engineering, data bases}, that, by

derivation (Equation 2), yield subset A = {18, 21,

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

124

24}, representing the subset of users who have skills

in software engineering (c) and data bases (d).

It is important to note that a formal concept cor-

responds to any aspect of the problem domain, repre-

sented by objects and attributes, in which exists some

kind of comprehension and understanding.

2.3 Concept Lattice

With all formal concepts sorted hierarchically by or-

der of inclusion ⊆, we can build the concept lattice.

Sorting must be done, so that, the concept (A

) is

considered less than or equal to (A

) if and only

if, A

⊆ A

(equivalent to B

⊆ B

). In this case, the

concept (A

) is called sub-concept and the con-

cept (A

) super-concept. In Figure 1 is shown an

example of a concept lattice from the formal context

in Table 1.

Figure 1: Example of concept lattice.

2.4 Implication Rules

Given a formal context (G,M,I) or a concept lattice

B(G,M, I), these can be extracted exact rules or ap-

proximate rules (rules with statistical values, for ex-

ample, support and conﬁdence) that express in a alter-

native way the underlying knowledge. The exact rules

can be classiﬁed in implication rules and functional

dependencies, while the approximation rules are di-

vided in classiﬁcation rules and association rules. It

is particularly important in this work, to get the social

networks users’ behavior, consider exact rules. From

now these rules are going to called only implications.

Follows the deﬁnition of an implication (Ganter and

Wille, 2012):

Deﬁnition 2.1. Being a formal context whose at-

tributes set is M. An implication is an expression

P → Q, which P,Q ⊆ M.

An implication P → Q, extracted from a formal

context, or respective concept lattice, have to be such

that P

′

⊆ Q

′

. In other words: every object wich has

the attributes of P, it also have the attributes of Q.

Note that, if X is a set of attributes, then X respects

an implication P → Q iff P 6⊆ X or Q ⊆ X. An impli-

cation P → Q holds in a set {X

,... ,X

} ⊆ M iff each

respects P → Q; and P → Q is an implication of the

context (G,M,I) iff it holds in its set of object intents

(an object intent is the set of its attributes). An impli-

cation P → Q follows from a set of implications I , iff

for every set of attributes X if X respects I , then it

respects P → Q. A set of implications I is said to be

complete in (G, M, I) iff every implication of (G,M, I)

follows from I . A set of implications I is said to be

redundant iff it contains an implication P → Q that

follows from I \{P → Q}. Finally, an implication

P → Q is considered superﬂuous iff P∩ Q 6=

In social networks will be convenientthat each im-

plication represent a minimum behavior. For this, we

will require that the complete set of implications I

of a formal context (G, M,I) have the following char-

acteristics, to be used as representative of the process:

• the right hand side of each implication is unitary:

if P → m ∈ I , then m ∈ M;

• superﬂuous implications are not allowed: if P →

m ∈ I , then m /∈ P;

• specializations are not allowed, i.e. left hand sides

are minimal: if P → m ∈ I , then there is not any

Q → m ∈ I such that Q ⊂ P.

A complete set of implications in (G, M,I) with such

properties is the so called set of proper implications

(Taouil and Bastide, 2001) or unary implication sys-

tem (UIS) (Bertet and Monjardet, 2010).

Deﬁnition 2.2. Let J be the complete closed set of

implications of a formal context (G,M,I). Then the

set of proper implications I for (G,M,I) is deﬁned

as: {P → m ∈ J |P ⊆ M and m ∈ M \ P and ∀Z ⊂

P: Z → m /∈ J }.

Table 2: Proper implications extracted from formal context

in Table 1.

P → m

→ a

b,d

→ b

d,g → c

→ dc

→ e

b,d

The Table 2 shows the set of proper implications,

from the formal context example (Table 1). For ex-

ample, in implication b,d → a, the set P is composed

by the set of attributes {b, d}, m = {a} and → sym-

bol represents the incidence. P and m are called as

Formal Concept Analysis Applied to Professional Social Networks Analysis

125

premise and conclusion. So the implication b,d → a,

can be read as the premise b,d implies in conclusion

a. It is important to note that, the conclusion a can

has more than one minimal premises. According to

Deﬁnition 2.2, the premises {e} and {b, d} are mini-

mal, and they can not be a subset of another premises

that imply in a. For example, the implication e → a

is a proper implication, but e, g → a is not a proper

implication.

3 RELATED WORK

Recently, several authors have applied FCA for social

networks analysis (Rome and Haralick, 2005; Snasel

et al., 2009; Stattner and Collard, 2012; Aufaure and

Le Grand, 2013; Banerjee et al., 2014; Krajˇci, 2014;

Atzmueller, 2015; Neznanov and Parinov, 2015; Sol-

dano et al., 2015). These works have been motivated

by the interest in understand and interpret social net-

works through mathematical formulation. The main

subjects are: social network representation as a con-

cept lattice, community detection, concepts mining,

ontology analysis and rule mining through implica-

tions.

In (Cordero et al., 2015), the authors proposed

knowledge-based model of inﬂuence applying FCA to

compute minimal generators and closed sets directly

from an implicational system, for obtain a structure

of user’s inﬂuence. The data was extracted from Twit-

ter social network, it was transformed into a formal

context and it was generated the Duquenne-Guigues

basis. In (Neto et al., 2015b), the author shows an ap-

proach to analyze a data base composed by internet’s

access logs. The authors apply the minimal set of im-

plications and complex networks theories to identify

substructures, that are not easily visualized with two-

mode networks. In (Jota Resende et al., 2015), the au-

thors propose an FCA-based approach to build canon-

ical models, which represents Orkut access’ patterns.

These papers resemble this work, because they also

talks about how to map social networks in terms of

objects and attributes, and extract knowledge through

implications rules set.

As this work, in Barysheva et al. (2015) the

authors apply FCA to identify interaction patterns

through data scraped from LinkedIn. They did not

work with professional competencies and proper im-

plications, like us, because their goal was classify

users’ behavior though their network interactions, and

the knowledge was extracted from conceptual lattice.

The papers of Li et al. (2016); Xu et al. (2014a);

Lorenzo et al. (2016) are the main works about ca-

reer trajectory analysis using LinkedIn data. Their

goal was model professional proﬁles and identify ca-

reer trajectories through temporal analysis. Even not

applying FCA, these authors apply data mining tech-

niques to identify professional career trajectories.

In general, the FCA has been applied to min-

ing social media, because the FCA theory presents

a formalism for the representation of network struc-

ture, behavior identiﬁcation and knowledge extrac-

tion, through formal representation of problem do-

main from objects, attributes and their respective in-

cidences.

Our proposed approach combines techniques con-

ducted on formal concept analysis, patter mining and

model of professional competence.

4 FCA-BASED APPROACH

In this article, the problem of analysis and representa-

tion of professional proﬁles in social networks, can be

grounded by building a conceptual model, that merge

the social network with professional skills theory, and

the transformation of this model to a formal context.

After scraping and pre-process the data to a formal

context, we can extract the set of implications to be

analyzed. The Figure 2 shows the methodology steps

proposed to analyze LinkedIn social network through

FCA.

Figure 2: Methodology based in FCA to SNA.

4.1 Problem Domain

According to Figure 2, the ﬁrst step (1) was to con-

struct the conceptual model according to the problem

to be treated. In this case, the problem involves the

characterization of a person as a professional. We

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

126

Figure 3: Professional identiﬁcation through model of competence.

adopted the competence model proposed by Durand

(1998), because this model has greater acceptance

in both academic and business, since it seeks to in-

tegrate aspects related to this work, such as tech-

nical issues, cognition and attitudes (Brand˜ao and

Guimar˜aes, 2001). For Durand (1998), a professional

is characterized by his competence in accomplishing

a certain purpose. Competence is composed by three

dimensions: knowledge, attitudes and know how. The

dimensions are interdependent, because the behavior

of a professional is determined not only by his knowl-

edge but also by the attitudes and know how acquired

over time. The knowledge dimension is linked to

academic training and complementary courses. The

know how dimension is related to a person’s profes-

sional experiences. And ﬁnally, the attitude dimen-

sion is related to the way of people interact in their

professional environment.

The conceptual model was created from the clas-

siﬁcation of informational categories, based on the

model of competence. Figure 3 shows the domain

model for professional identiﬁcation through model

of competence. The ﬁrst level is related to the Com-

petence concept, in the second level is 3 dimensions,

in the third level is 14 aspects and in the fourth level

is 51 variables.

4.2 Scrapping

FCA techniques to social network analysis can yield

insights into user behaviors, detecting popular topics,

and discovering groups and communities with similar

characteristics. So, a task of gathering the data on a

speciﬁc subject is needed. In this case, the second step

(2) of the methodology represents the Scraping com-

ponent that is responsible for collecting the LinkedIn

user data.

The collection process was divided into two

phases. The ﬁrst one selects the initial seeds, ran-

domly two user proﬁles were selected. This amount

of initial seed was satisfactory for the data collection

process, due to the total of proﬁles obtained being suf-

ﬁcient for the study. It was deﬁned that, as case study,

the data would be collected from people of Belo Hor-

izonte, Minas Gerais, Brazil and they must have at

least graduate courses in the information technology

(IT) area.

Figure 4: Scraping LinkedIn proﬁle’s process.

The second phase goal is collect the public pro-

ﬁles. As the LinkedIn does not provide an API (Appli-

cation Programming Interface) to extract data directly

Formal Concept Analysis Applied to Professional Social Networks Analysis

127

from the server, an approach, known as open collec-

tion, has been adopted to extract data from users’ pub-

lic pages.

The Figure 4 exempliﬁes the ﬂow of data collec-

tion process. The process starts by accessing a seed.

In the public proﬁle there is a section denoted as “Peo-

ple also saw” - a list with the 10 most similar proﬁles

related to the visited proﬁle (Xu et al., 2014b). The

collector looks up these addresses and veriﬁes which

proﬁles meet the scope. Valid proﬁles are stored and

each one becomes a new seed to extract new links,

restarting the collection process until it reaches the

stop criterion. The stopping criterion is based on the

percentage of new proﬁles. Each iteration checks if

65% of the proﬁles were already in the database.

4.3 Preprocessing

The Preprocessing (3) component is responsible for

pre-processing the data extracted in the previous step.

In this work only the variables skills and experience

were considered. However, in future works, the other

dimensions will be included.

For the construction of the formal context, we con-

sidered, as attribute, each value of the competence

and experience variables. Each user (professional of

LinkedIn) is considered as an object. In the ﬁrst ver-

sion of the formal context, 4000 attributes and 1280

objects were detected. As such values are texts and

LinkedIn allows users to ﬁll the corresponding ﬁelds

with a free text, some problems have been detected.

In this case, it was necessary to create an ETL pro-

cess (Extract Transform Load) to clean and transform

the data, aiming to reduce the amount of attributes.

The ETL process consists of two stages. In

the ﬁrst step, we applied basic procedures for string

cleaning, like: UTF-8 encoding correction, accent re-

moval,and standardization of all terms for the English

language through Google Translate API

. In the sec-

ond stage, we apply techniques to attribute reductions.

Such reductions were based on terms with semantic

relevance, in which the attributes skill and experience

could be reduced. For example, attributes as jpa, jsf

were renamed to java frameworks; attributes as de-

veloper, programmer, software developer, program

developer were renamed to software developer. The

vague nouns or trademark terms, as bachelor, engi-

neer, accessibility, microsoft, were removed, because

they are not relevant to our study.

At the end of the preprocessing step, a formal con-

text was created with 366 attributes and 970 objects,

which 61 attributes are related to experience and 305

to skills.

Translate API: https://cloud.google.com/translate/

4.4 Knowledge Extractor

The Impec (Taouil and Bastide, 2001) is the best-

known algorithm for extracting proper implications

from the formal context. This is due to the strategy

that the algorithm adopts at the moment in which it

ﬁnds the premises and their respective conclusions.

On the other hand, the strategy used, by the algo-

rithm, is computationally inefﬁcient and it can not be

optimized or distributed. In some cases, only a few

context attributes are interesting to be in the conclu-

sion. The traditional algorithms only allow to gener-

ate the complete set, being necessary a step of ﬁltering

the rules to select to those whose attributes of interest

appear as conclusion of the implications, occurring an

unnecessary computational effort.

Based on the problems described above, a new al-

gorithm was proposed to generate proper implications

from the formal context. The algorithm adopts an

easily scalable strategy and allows its proper impli-

cations to be generated from the attributes of interest

as a conclusion of implications. In general, the al-

gorithm receives a formal context as input, ﬁnds the

minimum premisses for each conclusion by combin-

ing attributes, applies a pruning heuristic, and uses the

derivation operators to validate the implications.

Input : Formal context (G,M,I)

Output: Set of proper implications I with

support greater than 1

1 I =

2 foreach m ∈ M do

3 P = m

′′

4 size = 1

5 Pa =

6 while size < |P| do

7 C =



size



8 P

= getCandidate(C,Pa)

9 foreach P1 ⊂ P

10 if P1

′

0 and P1

′

⊂ m

′

then

11 Pa = Pa∪ {P1}

12 I = I ∪ {P1 → m}

13 end

14 end

15 size+ +

16 end

17 end

18 return I

Algorithm 1: Extract proper implications with support

greater than 1.

The pseudo-code of PropIm is given in Algo-

rithm 1. The main objective is to ﬁnd implications

whose left side is minimal and the right side has only

one attribute. The algorithm needs a formal context

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

128

Table 3: Example of PropIm algorithm execution.

m P size C Pc Pa I

a {b, d, e, g} 1 {{b}, {d}, {e}, {g}} {{b},{d},

{e}, {g}}

a {b,d,e,g} 2 {{bd}, {be}, {bg}, {de}, {dg},

{eg}}

{{bd},{bg},

{dg}}

{{e}, {bd}} {{e-a}, {bd-a}}

a {b,d,e,g} 3 {{bde}, {bdg}, {beg}, {deg}}

{{e},{bd}} {{e-a},{bd-a}}

a {b,d,e,g} 4 {{b,d,e,g}}

{{e}, {bd}} {{e-a},{bd-a}}

b {a,d,e,f,g} 1 {{a} , {d}, {e}, {f}, {g}} {{a},{d},

{e},{f},{g}}

{{a}, {e}, {f}} {{e-a}, {bd-a},

{a-b}, {e-b},

{f-b}}

b {a,d,e,f,g} 2 {{ad}, {ae}, {af}, {ag}, {de}, {df},

{dg}, {ef}, {eg}, {fg}}

{{dg}} {{a}, {e}, {f}} {{e-a}, {bd-a},

{a-b}, {e-b},

{f-b}}

b {a,d,e,f,g} 3 {{ade}, {adf}, {adg}, {aef},

{aeg}, {afg}, {def}, {deg}, {dfg},

{efg}}

{{a}, {e}, {f}} {{e-a}, {bd-a},

{a-b}, {e-b},

{f-b}}

c {d,g} 1 {{d}, {g}} {{d},{g}}

{{e-a},

{bd-a},{a-b},

{e-b}, {f-b}}

c {d,g} 2 {{dg}} {{dg}}

{{e-a}, {bd-a},

{a-b}, {e-b},

{f-b}}

(G,M,I) as input, and its output is a set of proper im-

plications. Line 1 initializes the set I with empty

set. The following loop (Lines 2-17) looks at each at-

tribute in the set M. We initially suppose that each

attribute m can be a conclusion for a set of premises.

For each m, we compute a left-hand side P1.

To reduce the searching space, the algorithm ﬁnds

the right side P for a left side m from a set of attributes

common to m objects. After, it ﬁnds sets of possible

premises for m based on P. The size counter deter-

mines the size of each premise, as the smallest possi-

ble size is 1 (an implication of type {b} → { a}), it is

initialized with 1 (Line 4).

A set of auxiliary premises Pa is used, where

all valid premises found leading to m conclusion are

stored (at Line 5 Pa is initialized as empty). In the

loop, from Lines 6-16, the set of minimum premises

is found and is bounded by |P|. In Line 7, the C set

gets all combinations of size size from elements in

P. In Line 8, the set of candidate premises is formed

through the function getCandidate which will be de-

scribed next.

Each candidate premise P1 ⊂ P

is checked to

ensure if the premise P1 and the conclusion m re-

sults in a valid proper implication. Case P1 6=

0 and

P ⊂ P1

′

, the premise p1 is added to the set of auxiliary

premises Pa and I = I ∪ {P1 → M} .

A neighborhood search heuristic was imple-

mented by the getCandidate (Pseudo-code in Algo-

rithm 2) function. The objective is to ﬁnd, in the set of

C combinations, all subsets B that do not contain some

attribute that already belongs to some valid premise of

Pa. It receives, as parameter, the sets C and Pa, and

returns a set D of proper premises.

1 Function getCandidate (C,Pa)

2 D =

3 foreach a ∈ A|A ⊂ Pa do

4 foreach B ⊂ C do

5 if a /∈ B then

6 D = P

\ B

7 end

8 end

9 end

10 return D

Algorithm 2: Function getCandidate.

Table 3 shows the steps of PropIm algorithm,

on the example from Table 1. I contains ini-

tially

0. The ﬁrst value to m is a (ﬁrst attribute

from formal context) and m

′′

is the set of attributes

{b,d,e,g}. The size of combination sets is 1, so C =

{{b}, {d},{e},{g}}. Pa contains initially

0, C

0, so

the set of attributes returned by the function getCan-

didate is Pc = {{b},{d},{e},{g}}. For each subset

of Pc, only the element {e} attends the condition in

Line 10 (Algorithm 1), because {e}

′

= {17,20,23}

⊂ m

′

. The set {e} is added to Pa and the pair {e → a}

is added to I . When Pc is

0 and size is |P| the loop

to m = a is closed. So, the same steps happens for all

attributes, from formal context, imputed to m.

5 EXPERIMENTS

This section shows the procedures, adopted for run-

ning the experiments, and the analyses of results ob-

tained based in proposed FCA-based approach. The

Formal Concept Analysis Applied to Professional Social Networks Analysis

129

goal, of experiments, was to answer the following

questions:

• How do proper implications identify relations be-

tween skills and positions?

• Could we ﬁnd intersections among sets of skills,

and what do these intersections represent?

5.1 Proper Implications to Competence

Identiﬁcation

Among the 61 positions identiﬁed in Section 4.3, we

selected 20 positions and their 180 skills for analyze

the proper implications. In this case, the PropIm al-

gorithm extracted 895 proper implications related to

this reduced formal context. Figure 5, shows these

proper implications as a graph representation (proper

implications network). The central nodes are the po-

sitions and the edges represents the implications be-

tween premises (set of skills) and their conclusion

(position). In this study case, the graph representa-

tion helps us to analyze the distribution among sets of

skills and their respective positions.

In Figure 5 was highlighted some items related to

graph analysis:

• Central nodes density;

• Intersections among premises;

• Edges weight.

Figure 5: Proper implications network.

The central nodes density represents the diver-

siﬁcation of minimum sets of skills. The denser

nodes represents positions which have more diversi-

ﬁcation of minimum sets of skills. For example, the

highlighted node AD related to administrative direc-

tor position have 163 minimal sets of skills. Gen-

erally, the administrative director function is man-

age the company resources, like human, technolo-

gies and ﬁnancial resources. The speciﬁc skills of

this professional can be different according to the

company industry, because he have to develop busi-

ness skills and know how about the company re-

sources. It is expected that administrative director de-

velop skills related to leadership, management, tech-

nology and communication. One of the proper im-

plications which represents this professional proﬁle

is {entrepreneurship, human resources, information

management} → {administrative director}. Another

implication as {assets management, it governance,

leadership development, software development} →

{administrative director}, can represent a speciﬁc ad-

ministrative director from companies focused on soft-

ware development.

The less dense nodes represent jobs positions that

demand more speciﬁc sets of skills. For example the

ITC (IT consultant) node have only 3 sets of skills re-

lated to it. An IT consultant duties can vary depend-

ing on the nature of company’s project and client de-

sires. However, in general, this professional has skills

which combine IT and business knowledge. So, the

proper implication {ABAP

, agile methodology, BI

}

→ {it consultant} shows the common set of skills that

the IT consultants have.

The analysis related to intersection between

premises and edges weight will be discussed in the

next section (5.2).

5.2 Intersection between Skills and Job

Positions

According to Career Cast research (Cast, 2016), the

top 3 best jobs in Information Technology area is: data

scientist, information security analyst and software

engineer. From the set of proper implications, gen-

erated by PropIm algorithm, we ﬁltered the top 3 jobs

positions, for analyze these jobs and identify the in-

tersection among their skills.

Figure 6 shows the top 3 job positions and the in-

tersections among their skills. The central nodes are

the top 3 positions, according to Career Cast ranking

(Cast, 2016): P

(data scientist), P

(information se-

curity analyst) and P

(software engineer). The edges

weight are the implication relative frequency. The rel-

ative frequency was calculated according to equation:

F =

, where F is the relative frequency, F

is the

ABAP: Advanced Business Application Programming

BI: Business Intelligence

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

130

implication absolute frequency and F

is |m

′

|. For ex-

ample: the implication {C language} → {software

engineer} represents 31 objects (Fi) among 59 ob-

jects that have software engineer as incidence (Fp).

So, this implication relative frequency is 0.52. For

obtain the result as percentage is only multiply by

100, generating the relative frequency percentage of

52% to this proper implication in software engineer

set of implications. Therefore, the thicker edges rep-

resents implications with greater relative frequency. It

is important to note that, we applied the relative fre-

quency measure, because in this case, the frequency

represents the signiﬁcance of an implication inside its

class. And the local signiﬁcance is more relevant than

the implication support in the complete proper impli-

cations set. For example, the implication { java frame-

works} → {software engineer} has relative frequency

of 75.56%, and its support is 2.47%. In this example,

to analyze the relative frequency is more important

than the implication support, because the speciﬁc ob-

jective is identify the conditions to reach a software

engineer job. Is important to note that, in both cases

the conﬁdence is 100%.

Figure 6: Top 3 jobs and their skills, where P

is data sci-

entist, P

is information security analyst and P

is software

engineer job position.

Figure 6 shows the intersections between the min-

imal sets of skills, considering the top 3 jobs positions

described above. In this case, the nodes P

and P

share four set of skills like {BI} → {security analyst}

and {BI} → {software engineer}. The nodes P

and

share two set of skills, like the proper implications

show: {agile methodology} → {data scientist} and

{agile methodology} → {information security ana-

lyst}. And, the nodes P

and P

share only one set of

skills, on {active directory

} → {data scientist} and

{active directory} → {software engineer}.

From these intersections, we observed that the

greater the intersections amount between skills sets,

more similar are the requirements to achieve a po-

sition. It would indicated possibilities to profes-

sional mobility among positions, when the set of skills

(premises) implies in several different positions (con-

clusions). So a professional could havecompetence to

assume different positions, because his skills could be

applied to different jobs. For example, in recruitment

and selection hiring process, this professional could

be compatible with several job vacancy, therefore he

could be more jobs opportunities. Another example

is the case when a professional needs change jobs, his

skills allow greater career mobility.

Figure 7: IT career hierarchical levels, where P

to P

rep-

resents the following job positions: P

is IT analyst, P

is IT

manager, P

is IT coordinator and P

is IT director.

Figure 7 shows 4 positions that represents differ-

ent hierarchical levels of IT career. The central nodes

represent these 4 positions: P

(IT analyst), P

(IT

coordinator), P

(IT manager) and P

(IT director).

The other nodes represent minimal sets of skills, and

edges represent implications. It is important to note

that, edges weight was calculated using the relative

frequency, previously described. From the ﬁgure we

could observe that there are disjoint sets and there

are not any intersections among positions. Accord-

ing to this hierarchy, P

and P

are positions related

to the early career, while P

and P

are positions hier-

archically superior. So, for P

was expected technical

skills like in the proper implication {.NET, automa-

Active directory: Microsoft tool kit for store and con-

trol information about network conﬁgurations.

Formal Concept Analysis Applied to Professional Social Networks Analysis

131

tion systems} → {IT analyst}. P

involves skills that

represent the transition between technical and man-

agerial level, like in the proper implication {.NET,

data base, ERP, it governance} → IT coordinator.

also involves skills related to hierarchical transi-

tion, but it was expected more managerial than tech-

nical skills, it could be expressed by the implication

{BPM

, cloud computing, CRM

} → IT manager. Fi-

nally, an IT director (P

) have to develop managerial

skills like was identiﬁed in implication {assets man-

agement, BI, business management, consulting} → IT

director. Then, for the professional get a career ad-

vancement, he have to develop skills of different na-

tures.

6 CONCLUSION AND FUTURE

WORKS

In this paper, we presented an FCA-based approach

to identify professional behavior through data scraped

from LinkedIn. Speciﬁcally, we apply proper impli-

cations to identify the minimum sets of skills that is

necessary for achieve a job position. In this case,

ﬁrstly, we model the problem domain to understand

the features which characterize a person as a pro-

fessional, according to model of competence. After,

we scraped data from LinkedIn, we applied prepro-

cessing techniques and transform this data into a for-

mal context. Finally, we extracted proper implica-

tions through PropIm algorithm and analyze the re-

sults through graph representations.

The main contributions of this paper were: a pro-

fessionals data set scraped from LinkedIn, a FCA-

based approach, and experiments set for apply FCA

to professional career analysis.

As part of FCA-based approach, we propose the

PropIm algorithm. The goal of PropIm algorithm is

extract proper implications with support greater than

zero. It was implemented applying pruning heuristics

and scalable strategy. In future works the algorithm

will be modiﬁed to run as a distributed application.

The problem’s order complexity to extract proper im-

plications from formal context is O(|M||I |(|G||M| +

|I ||M|)). The proposed algorithm also has exponen-

tial complexity, but the implemented strategy reduce

the computational effort computing only implications

with support greater than zero, and the pruning heuris-

tic reduce the possibilities of attributes combinations

into premises. Moreover, the initial loop allows pro-

cess the context in distributed way, because each con-

BPM: Business process management.

CRM: Customer relationship management

clusion from formal context can be processed sepa-

rately without causing loss of information in the ﬁnal

set of proper implications.

The experiments answer the two questions:

• How do proper implications identify relations be-

tween skills and positions?

• Could we ﬁnd intersections among sets of skills,

and what do these intersections represent?

In both questions, the proper implicationswas showed

as graph representation and the analysis is based in

central nodes density, intersections among premises

and edges weight. For the ﬁrst question, experiments

show that denser nodes represent positions with more

diversiﬁcation of minimum sets of skills, while less

dense nodes represent jobs that require more spe-

ciﬁc sets of skills. For the second question, experi-

ments show that there are intersections between sets

of skills from different jobs positions. These inter-

sections means that same set of skills is required for

different positions, which allows more professional

opportunities in different industries and more profes-

sional mobility. We also analyzed a set of positions

that compose a hierarchy of jobs in IT area. With this,

it was observed that disjointed sets were formed with-

out intersections between the skills, which shows that

for a professional to progress of career, it is necessary

to develop skills of different natures.

As future work, we intend to exploit other algo-

rithms, particularly those capable of obtaining the set

of implications from concept or the subset of formal

concepts as proposed by Dias (2016). Moreover, we

intend to implement the PropIm algorithm as a dis-

tributed application and compare several algorithms

to extract proper implications from formal context.

We also intend expand the experiments for all dimen-

sions from the professional model of competence.

ACKNOWLEDGEMENTS

The authors acknowledge the ﬁnancial support re-

ceived from the Foundation for Research Support of

Minas Gerais state, FAPEMIG; the National Council

for Scientiﬁc and Technological Development, CNPq;

Coordination for the Improvement of Higher Educa-

tion Personnel, CAPES. We would also express grati-

tude to the Federal Service of Data Processing, SER-

PRO.

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

132

REFERENCES

Ali, S. S., Bentayeb, F., Missaoui, R., and Boussaid, O.

(2014). An Efﬁcient Method for Community Detec-

tion Based on Formal Concept Analysis, pages 61–72.

Springer International Publishing, Cham.

Atzmueller, M. (2015). Subgroup and community analyt-

ics on attributed graphs. In Proceedings of the Work-

shop on Social Network Analysis using Formal Con-

cept Analysis.

Aufaure, M.-A. and Le Grand, B. (2013). Advances in fca-

based applications for social networks analysis. Int. J.

Concept. Struct. Smart Appl., 1(1):73–89.

Banerjee, S., Badr, Y., and Al-shammari, E. T. (2014). So-

cial Networks: A Framework of Computational Intel-

ligence, volume 526. Springer Berlin Heidelberg.

Barysheva, A., Golubtsova, A., and Yavorskiy, R. (2015).

Proﬁling less active users in online communities. In

Proceedings of the Workshop on Social Network Anal-

ysis using Formal Concept Analysis.

Bertet, K. and Monjardet, B. (2010). The multiple facets of

the canonical direct unit implicational basis. Theoret-

ical Computer Science, 411(22–24):2155 – 2166.

Brand˜ao, H. P. and Guimar˜aes, T. d. A. (2001). Gest˜ao de

competˆencias e gest˜ao de desempenho: tecnologias

distintas ou instrumentos de um mesmo construto?

Revista de Administrac¸˜ao de empresas, 41(1):8–15.

Cast, C. (2016). Jobs rated report 2016: Ranking 200 jobs.

Accessed in 2016-12-12.

Codocedo, V., Baixeries, J., Kaytoue, M., and Napoli, A.

(2016). Contributions to the formalization of order-

like dependencies using fca. In Proceedings of the 5th

International Workshop What can FCA do for Artiﬁ-

cial Intelligence. CEUR-WS.

Cordero, P., Enciso, M., Mora, A., Ojeda-Aciego, M., and

Rossi, C. (2015). Knowledge discovery in social net-

works by using a logic-based treatment of implica-

tions. Know.-Based Syst., 87(C):16–25.

Cuvelier, E. and Aufaure, M.-A. (2011). A buzz and e-

reputation monitoring tool for twitter based on ga-

lois lattices. In Conceptual Structures for Discovering

Knowledge, pages 91–103. Springer, Berlin Heidel-

berg.

Dias, S. M. (2016). Reduc¸ ˜ao de Reticulados Conceituais

(Concept Lattice Reduction). PhD thesis, Department

of Computer Science of Federal University of Mi-

nas Gerais (UFMG), Belo Horizonte, Minas Gerais,

Brazil. In Portuguese.

Durand, T. (1998). Forms of incompetence. In Proceed-

ings Fourth International Conference on Competence-

Based Management. Oslo: Norwegian School of Man-

agement.

Ganter, B., Stumme, G., and Wille, R. (2005). Formal con-

cept analysis: foundations and applications, volume

3626. Springer Science & Business Media.

Ganter, B. and Wille, R. (2012). Formal concept analysis:

mathematical foundations. Springer Science & Busi-

ness Media.

Jota Resende, G., De Moraes, N. R., Dias, S. M., Mar-

ques Neto, H. T., and Zarate, L. E. (2015). Canonical

computational models based on formal concept anal-

ysis for social network analysis and representation. In

Web Services (ICWS), 2015 IEEE International Con-

ference on, pages 717–720. IEEE.

Kontopoulos, E., Berberidis, C., Dergiades, T., and Bassili-

ades, N. (2013). Ontology-based sentiment analysis

of twitter posts. Expert Systems with Applications,

40(10):4065 – 4074.

Krajˇci, S. (2014). Social Network and Formal Concept

Analysis, pages 41–61. Springer International Pub-

lishing, Cham.

Li, L., Zheng, G., Peltsverger, S., and Zhang, C. (2016).

Career trajectory analysis of information technology

alumni: A linkedin perspective. In Proceedings of the

17th Annual Conference on Information Technology

Education, SIGITE ’16, pages 2–6, New York, NY,

USA. ACM.

LinkedIn (2016). About linkedin. Accessed in 2016-12-02.

Lorenzo, E. R., Cordero, P., Enciso, M., Missaoui, R., and

Mora, A. (2016). Caisl: Simpliﬁcation logic for con-

ditional attribute implications. In CLA.

Neto, S. M., Song, M., Dias, S., et al. (2015a). Min-

imal cover of implication rules to represent two

mode networks. In 2015 IEEE/WIC/ACM Interna-

tional Conference on Web Intelligence and Intelligent

Agent Technology (WI-IAT), volume 1, pages 211–

218. IEEE.

Neto, S. M., Song, M. A., Dias, S. M., and Z´arate, L. E.

(2015b). Using implications from fca to represent

a two mode network data. nternational Journal of

Software Engineering and Knowledge Engineering

(IJSEKE).

Neznanov, A. and Parinov, A. (2015). Analyzing social net-

works services using formal concept analysis research

toolbox. In Proceedings of the Workshop on Social

Network Analysis using Formal Concept Analysis.

Rome, J. E. and Haralick, R. M. (2005). Towards a Formal

Concept Analysis Approach to Exploring Communi-

ties on the World Wide Web, pages 33–48. Springer

Berlin Heidelberg, Berlin, Heidelberg.

Russell, M. A. (2013). Mining the Social Web: Data Mining

Facebook, Twitter, LinkedIn, Google+, GitHub, and

More. ” O’Reilly Media, Inc.”.

Snasel, V., Horak, Z., Kocibova, J., and Abraham,

A. (2009). Analyzing social networks using fca:

Complexity aspects. In Web Intelligence and In-

telligent Agent Technologies, 2009. WI-IAT ’09.

IEEE/WIC/ACM International Joint Conferences on,

volume 3, pages 38–41.

Soldano, H., Santini, G., and Bouthinon, D. (2015). Ab-

stract and local concepts in attributed networks. In

Proceedings of the Workshop on Social Network Anal-

ysis using Formal Concept Analysis.

Stattner, E. and Collard, M. (2012). Social-based con-

ceptual links: Conceptual analysis applied to social

networks. In Advances in Social Networks Analy-

sis and Mining (ASONAM), 2012 IEEE/ACM Interna-

tional Conference on, pages 25–29.

Taouil, R. and Bastide, Y. (2001). Computing Proper Impli-

cations. In Proceedings of the International Confer-

Formal Concept Analysis Applied to Professional Social Networks Analysis

133

ence on Conceptual Structures - ICCS, pages 46–61,

Stanford, CA US.

Xu, Y., Li, Z., Gupta, A., Bugdayci, A., and Bhasin, A.

(2014a). Modeling professional similarity by min-

ing professional career trajectories. In Proceedings of

the 20th ACM SIGKDD international conference on

Knowledge discovery and data mining, pages 1945–

1954. ACM.

Xu, Y., Li, Z., Gupta, A., Bugdayci, A., and Bhasin, A.

(2014b). Modeling professional similarity by min-

ing professional career trajectories. In Proceedings of

the 20th ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining, pages 1945–

1954. ACM.

ICEIS 2017 - 19th International Conference on Enterprise Information Systems

134