Managing Literature Reviews Information through Visualization

Sandra Fabbri, Elis Hernandes, Andre Di Thommazo (1,2), Anderson Belgamo (1,3), Augusto Zamboni and Cleiton Silva

Software Engineering Research Lab, Universidade Federal de São Carlos, São Carlos, SP, Brazil

Instituto Federal de Educação, Ciência e Tecnologia de São Paulo, São Carlos, SP, Brazil

Instituto Federal de Educação, Ciência e Tecnologia de São Paulo, Piracicaba, SP, Brazil

Keywords: Systematic Literature Review, Systematic Mapping, Tool, Visualization.

Abstract: Systematic Literature Review (SR) and Systematic Mapping (SM) are scientific literature review techniques that follow well-defined stages, according to a previously elaborated protocol. Their goals are, respectively, to help find evidence about a particular research topic and to map a research area. Their steps are laborious, and computational support is essential to improve the quality of their conduct. To offer computational support for these types of reviews, the StArt (State of the Art through Systematic Review) tool was developed. Besides the expected functionalities, StArt generates a score for each study and uses information visualization and text mining techniques to facilitate the mapping of the research area and to identify the relevance of the studies. StArt has been developed through an incremental process by academics who adopt SR and SM. As the expectation is to have a tool that really aids the conduct of these types of reviews, new ideas are continually investigated, which makes StArt different from other alternatives. Visualization and text mining techniques seem to be a powerful resource for facilitating data abstraction in the context of SRs and SMs, allowing the improvement of the review and of the conclusions drawn from it.

1 INTRODUCTION

The Systematic Literature Review process (SR or SM) has its origins in the medical area and its objective, according to Pai et al. (2004), is the creation of a complete and impartial summary of a given research topic, following well-defined and known procedures. More recently, this process has been adapted to computer science, particularly to Software Engineering (Kitchenham, 2004). Some advantages of using SR are the coverage, replicability and reliability of its process. Besides systematizing the search for relevant studies, the SR also prescribes how the obtained results are organized and analyzed. However, the SR process is more laborious than research conducted on an informal basis (Kitchenham, 2004).

An activity that can precede the SR is the Systematic Mapping (SM), whose objective, according to Petersen et al. (2008), is to build a classification scheme and to structure a software engineering research area. Like an SR, an SM is a laborious activity, and its process is similar to the SR process, with many repetitive steps. One of the main differences between SR and SM is that the desired results of SMs are mainly quantitative rather than qualitative, and the studies do not need to be read in full. Despite this difference, quantitative data can also aid the summarization that should be provided by an SR.

Thus, considering that there are several steps to be executed and several documents to be managed, computational support can aid conformance to the SR and SM processes, enabling higher quality in their execution.

The StArt tool (Montebelo et al., 2007) has been under development since 2006. In 2008 it was completely restructured and a new version was made available (Zamboni et al., 2010) (Hernandes et al., 2010). This version gave full support for carrying out SRs, and visualization and text mining resources are currently being added to ease data summarization since, in general, there is a large amount of data to be transformed into knowledge, which is a challenge. As mentioned by Burley (2010), information visualization is a valuable tool for knowledge integration activities and, in StArt, such views allow the researcher to find, in a simple way, information on the most important events, the evolution of the research topic in the academic community, and so on. This kind of information is very common in SMs.

Fabbri S., Hernandes E., Di Thommazo A., Belgamo A., Zamboni A. and Silva C.
Managing Literature Reviews Information through Visualization.
DOI: 10.5220/0004004000360045
In Proceedings of the 14th International Conference on Enterprise Information Systems (ICEIS-2012), pages 36-45
ISBN: 978-989-8565-11-2
© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

Another important contribution of information visualization in StArt is the evaluation of the quality of the search strings. An important point in this kind of literature review is to ensure that the search strings retrieve all the relevant studies on the research topic. The StArt tool provides a visualization of all the retrieved studies as well as their references. Hence, it is possible to identify, for example, whether a frequently cited reference was retrieved by the search string.

Based on this context, the objective of this paper is to explore the contributions of information visualization to these kinds of literature reviews.

Section 2 presents an overview of StArt

functionalities and highlights some features that aid

the control of the processes related to these kinds of

literature reviews. Section 3 explains the

visualization support provided by StArt and how it

can be used to enhance the summarization of the

investigated topic. Section 4 presents the support of

text mining processing and Section 5 presents the

conclusions and future work.

2 AN OVERVIEW OF StArt

Before explaining how information visualization and text mining help identify important information for SMs and SRs, an overview of the main functionalities of StArt is presented below. As mentioned before, the SR and SM processes have some repetitive steps and require discipline and systematic practice from the researcher. The information must be registered in an organized way, so that the expected results are reached, the process is replicable and all the information can be packed.

Thus, StArt has been developed to provide automated support for as many steps as possible. Functionalities to ease data summarization were also implemented in the tool, such as the possibility of displaying data through visualizations and Excel-formatted reports, according to the researcher’s needs.

As the SM process is a subset of the SR process,

StArt was initially planned to support SRs and

currently it is being adapted to also support SMs.

Figure 1 illustrates the general process of SR,

highlighting what is done with (left side) and

without (right side) StArt support. As electronic scientific databases do not allow automated search of primary studies, steps 2, 3 and 4 must be executed without the support of the tool. They are, respectively: the adjustment of the search strings in the search engines, which happens while the protocol is being defined and reviewed; the execution of these search strings after the protocol is approved; and the export of the search results to a BibTeX file. The step numbers used in this figure are also used in the explanation of the StArt functionalities.

The main functionalities of StArt are presented

in the screen shot of Figure 2. At the left side there is

the hierarchical directory tree with the SR process

phases. At the right side, the information associated

to the functionality selected on the left side is

presented.

In short, the goals of the three phases are:

• Planning Phase, which consists of filling in the protocol (Step 1 of Figure 1);

• Execution Phase, which is composed of Studies Identification (Steps 2, 3, 4, and 5 of Figure 1), Selection (Steps 6, 7, and 8 of Figure 1) and Extraction (Step 9 of Figure 1). In this phase the researcher should identify the studies, select them and extract the relevant information for answering the research question;

• Summarization Phase (Steps 10 and 11 of Figure 1), which corresponds to the analysis of the data extracted from each accepted study and the elaboration of a final report describing the state of the art. For this phase, StArt provides charts, spreadsheets and data visualizations, aiming to make the researcher’s tasks easier. Such options are detailed in the following sections.

Each phase is detailed below, with examples of the support provided by the StArt tool.

2.1 Planning

In this phase StArt supports the SR Protocol

elaboration (Step 1 of Figure 1) according to the

attributes suggested by Kitchenham (2007). Some of

the attributes are: research question definition;

keywords that will be used for searching for studies;

search engines; criteria for acceptance or rejection of

studies; etc. There is a help message for each

protocol attribute aiming to guide its filling. The

protocol is stored in the tool and can be accessed and

modified if necessary. It is worth noting that, to ensure conformance to the SR process, the content of the protocol fields is reflected in later steps of the SR process. For example, when a search engine is chosen while the protocol is being filled in, it is added under the Studies Identification step of the Execution Phase, as shown in Figure 3. Similarly, each attribute inserted in the Information Extraction Form Attributes during the protocol filling becomes a field that must be filled in during the Extraction Step (Step 9 of Figure 1), as shown in Figure 4.

Figure 1: SLR steps: Left side – actions supported and Right side – actions not supported by StArt.

Figure 2: Overview of the StArt tool.

2.2 Execution

This phase of the SR has three steps according to the

guidelines proposed by Kitchenham (2004, 2007).

The first one is Studies Identification (Steps 2 to 5 of

Figure 1). In this step, the researcher should adjust the

search string using the keywords earlier defined in the

protocol. After this step, the strings should be applied

in each search engine, for example, IEEE, Scopus,

ICEIS2012-14thInternationalConferenceonEnterpriseInformationSystems

ACM, Springer and Web of Science. This action is not supported by the tool, and the search results must be imported into StArt. As the studies are imported into the tool, it assigns a score to each study according to the occurrences of the keywords defined in the protocol in the title, abstract and keyword list of the study. This score can be used, for example, to establish a reading order, since studies with higher scores should be more relevant to the SR. Also, if the studies with higher scores are not relevant to the research question, the strings may need to be revisited and improved. The string definition is an important factor in the success of SRs, and its quality can be assessed through the visualization provided by StArt, which is presented in Section 4.
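As a rough illustration of how such a score could be computed, the sketch below counts keyword occurrences in the title, abstract and keyword list of a study and sums them with per-field weights. The weights, the function name `keyword_score` and the dictionary layout are assumptions made for this example; the scoring rule actually used by StArt is not detailed here.

```python
import re

# Hypothetical weights per field; the values used by StArt are not given in the text.
FIELD_WEIGHTS = {"title": 5, "abstract": 3, "keywords": 2}

def keyword_score(study, protocol_keywords, weights=FIELD_WEIGHTS):
    """Sum weighted, case-insensitive occurrences of each protocol keyword."""
    score = 0
    for field, weight in weights.items():
        text = study.get(field, "").lower()
        for keyword in protocol_keywords:
            # Match the keyword as a whole word or phrase.
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            score += weight * len(re.findall(pattern, text))
    return score

# Made-up study record, for illustration only.
study = {
    "title": "Requirements traceability in agile projects",
    "abstract": "We study traceability links. Traceability is costly.",
    "keywords": "traceability; agile",
}
print(keyword_score(study, ["traceability", "agile"]))
```

Sorting the imported studies by this value in descending order would give the reading order mentioned above.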

The second step is Studies Selection (Step 6 of

Figure 1). In this step, the researcher should use the

inclusion and exclusion criteria, defined in the

protocol, to classify the studies as accepted or

rejected. Duplicated studies are automatically

identified by the tool. When the study is accepted, the

researcher can attribute to it a relevance level (Very

High, High, Low or Very Low).

The third step is Extraction (Steps 7, 8 and 9 of Figure 1). At this step, the researcher must read the full version of each accepted study, elaborate a summary and fill in the Information Extraction Form of each study (Figure 4-B).

To facilitate this step, it is possible to link the full-text files of the studies (e.g. PDF files) to their records in the tool.

2.3 Summarization

In this phase (Step 10 of Figure 1), StArt provides the following facilities:

• Easy access to the information of all studies accepted in the Extraction step. Comments and information extracted in previous steps can be accessed and copied to a text editor embedded in the tool. After collecting that information, the researcher can transfer this initial version of the summary to a more powerful text editor.

• Generation of charts that support a quantitative characterization of the SR. For example: the percentage of studies identified by each search engine, the percentage of studies accepted, rejected and duplicated in the Extraction step, and the number of times that each inclusion and exclusion criterion was used for classifying the studies as accepted or rejected (Figure 11). In fact, this kind of quantitative data is particularly relevant for Systematic Mappings (Petersen et al., 2008). In case the researcher chooses to do a meta-analysis, carry out statistical tests or elaborate other charts, StArt can generate, among other reports, a spreadsheet that allows data manipulation outside the tool. These reports can be generated according to the researcher’s needs, based on options that allow grouping data in different ways (Figure 5-A), applying different filters (Figure 5-B) and choosing specific characteristics of the studies (Figure 5-C). Figure 5-D shows a preview of the report.

• Support for dealing with a large volume of data to discover features, patterns and hidden trends through visualization. When an SR or SM process is finished, there is a large amount of data related to the research topic that can show trends in the evolution of the topic over time, which is interesting information to explain the state of the art. As mentioned before, information visualization is a helpful tool for knowledge integration activities.
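The quantitative characterization described in the list above boils down to counting studies per category. The following minimal sketch shows one way to obtain such percentages; the record layout and the helper name `percentages` are hypothetical and are not taken from StArt.

```python
from collections import Counter

def percentages(values):
    """Return {value: percentage of the total}, rounded to one decimal place."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}

# Made-up records: (search engine, classification) for each imported study.
studies = [
    ("IEEE", "accepted"), ("IEEE", "rejected"), ("Scopus", "accepted"),
    ("Scopus", "duplicated"), ("ACM", "rejected"),
]
by_engine = percentages(engine for engine, _ in studies)
by_status = percentages(status for _, status in studies)
print(by_engine)
print(by_status)
```

The same helper could be applied to the inclusion and exclusion criteria assigned to each study to count how often each criterion was used.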

3 VISUALIZATION IN StArt

Considering the importance of quantitative data for

both the SR and SM and the fact that information

visualization explores the natural visual ability of

humans aiming to facilitate information processing

(Gershon, Eick, Card, 1998), StArt uses

visualization to facilitate knowledge management

about literature reviews. Using effective visual

interfaces, it is possible to quickly manipulate large

volumes of data to discover characteristics, patterns

and hidden trends.

Based on visualization, for example, it is easier to see how a specific research topic evolved over time. See Figure 6, where the researcher’s interest was to understand how the topic “traceability” was explored by the academic community, in relation to the question investigated in this example. It is easy to identify that in 2005 and 2006 there was only one study published; in 2007 and 2008 there were a few additional studies; and in 2009, 2010 and 2011 the number of studies that mentioned the research topic was larger than in the previous years.

To build this visualization, the researcher should select the following options (Figure 6): a green rectangle representing an accepted study; part of the study title near the rectangle; the publication year as the grouping filter; and the Radial Graph as the visualization technique.
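The data behind such a view is a grouping of the accepted studies by one attribute. The sketch below illustrates the grouping step only, not the Radial Graph rendering; the field names and the sample records are made up for this example.

```python
from collections import defaultdict

def group_by(studies, key):
    """Group study titles by the value of one attribute (e.g. 'year')."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[key]].append(study["title"])
    return dict(sorted(groups.items()))

# Made-up accepted studies, for illustration only.
accepted = [
    {"title": "Study A", "year": 2005, "venue": "Venue X"},
    {"title": "Study B", "year": 2009, "venue": "Venue Y"},
    {"title": "Study C", "year": 2009, "venue": "Venue X"},
]
for year, titles in group_by(accepted, "year").items():
    print(year, len(titles), titles)
```

Passing `"venue"` instead of `"year"` as the key yields the grouping by publication place.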

Now, suppose that the researcher would like to identify appropriate venues for submitting a study or for publishing the results of a literature review. In this case he/she should select almost the same options mentioned before, replacing year with place. This visualization (Figure 7) allows identifying the main discussion forums for the topic under investigation. Observe that some places have few studies related to “traceability”, while others have more publications on this topic. In this example, the visualization type was again Radial Graph and the study titles were omitted.

Figure 3: Search engines defined in the protocol are automatically inserted under Studies Identification.

Figure 4: Relationship between attributes defined in the protocol and the form available during the Extraction step.

Figure 5: Options for specifying reports.

If the researcher wishes to merge both previous analyses in one graph, it would be better to use a different visualization type. In this case the Tree technique seems better, as shown in the screenshot of Figure 8. The researcher can expand the levels as needed.
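A tree of this kind can be thought of as a nested grouping, one level per attribute. The sketch below builds such a structure as nested dictionaries; the order of the levels (place first, then year) and the sample data are assumptions for illustration, and nothing here reflects how StArt stores or draws its tree.

```python
from collections import defaultdict

def build_tree(studies, levels):
    """Nest studies by successive attributes; leaves are lists of titles."""
    if not levels:
        return [s["title"] for s in studies]
    first, rest = levels[0], levels[1:]
    buckets = defaultdict(list)
    for s in studies:
        buckets[s[first]].append(s)
    return {value: build_tree(group, rest) for value, group in sorted(buckets.items())}

# Made-up accepted studies, for illustration only.
accepted_studies = [
    {"title": "Study A", "year": 2005, "venue": "Venue X"},
    {"title": "Study B", "year": 2009, "venue": "Venue Y"},
    {"title": "Study C", "year": 2009, "venue": "Venue X"},
]
tree = build_tree(accepted_studies, ["venue", "year"])
print(tree)
```

Expanding a level in the interface corresponds to descending one dictionary level in this structure.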

A double click on a selected study shows information about it (such as authors and abstract).

In addition to the features described above, visualization is also used to show the relationship among the studies retrieved in a literature review. This information allows evaluating the set of studies and enhancing the search for them. This resource is explained in the next section.

4 TEXT MINING IN StArt

According to Dunne et al. (2012), the growing number of publications combined with increasingly cross-disciplinary sources makes it challenging to follow emerging research topics and identify key studies. It is even harder to begin exploring a new field without a starting set of references.

Figure 6: Visualization of publications on “traceability” over time.

Figure 7: Visualization of places where “traceability” has been published.

Figure 8: Visualization of places where “traceability” has been published according to the year.

During the conduct of literature reviews, many studies are retrieved from various search engines through search strings. Hence, the researcher must be careful not to leave out any studies that may be relevant. According to Boell and Cecez-Kecmanovic (2010), the usual problem of systematic reviews is that the more inclusive the search strategy, the more irrelevant studies will be retrieved; the more precise and specific the search strategy, the more relevant studies will be missed.

To help minimize this problem, StArt provides support for identifying the references of each study retrieved by the search strings. This support reveals whether there are studies that were not retrieved but are referenced.

As the search engines generally do not provide

the list of references from each study, this

information is obtained by reading and extracting the

references of the PDF files of the retrieved studies.

Every time a PDF file is linked to a study, StArt

searches the references in the PDF file. Aiming to

identify information like authors, publication place

and title, regular expressions are used to identify the

bibliographic reference template that was used

(APA, Harvard, IEEE, etc.). To determine which

study is related to another one, the similarity

between the titles of the studies is calculated using

the text mining algorithm proposed by Salton

(1989). The result of this process is shown through

visualization as presented in Figure 9. The study in

the centre of the figure was not retrieved in the

literature review, but is referenced by five studies

that were retrieved.
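To make the template-matching idea concrete, the sketch below tries two simplified regular expressions, one for an APA-like layout and one for an IEEE-like layout, and returns the fields of the first one that matches. Both patterns and the function name `parse_reference` are hypothetical and far less robust than what a real reference list requires; they are not the expressions used by StArt.

```python
import re

# Simplified, hypothetical patterns for two reference layouts.
TEMPLATES = {
    # Authors (Year). Title. Venue.
    "APA-like": re.compile(
        r'^(?P<authors>.+?)\s\((?P<year>\d{4})\)\.\s(?P<title>.+?)\.\s(?P<venue>.+?)\.?$'
    ),
    # Authors, "Title," Venue, Year.
    "IEEE-like": re.compile(
        r'^(?P<authors>.+?),\s"(?P<title>.+?),?"\s(?P<venue>.+),\s(?P<year>\d{4})\.?$'
    ),
}

def parse_reference(line):
    """Return the matched template name and its fields, or None if nothing matches."""
    for name, pattern in TEMPLATES.items():
        match = pattern.match(line.strip())
        if match:
            return {"template": name, **match.groupdict()}
    return None

apa = "Salton, G. (1989). Automatic Text Processing. Addison-Wesley."
ieee = 'G. Salton and J. Allan, "Text retrieval using the vector processing model," in Proc. SDAIR, 1994.'
print(parse_reference(apa))
print(parse_reference(ieee))
```

Once the title field is isolated, it can be compared with the titles of the retrieved studies to decide which study a reference points to.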

This functionality is especially useful during the

execution of pilot literature reviews, which should

be conducted for adjusting the protocol and the

search strings, as suggested by Kitchenham (2007).

If there are studies not found but referenced many

times, the researcher should verify, for example, if

the keywords of these studies should be considered

in the protocol and search strings. If so, a new search

applying these new keywords must be performed

aiming to find relevant studies that were missed.
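The check for studies that are referenced but not retrieved can be sketched as follows. Title matching here uses `difflib.SequenceMatcher` from the Python standard library as a simple stand-in for the similarity algorithm cited above; the threshold of 0.9, the minimum of two citations and all names are assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase a title and collapse its whitespace."""
    return " ".join(title.lower().split())

def same_study(a, b, threshold=0.9):
    """Treat two titles as the same study if they are nearly identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def missing_but_cited(retrieved_titles, references_by_study, min_citations=2):
    """Count cited titles that match no retrieved study; keep those cited often."""
    counts = Counter()
    for cited_titles in references_by_study.values():
        for cited in cited_titles:
            if any(same_study(cited, r) for r in retrieved_titles):
                continue
            # Merge with an already-seen variant of the same missing title.
            key = next((k for k in counts if same_study(cited, k)), normalize(cited))
            counts[key] += 1
    return {title: n for title, n in counts.items() if n >= min_citations}

# Made-up data: two retrieved studies and the titles found in their reference lists.
retrieved = ["Traceability in agile projects", "A survey on requirements tracing"]
references = {
    "Traceability in agile projects": [
        "Procedures for Performing Systematic Reviews",
        "A survey on requirements tracing",
    ],
    "A survey on requirements tracing": [
        "Procedures for performing systematic reviews.",
        "An unrelated book",
    ],
}
print(missing_but_cited(retrieved, references))
```

Titles returned by this check are candidates whose keywords may be worth adding to the protocol and the search strings.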

StArt also offers functionality for detecting which of the studies imported into the tool are similar. The similarity is calculated based on the abstracts through the Vector Processing Model (Salton, Allan, 1994). The result of this processing is shown in a table, as presented in Figure 10. This table provides a list of similar studies and their respective similarity grade in relation to a previously selected study.

This list of similar studies can be used, for

example: (i) to define the next study to be analyzed;

(ii) to facilitate comparison between similar studies

and (iii) to make the inclusion and exclusion of

studies easier – studies with a high level of similarity

to an excluded study tend to be also excluded.
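A vector-model comparison of abstracts can be sketched as below: each abstract becomes a sparse vector of term weights, and two abstracts are compared by the cosine of the angle between their vectors. The smoothed TF-IDF weighting and the simple tokenizer are common choices assumed for this example, not necessarily those of Salton and Allan (1994) or of StArt.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency (an assumed weighting scheme).
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1) for t in tf}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank_similar(documents, selected_index):
    """List (index, similarity) of the other documents, most similar first."""
    vectors = tfidf_vectors(documents)
    scores = [
        (i, cosine(vectors[selected_index], vec))
        for i, vec in enumerate(vectors)
        if i != selected_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Made-up abstracts, for illustration only.
abstracts = [
    "Traceability links between requirements and code",
    "Recovering traceability links from source code",
    "A survey of agile estimation practices",
]
ranking = rank_similar(abstracts, 0)
print(ranking)
```

The resulting ranking plays the role of the similarity table: the first entries are the studies closest to the selected one.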

Other research efforts use text mining in the context of SR or SM, but not within tools that support the whole SR or SM process.

Malheiros et al. (2007) proposed the use of a visualization tool, named PEx, to support the first step of studies selection. PEx has a module that processes the abstracts of the primary studies, eliminates stopwords, calculates the term frequencies and, based on this result, displays clusters of studies to facilitate their analysis.

Felizardo et al. (2011) continued the research cited above, presenting the VTM (Visual Text Mining) tool, which supports studies selection. As in Malheiros et al. (2007), the result of the text mining process is shown through different visualization techniques, which help in applying the inclusion and exclusion criteria previously inserted in the VTM tool.

It is important to notice that the focus of these studies is the studies selection step. In StArt, on the other hand, visualization and text mining are currently being used to support the search string definition and the SR or SM Summarization phase.

5 CONCLUSIONS AND FUTURE WORK

This paper explored the use of visualization to make it easier to interpret the data provided by Systematic Literature Reviews and Systematic Mappings. This visualization is available in StArt, which also supports the steps of the SR and SM processes. As these processes are laborious, have many repetitive steps and require that all information be packed, the availability of computational support is relevant.

Figure 9: Relationship among the studies uploaded into StArt and their references.

Figure 10: List of similar studies in relation to a selected study – the similarity grade is highlighted.

Although there are some tools that have been used by researchers to aid the conduct of literature reviews, most of them are reference managers. Some examples are JabRef (jabref.sourceforge.net), EndNote (www.endnote.com), ProCite (www.procite.com), Reference Manager (www.refman.com), RefWorks (www.refworks.com) and Zotero (www.zotero.org). Only SLR-Tool (Fernández-Sáez, Genero, Romero, 2010) focuses on the SR process (Kitchenham, 2007). However, it works only on the English or Spanish versions of the Windows operating system.

As StArt is closely associated with the SR and SM processes, it provides many facilities that make these types of reviews easier to conduct. Some characteristics that differentiate it from the other tools are: the score, which is calculated automatically and can give insights into the relevance of a paper; different types of data visualization that can help map the research area; extraction of the references of the studies gathered in the review, which allows evaluating the adequacy of the search strings and improving the quality of the whole activity; and other facilities that make the conduct of the process more manageable.

Considering the importance of packing the SRs

or SMs data, StArt saves all data in a “.start” file

which allows conducting a review in sessions and

sharing a review with another researcher. In

addition, as StArt provides a simple text editor for

writing an initial summary of the state of the art, this

summary is also packed. StArt is being continuously evolved and tested. The tool was also evaluated from the perspective of its usefulness and ease of use, according to the TAM model; the evaluation found that the tool is useful to users and can be easily used by researchers (Hernandes et al., 2010).

As future work, it is planned to continue the development of StArt, emphasizing the analysis related to Systematic Mappings. Work toward this objective has already begun with the addition of visualization, but there are other features that can enhance its support for SM. In addition, some experimental studies are planned that aim to establish a strategy to improve search strings based on the references of the collected studies and also to explore the tool as a support for conducting meta reviews.

ACKNOWLEDGEMENTS

The authors thank the students and researchers who have been using StArt and giving constant feedback to the development team, and CNPq, CAPES and the Observatório da Educação Project for financial support.

REFERENCES

Boell, S. K., Cecez-Kecmanovic, D. 2010. Are systematic reviews better, less biased and of higher quality? In Proc. European Conference on Information Systems, Helsinki, Finland.

Burley, D. 2010. Information visualization as a knowledge

integration tool. Journal of Knowledge Management

Practice.

Dunne, C., Shneiderman, B., Gove, R., Klavans, J. &

Dorr, B. 2012. Rapid understanding of scientific paper

collections: integrating statistics, text analytics, and

visualization. JASIST: Journal of the American Society

for Information Science and Technology.

Felizardo, K.R. et al. “Using Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews”. In: Int. Symposium on Empirical Software Engineering and Measurement, ESEM, 2011, pp. 77-86.

Fernández-Sáez, A.M., Genero, M., Romero, F.P. SLR-

Tool: A Tool for Performing Systematic Literature

Reviews. In Proc. JISBD, 2010, pp.329-332.

Gershon, N., Eick, S.G., Card, S. 1998. Information

visualization interactions, ACM Interactions, ACM

Press.

Hernandes, E. C. M.; Zamboni, A. B.; Thommazo, A. D.;

Fabbri, S. C. P. F. Avaliação da ferramenta StArt

utilizando o modelo TAM e o paradigma GQM. In: X

Experimental Software Engineering Latin American

Workshop, ICMC-São Carlos.

Kitchenham, B. A. 2004. Procedures for Performing

Systematic Reviews. Software Engineering Group,

Keele University, Keele, Tech. Rep. TR/SE 0401.

Kitchenham, B. A. 2007. Guidelines for performing

Systematic Literature Reviews in Software. Software

Engineering Group, Keele Univ., Keele, Univ.

Durham, Durham, Tech. Rep. EBSE-2007-01.

Malheiros, V., Höhn, E., Pinho, R., Mendonça, M.,

Maldonado, J.C. “A Visual Text Mining approach for

Systematic Reviews”. In: International Symposium on

Empirical Software Engineering and Measurement,

ESEM, 2007, pp. 245-254.

Montebelo, R. P. et al. 2007. SRAT (Systematic Review

Automatic Tool) Uma Ferramenta Computacional de

Apoio à Revisão Sistemática, In V Experimental

Software Engineering Latin American Workshop.

ICMC-São Carlos.

Pai, M., McCulloch, M., Gorman, J. D., Pai, N., Enanoria,

W., Kennedy, G., Tharyan, P., Colford Jr., J. M. 2004.

Clinical Research Methods - Systematic reviews and

meta-analyses: An illustrated, step-by-step guide. The

National Medical Journal of India.

Petersen, K. et al. 2008. Systematic Mapping Studies in

Software Engineering, In: Proc. Inter. Conf. on

Evaluation and Assessment in Software Engineering,

Bari, Italy.

Salton, G. 1989. Automatic Text Processing - The

Transformation, Analysis and Retrieval of Information

by Computer. Addison-Wesley.

Salton, G., Allan, J. Text Retrieval Using the Vector

Processing Model. In Symposium on Document

Analysis and Information Retrieval. University of

Nevada, Las Vegas, 1994.

Zamboni, A. B.; Thommazo, A. D.; Hernandes, E. C. M.;

Fabbri, S. C. P. F. StArt Uma Ferramenta

Computacional de Apoio à Revisão Sistemática. In:

Brazilian Conference on Software: Theory and

Practice - Tools session. UFBA.

ManagingLiteratureReviewsInformationthroughVisualization