uAIS: An Experience of Increasing Performance of NLP Information Extraction Tasks from Legal Documents in an Electronic Document Management System

Marcos Ruiz (1), Cristian Román (1), Ángel Luis Garrido (2, a) and Eduardo Mena (2, b)

(1) InSynergy Consulting S.A., Madrid, Spain

(2) SID Research Group, IIS Department, University of Zaragoza, Zaragoza, Spain

Keywords: Optimization, Performance, Electronic Document Management System, Message Broker, NLP, Information Extraction.

Abstract: Nowadays, the huge number of documents managed through document management systems makes their manual processing practically impossible. That is why natural language processing (NLP) subsystems that help to perform certain tasks are becoming essential for many commercial systems. Although their use is gradually extending to all levels, this type of subsystem has high CPU and memory requirements, which can harm the entire system it is intended to assist. In this work, we propose and study an architecture based on microservices and message brokers which improves the performance of these NLP subsystems. We have implemented our approach on a real document management system, which performs intensive language analysis on large legal documents. Experimental results are promising, showing a marked increase in productivity compared with systems based on other approaches.

1 INTRODUCTION

Due to the lowering of storage costs, private individuals, and especially private and public organizations, nowadays possess huge amounts of text-based documents, which are stored on their own computers or on the Web (http://www.internetlivestats.com/). This overflow of information precludes manual treatment and, as a result, the need for automated tools to analyze and process it has in recent years become more noticeable than ever. These types of tools fall within the field of Natural Language Processing (NLP), the scientific study of how to program computers to process and analyze large amounts of natural language. Challenges in NLP include speech recognition, natural language understanding, and natural language generation (Khurana et al., 2017).

Regarding the task of extracting information from text documents, when it is performed by specialized people it is a highly time-consuming activity that additionally requires a high degree of expertise in the field covered by these documents. In fact, the greatest difficulty is knowing the work context and the rules that determine which specific data are to be extracted. In addition, the task can be especially time-consuming if the documents are large and their condition is not optimal, as sometimes happens with documents scanned using an optical character recognition (OCR) system. If we restrict ourselves to information extraction from legal documents (laws, contracts, mortgages, agreements, etc.), we can observe that the language used exhibits a very specific vocabulary and expressions, so it is more difficult to understand and process. This leads to the need for very specialized staff to carry out this type of work. Moreover, these types of documents can be very long and tedious to read, which further complicates the work for a human.

a: https://orcid.org/0000-0001-5600-0008

b: https://orcid.org/0000-0002-7462-0080

Despite the efforts made for many years to automate information extraction tasks, there are still no commercial products specialized in them, even less so for very specific fields of application (science, history, or law, to name a few). Language is another important limitation that requires specificity in developments. Therefore, the software developed with this aim is often a combination of existing resources with custom programming. In addition, since information extraction is a very complex task, these types of applications need many software resources: databases to store linguistic resources, large thesauri, complex libraries, or Internet services. That in turn requires many resources at the hardware level: CPU, memory, disk space, cache, bandwidth, etc.

Ruiz, M., Román, C., Garrido, Á. and Mena, E. uAIS: An Experience of Increasing Performance of NLP Information Extraction Tasks from Legal Documents in an Electronic Document Management System. DOI: 10.5220/0009421201890196. In Proceedings of the 22nd International Conference on Enterprise Information Systems (ICEIS 2020) - Volume 1, pages 189-196. ISBN: 978-989-758-423-7.

In recent years, the application of automatic information extraction technologies has made these tasks easier to carry out in administrative and business fields. The lack of commercial products has led to the implementation of tailor-made solutions, which often prioritize effectiveness over efficiency.

Further, if we focus on the specific case of automatic processing of legal documents, we find additional difficulties such as the specificity of the vocabulary, the typology of the documents, the particularities of the language in each region, and the concrete classification needs in each context. As a result, it is very complex to create a system capable of correctly solving particularly specific tasks and, at the same time, of serving any use case (van Noortwijk, 2017). Moreover, this kind of document contains, in many cases, a considerable number of overlapping elements such as stamps and signatures, which further hinder automated treatment. These elements are usually required to prove the authenticity of the document, so they can hardly be avoided.

The purpose of this work is to study optimization technologies for boosting NLP systems, specifically information extraction systems applied to legal documentation, with the ultimate goal of improving the performance of this kind of process in that context. For this, we have used message broker technology. Message brokers are elements in telecommunication or computer networks where software applications communicate by exchanging formally-defined messages (Magnoni, 2015).

To carry out this investigation rigorously, on the one hand, we have worked with a real implementation of a system able to extract information from legal documents using semantic technologies with good results. On the other hand, given the specific nature of legal documents, real datasets are required, which is usually complicated, since in many cases they are private texts that are difficult to access for experiments. To overcome this difficulty, this work has been carried out jointly with the research team of Insynergy (ISYC, https://www.isyc.com/), a well-known Spanish company dedicated to technology and innovation. Among its document management products, the company has a tool called AIS, which processes large amounts of legal documents in order to extract information from them (Buey et al., 2016). Thanks to the participation of the company in this research work, a significant set of these legal documents has been used in the experiments, both for training and for classification. Although the experimental dataset is composed of Spanish legal documents, our approach is generic enough to be applied to any type of legal document, regardless of language.

This paper is structured as follows. Section 2 analyzes and describes the state of the art. Section 3 describes the proposed methodology. Section 4 shows and discusses the preliminary results of our experiments with real legal documents. Finally, Section 5 explains our conclusions and future work.

2 RELATED WORK

Since the widespread implementation of Electronic Document Management Systems (EDMS), such as the one described in (Chen et al., 1999), their performance has been analyzed from different points of view (Iacob and Jonkers, 2006; Burtylev et al., 2013). The system-performance problem posed by the in-depth treatment of extensive text-based documents is also present in many scientific papers. Systems like the one described in (Srihari and Shi, 2004) already showed how difficult it is to optimize the performance of systems dedicated to the interpretation of documents. Some authors focus their study on optimizing the processes of this type of system (access, download, delete, etc.) (Cheng et al., 2013), but they do not specifically address the problem of NLP subsystems. Other works base their optimization on analyzing the system in depth: in (Kiedrowicz et al., 2016), the authors analyze a document management system by representing it through graph models. Another option is the use of Business Process Management (BPM) tools to model the processes and thus detect the bottlenecks, in order to later provide these parts of the system with more resources (Djedović et al., 2016). These authors offer a good approach to the analysis part of the problem, but how to carry out the optimization remains to be specified.

With regard to NLP in legal document management systems, its use has been increasing in recent years. For example, in (Hachey and Grover, 2006) the authors integrate an automatic summarization tool into a document system for storing judgments of the UK House of Lords. Another system, dedicated to managing notary documents, is described in (Amato et al., 2008); it transforms them into RDF statements for suitable indexing, retrieval and long-term preservation. Eunomos (Boella et al., 2016) is a legal document and knowledge management system, based on legislative XML and ontologies, which uses NLP tools to semi-automate lower-skill tasks. In all of them the high computational cost of this type of process is clear, but its optimization is pending. There are also interesting approaches that study how to serialize or parallelize NLP services (Garrido et al., 2014; McEwan et al., 2016), but they are neither directly related to EDMS nor part of the legal field.

Regarding message brokers, the first references to architectures based on services accessible through messages can be found in works carried out in the 90s, such as (Oki et al., 1994). These, in turn, are inspired by previous works focused on the design of architectures for distributed systems (Banavar et al., 1999). In these works a message broker is defined as intermediary software that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. In the architecture with a traditional broker, all the messages go through a static component that is always separate from the different groups of applications that want to communicate with each other: producers and consumers. This architecture is simple, light, and easy to adapt to other contexts, since the applications only have to know how to connect to the broker. The broker may or may not be distributed across different machines.

The main objective of message queuing systems is to offer the application developer a reliable and simple message exchange system. For this, different techniques are used, such as: 1) disk usage through databases or the operating system itself, 2) replication of messages through various brokers, 3) replication of brokers to ensure high availability, 4) use of transactions, and 5) use of confirmation messages, both in publication and consumption.

Nowadays, message brokers have many uses (Yongguo et al., 2019), such as managing workload queues or messages for multiple receivers, routing messages to one or more destinations, transforming messages to an alternative representation, invoking services to retrieve data, etc. Even so, although message brokers are applied in multiple scenarios, as far as we know, their use on an EDMS in combination with natural language processing subsystems has not yet been analyzed.

3 METHODOLOGY

The objective is to optimize the performance of an NLP subsystem embedded within an EDMS. The large computational requirements of the NLP module in charge of processing documents contribute to making the EDMS slower and therefore less efficient. The proposed solution to improve system performance and scalability is described below.

3.1 Message Oriented Middleware

The main idea is to convert the NLP subsystem into an external service that can be replicated on several nodes, so that different requests can be solved in parallel instead of sequentially. For this, the design approach is to use an asynchronous messaging service. There is a wide range of solutions of this nature that allow computational work to be parallelized and distributed. We have chosen a Message Oriented Middleware (MOM), a type of architecture that offers the following functionalities:

• Sending and receiving messages in a non-blocking and asynchronous manner.
• Message persistence through disk usage (database or operating system included).
• Replication of messages through various brokers to prevent the loss of messages.
• Replication of brokers to ensure high service availability.
• Use of transactions to ensure that all messages are processed.
• Use of confirmation messages, both in publication and in consumption, to ensure that requests are sent and received correctly.

We have decided to incorporate a message broker as the MOM, which is responsible for managing the exchange of messages between producers and consumers. The broker can also be distributed across different machines. The communication process is depicted in Figure 1.

In this architecture we find, on the one hand, a group of consumers that demand information (the EDMS nodes) and, on the other, some workers that provide it (the NLP module). Both groups of programs subscribe to the Broker to synchronize with it, so that when a consumer issues a request, that request is put in a queue and finally sent to a worker to solve it. The worker confirms receipt of the request to the Broker, processes it, and sends the answer back to the Broker. This response is put in a queue and finally sent by the Broker to the consumer that made the request, which finally confirms its receipt to the Broker.
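The exchange just described can be illustrated with a minimal, single-process sketch. It is not the implementation used in this work: the `Broker` class, its method names and the `extract` stand-in for the NLP module are hypothetical, and a real broker would run as a separate service with network connections.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Broker:
    """Toy broker: one request queue, one response queue, explicit acknowledgements."""
    requests: deque = field(default_factory=deque)
    responses: deque = field(default_factory=deque)
    unacked: dict = field(default_factory=dict)  # delivery id -> message awaiting ack
    _ids: count = field(default_factory=count)

    def publish(self, queue: deque, message: dict) -> None:
        queue.append(message)

    def deliver(self, queue: deque):
        """Hand the oldest message to a subscriber and remember it until it is acked."""
        if not queue:
            return None
        delivery_id = next(self._ids)
        message = queue.popleft()
        self.unacked[delivery_id] = message
        return delivery_id, message

    def ack(self, delivery_id: int) -> None:
        """Receipt confirmed: the broker can forget the message."""
        del self.unacked[delivery_id]


def extract(document: str) -> dict:
    """Hypothetical stand-in for the NLP module: it only counts words."""
    return {"document": document, "words": len(document.split())}


def run_once(broker: Broker, document: str) -> dict:
    # 1. An EDMS node (consumer) issues a request; the broker queues it.
    broker.publish(broker.requests, {"document": document})

    # 2. The broker sends it to a worker, which confirms receipt first,
    #    then processes the request and sends the answer back to the broker.
    delivery_id, request = broker.deliver(broker.requests)
    broker.ack(delivery_id)
    broker.publish(broker.responses, extract(request["document"]))

    # 3. The broker forwards the response to the EDMS node, which confirms receipt.
    delivery_id, response = broker.deliver(broker.responses)
    broker.ack(delivery_id)
    return response
```

After `run_once` returns, both queues and the `unacked` table are empty, which is the state a broker reaches when every request and response has been confirmed.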


Figure 1: Proposed architecture using a message broker: the consumers are different nodes of the EDMS, and the workers are services that incorporate NLP functionality.

3.2 Design

The design of the proposed architecture is described in greater detail below:

1. EDMS: Responsible for receiving document processing requests. Depending on the number of users to serve, it is assumed that several nodes are deployed to offer a service that satisfies customers. When there is more than one node, the clients access the different nodes through a load balancer. In addition, it is also assumed that each of these nodes needs to access a file system where the documents are located, and a database to update data on said documents.

2. Microservice: It contains the NLP functionality and is responsible for processing the documents. The number of microservices can be increased so that service availability can always be guaranteed and adapted to the number of customer requests.

3. Message broker: The message broker is the component that allows communication between the nodes of the EDMS and any microservice. It is a critical component in this system, since there is only one and it is also stateful. If this component fails, messages may be lost and users will not get a response to their requests. To avoid this type of fault, several solutions are proposed:

• Persistence through saving messages to disk.
• Replication of messages through the master-slave pattern.
• Use of transactions.
• Replication or distribution of message brokers.
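As an illustration of the first of these solutions, the following sketch shows one simple way to make a queue survive a restart: every publication and acknowledgement is appended to a journal file, and the pending messages are rebuilt from it on startup. The `JournaledQueue` class, the journal format and the file name are assumptions made for this example; production brokers use their own, more elaborate storage mechanisms.

```python
import json
import os
import tempfile


class JournaledQueue:
    """Toy durable queue: every publish and ack is appended to a journal file."""

    def __init__(self, path: str):
        self.path = path
        self.pending = {}  # message id -> body, in publish order
        if os.path.exists(path):
            self._replay()

    def _replay(self) -> None:
        # Rebuild the pending set from the journal after a restart.
        with open(self.path, encoding="utf-8") as journal:
            for line in journal:
                record = json.loads(line)
                if record["op"] == "publish":
                    self.pending[record["id"]] = record["body"]
                else:  # "ack"
                    self.pending.pop(record["id"], None)

    def _append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(record) + "\n")
            journal.flush()
            os.fsync(journal.fileno())  # force the record to disk before returning

    def publish(self, message_id: str, body: str) -> None:
        self._append({"op": "publish", "id": message_id, "body": body})
        self.pending[message_id] = body

    def ack(self, message_id: str) -> None:
        self._append({"op": "ack", "id": message_id})
        self.pending.pop(message_id, None)


def demo() -> list:
    """Publish two requests, ack one, simulate a restart, and recover from the journal."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "requests.journal")
        queue = JournaledQueue(path)
        queue.publish("m1", "contract-001.pdf")
        queue.publish("m2", "contract-002.pdf")
        queue.ack("m1")
        recovered = JournaledQueue(path)  # a new instance reads the same journal
        return list(recovered.pending.items())
```

Only the unacknowledged message is recovered, so a request that was in flight when the broker stopped can be delivered again instead of being lost.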

4 EVALUATION

To meet the objectives, a solution based on the architecture proposed in Section 3 has been developed. The use case employed to evaluate the proposed architecture is discussed next.

4.1 Context and Dataset

To evaluate the proposed architecture, we have worked on the OnCustomer system, a CRM (Customer Relationship Management) that belongs to Insynergy (ISYC). OnCustomer acts as an EDMS in charge of managing the legal documentation of its clients. Data must be extracted from the documents, and a subsystem focused on NLP tasks is used for this. OnCustomer uses the file system to save customer documents and a database to store customer information and document data.

The tests have been carried out with a certain type of legal document, specifically sales contracts. These contracts contain information on the property to be sold, the seller, and the buyer, as well as specific data on the transaction itself. The documents are relatively long: between 75 and 100 pages each. The NLP task performed on these documents is a laborious information extraction that requires a series of highly resource-consuming actions: spell checking, correction by using N-grams, lemmatization, parsing, application of rules, ontology queries and Named Entity Recognition (NER) tasks. All this makes it possible to extract from each document 55 data items embedded in the text, among which are the names, surnames and identifying data of the people involved, the information about the property (such as its size in square meters), and the transaction data (such as the price of the transaction).

4.2 Baseline System

To carry out the complex tasks of information extraction, the AIS system (Buey et al., 2016) is used, a system created from the joint work of ISYC and the University of Zaragoza (AIS is the Spanish acronym for Semantic Analysis and Interpretation). This system takes as input legal documents scanned with OCR (Optical Character Recognition), and its output is the relevant information automatically extracted using artificial intelligence techniques. The AIS system performs three major tasks: a preprocess, a central extraction task, and a postprocess (Garrido and Peiró, 2018; Buey et al., 2019).


This system is embedded within the OnCustomer platform, which, as mentioned above, is a complex CRM solution. AIS is implemented as one of the many resources available for this platform. The hardware on which it runs is a high-performance machine that serves all customers who use this solution. OnCustomer can be cloned into several nodes to support clients in parallel. The AIS system consists of a process that is launched periodically: it checks whether there are documents in the input directory, processes them, and deposits the results in several output directories, depending on the status (correct or erroneous). OnCustomer collects the results from these directories and performs the subsequent document processing tasks based on them.
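The periodic, directory-based operation of the baseline can be pictured with the following sketch. It is only an approximation under stated assumptions: the directory layout, the `poll_once` function and the `process_document` stand-in are hypothetical and do not reproduce the AIS code.

```python
import shutil
import tempfile
from pathlib import Path


def process_document(text: str) -> str:
    """Hypothetical stand-in for the AIS extraction; it rejects empty documents."""
    if not text.strip():
        raise ValueError("empty document")
    return f'<result words="{len(text.split())}"/>'


def poll_once(in_dir: Path, ok_dir: Path, error_dir: Path) -> int:
    """One periodic run: handle every waiting document, strictly one after another."""
    handled = 0
    for doc in sorted(in_dir.iterdir()):
        try:
            result = process_document(doc.read_text(encoding="utf-8"))
            (ok_dir / f"{doc.stem}.xml").write_text(result, encoding="utf-8")
        except ValueError:
            # Erroneous documents go to their own output directory.
            shutil.move(str(doc), str(error_dir / doc.name))
        else:
            doc.unlink()
        handled += 1
    return handled


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        in_dir, ok_dir, error_dir = root / "in", root / "ok", root / "error"
        for directory in (in_dir, ok_dir, error_dir):
            directory.mkdir()
        (in_dir / "a.txt").write_text("sale contract text", encoding="utf-8")
        (in_dir / "b.txt").write_text("", encoding="utf-8")
        handled = poll_once(in_dir, ok_dir, error_dir)
        return (handled,
                sorted(p.name for p in ok_dir.iterdir()),
                sorted(p.name for p in error_dir.iterdir()))
```

In a deployment of this kind, `poll_once` would be called from a timer, so a document that arrives just after a run waits for the next one, and the input directory is scanned even when it is empty.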

The main problems of this architecture are:

• The work done by the library is sequential, so the maximum capacity of the machine cannot be used.
• The scaling is only vertical, so the capabilities of AIS can only be expanded by improving the machine itself.
• If the CRM runs on several nodes, this library must be included in each of them, making these nodes even heavier than they already are.
• Preprocessing, extraction and postprocessing cannot be used separately.
• The task is executed periodically, i.e., tasks do not start immediately and unnecessary calls are made to the file system.

4.3 Optimized System

The new system is an implementation of the microservice-based architecture proposed in Section 3, and its main objective is to decouple the AIS library from the rest of the CRM. Each of the microservices responsible for NLP processing is called uAIS. Communication between the Document Management System and the NLP subsystem is now based on the competing consumers pattern, which works as follows:

1. Firstly, OnCustomer leaves in the IN folder the legal document from which the data are to be extracted.

2. Secondly, one of the nodes of the CRM sends a message to the message broker, also indicating the queue to use.

3. The message broker decides which uAIS node receives the request using the Round-Robin method. This implies that, if there are N uAIS services waiting for messages, the messages will be distributed equally among them as long as no uAIS is busy with previous requests.

4. Each uAIS node is responsible for receiving the messages, leaving the XML file in the OUT folder and sending the reply message to the message broker. If a uAIS node shuts down, it will not be able to send the response message, so after a reasonable period of time the message broker will forward the request to the next active uAIS node, or wait for one.

5. Once one of the uAIS services has processed a request and deposited the response XML file in the shared file system, it sends a message indicating the location of the output XML file and whether it has been processed correctly. These response messages are distributed through the Round-Robin method to the nodes of the CRM, which copy the XML file to their memory area and update the database to notify the user that the process has been completed.
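The Round-Robin distribution of steps 3 and 4 can be sketched as follows. This is a simplified model, not the broker's actual scheduling code: the function name and the `is_alive` callback are hypothetical, worker busyness is not modelled, and skipping a stopped worker stands in for the timeout-and-forward behaviour described above.

```python
from itertools import cycle


def dispatch_round_robin(requests, workers, is_alive):
    """Assign each request to the next worker in turn, skipping workers that are down.

    Returns a dict mapping each worker to the list of requests it received.
    """
    if not any(is_alive(worker) for worker in workers):
        # With no active worker, a broker would keep the requests queued.
        raise RuntimeError("no active worker available")
    assignment = {worker: [] for worker in workers}
    turn = cycle(workers)
    for request in requests:
        worker = next(turn)
        while not is_alive(worker):  # forward to the next active worker
            worker = next(turn)
        assignment[worker].append(request)
    return assignment
```

With three workers and six requests, each worker receives two requests when all are active; if the second worker is down, its share is split between the other two and it receives none.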

4.4 Implementation

Figure 2 shows a scheme of the implementation and operation of the new system. On the one hand, there are the users, who interact with each of the nodes of the CRM through a load balancer; these nodes in turn share data through two databases. On the other hand, there is the message broker, which is responsible for communicating the different nodes of the CRM with the different AIS microservices that ensure high service availability. Finally, there is the shared file system through which the CRM nodes and the AIS microservices exchange the documents to be processed and the resulting XML files.
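Since the documents themselves travel through the shared file system, the messages only need to carry references to them. The sketch below shows one possible encoding; the field names, the JSON format and the example paths are assumptions made for illustration and are not taken from the system described here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtractionRequest:
    request_id: str
    input_path: str   # document left in the shared IN folder


@dataclass(frozen=True)
class ExtractionResponse:
    request_id: str
    output_path: str  # XML result left in the shared OUT folder
    ok: bool          # whether the document was processed correctly


def encode(message) -> bytes:
    """Serialize a request or response as UTF-8 JSON for the message body."""
    return json.dumps(asdict(message), sort_keys=True).encode("utf-8")


def decode_response(payload: bytes) -> ExtractionResponse:
    """Rebuild a response object from a message body."""
    return ExtractionResponse(**json.loads(payload.decode("utf-8")))
```

Keeping the payload this small means the broker never handles the 75 to 100 page documents, only short messages that point to them.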

For the implementation of the new system, a RabbitMQ broker (https://www.rabbitmq.com/) has been chosen, since it meets each and every one of the following requirements:

• Permissive license, very advantageous for its great compatibility with other types of license, and even compatible with copyleft licenses.
• Guaranteed long-term support and compatibility with software of a certain age. This is also very interesting because it is not always possible to work with the latest technology; on the contrary, it is very common to have legacy systems.
• Delivery guarantees, from the sending of the request through to its complete processing.
• Disk persistence, which provides greater security against loss of information in case of system crashes.

MessageBroker

Queue:ais.requests

Queue:ais.responses

Database:

Postgres

Round-Robin

uAIS

Direct

exchange:

amq.direct

Direct

exchange:

amq.direct

CRM

Node1

CRM

Node3

CRM

Node2

Figure 2: Architecture of the proposed communication system. On the left are the CRM instances that make requests to the

uAIS services located on the right side. The management of these requests is done through the Message Broker.

• Monitoring with a graphical interface.
• Integration with Spring and the Maven repository.
• Authentication and authorization systems.
• Sufficient performance for our requirements.
• Ability to use the pub/sub (publisher/subscriber), competing consumers and RPC (Remote Procedure Call) patterns.
• Compatibility with numerous standard protocols such as AMQP, STOMP and MQTT.
• A large community of users.
• Very low RAM usage compared to ActiveMQ.
• Use of virtual hosts, so that the same message broker can be used in different environments at the same time.

The uAIS microservice itself was implemented with the latest stable version of the Spring Framework, since it is the same framework used by the CRM, which simplifies its maintenance. This microservice integrates the previous version of AIS as a library, adapted to its new service format. One of the objectives of the uAIS development is to keep it as simple as possible, since it is only in charge of communicating with the message broker and configuring and starting the AIS library.

4.5 Results

To evaluate the performance of both systems, the CPU usage, memory consumption, and performance of both the baseline system and the new system are monitored. For the tests, eight uAIS microservices were started and left waiting for messages.

As can be seen in Figure 3, the CPU usage of the CRM without AIS differs very significantly from that of the old system. It has gone from practically not using the CPU (since the CRM is waiting for the results and only works when they are already available) to taking full advantage of it.

With regard to memory use, Figure 4 shows that RAM consumption is similar in both systems. This is because, on the one hand, the CRM has a very high baseline RAM usage due to its complexity, which makes the differences between both systems harder to see. On the other hand, the memory consumption of the CRM using AIS as a service has ups and downs during the whole process, each peak corresponding to the collection of an AIS result.

Regarding performance, Figure 5 shows an improvement of 77%: processing 200 documents goes from taking almost 450 seconds to less than 100 seconds.
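The relation between these figures can be checked with a short calculation. The inputs below are the rounded values quoted in the text (450 s and 100 s for 200 documents), not the exact measurements, so the result is only approximate; the function names are ours.

```python
def improvement(baseline_s: float, optimized_s: float) -> float:
    """Relative reduction in elapsed time, as a fraction of the baseline."""
    return (baseline_s - optimized_s) / baseline_s


def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is."""
    return baseline_s / optimized_s


def throughput(documents: int, elapsed_s: float) -> float:
    """Documents processed per second."""
    return documents / elapsed_s


# Rounded figures from the text: 200 documents, about 450 s versus about 100 s.
reduction = improvement(450, 100)   # roughly 0.78
factor = speedup(450, 100)          # 4.5
old_rate = throughput(200, 450)     # roughly 0.44 documents per second
new_rate = throughput(200, 100)     # 2.0 documents per second
```

With these rounded inputs the reduction comes out at about 78%, in line with the reported 77% given that both times are stated only approximately, and it corresponds to a speedup of about 4.5 times with eight uAIS microservices.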

Figure 3: Comparison of the processor usage of the baseline system versus the new system. [Plot: % CPU (0 to 100) against time in seconds (0 to 400); series: CRM + uAIS, Only CRM (uAIS), CRM + AIS as lib.]

Figure 4: Comparison of the memory usage of the baseline system versus the new system. [Plot: RAM in MB (0 to 2000) against time in seconds (0 to 400); series: CRM + uAIS, Only CRM (uAIS), CRM + AIS as lib.]

5 CONCLUSIONS AND FUTURE WORK

Notarial acts, sales documents, judicial acts, contracts, etc., are widely used types of legal documents, but there are not many specialized tools for processing them. Moreover, information extraction from this type of document is still done manually or with low-performance automatic tools based on monolithic solutions.

In this work, we have proposed an approach based on message brokers to optimize this type of system. Due to the difficulty of finding suitable datasets in the field of legal documents, the development of this work and the experimental tests have been carried out in collaboration with a company dedicated to processing this type of document.

The main contribution of this work is the design of an architecture based on microservices and message brokers which improves the performance of NLP subsystems, which require many CPU and memory resources. Our approach offers scalable performance by allowing asynchronous communication between machines. As these subsystems can be isolated from the information storage modules (such as databases), their conversion to a queue system offers great advantages without excessive development complexity: 1) it allows the maximum capacity of the machine where it is housed to be used, 2) it enables horizontal scaling at the level of work batches, 3) it makes the main document management systems lighter, and 4) it facilitates maintenance and testing.

In addition, a series of automatic tests have been carried out with a real system for extracting information from legal documents, measuring response times, RAM usage and CPU usage. Regarding the results, the system has achieved an improvement of up to 77% in performance.

Figure 5: Comparison of the performance of the baseline system versus the new system.

There are two main lines of development for future work:

1. Making NLP microservices multifunctional and giving them independent configurations, which would give the system additional flexibility. This configuration would be embedded in the messages themselves.

2. Subdividing the NLP modules into several microservices, which would make them even lighter and easier to invoke and maintain. In this way, we could add more nodes for the most CPU-demanding parts, thus further improving the response times of the system.

ACKNOWLEDGEMENTS

This research work has been supported by the OTRI Projects ICIX6 2018/0577 and ICIX7 2019/0628 at the University of Zaragoza, and the CICYT TIN2016-78011-C4-3-R (AEI/FEDER, UE).

REFERENCES

Amato, F., Mazzeo, A., Penta, A., and Picariello, A. (2008). Using NLP and ontologies for notary document management systems. In 2008 19th International Workshop on Database and Expert Systems Applications, pages 67–71. IEEE.

Banavar, G., Chandra, T., Strom, R., and Sturman, D. (1999). A case for message oriented middleware. In International Symposium on Distributed Computing, pages 1–17. Springer.

Boella, G., Di Caro, L., Humphreys, L., Robaldo, L., Rossi, P., and van der Torre, L. (2016). Eunomos, a legal document and knowledge management system for the web to provide relevant, reliable and up-to-date information on the law. Artificial Intelligence and Law, 24(3):245–283.

Buey, M. G., Garrido, A. L., Bobed, C., and Ilarri, S. (2016). The AIS project: Boosting information extraction from legal documents by using ontologies. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence, pages 438–445. INSTICC, SciTePress.

Buey, M. G., Roman, C., Garrido, A. L., Bobed, C., and Mena, E. (2019). Automatic legal document analysis: Improving the results of information extraction processes using an ontology. In Intelligent Methods and Big Data in Industrial Applications, pages 333–351. Springer.

Burtylev, I., Mokhun, K., Bodnya, Y., and Yukhnevich, D. (2013). Development of electronic document management systems: Advantage and efficiency. Science and Technology, 3(2A):1–9.

Chen, Y.-J. J., Ferguson, D. R., Hong, A. N., Suleman, D., and Whittemore, G. L. (1999). Computer-based document management system. US Patent 6,009,442.

Cheng, W. Z., Yang, Y., Zhang, L., and Li, L. (2013). Optimization for web-based online document management. In Advanced Materials Research, volume 756, pages 1135–1140. Trans Tech Publ.

Djedović, A., Žunić, E., Alić, D., Omanović, S., and Karabegović, A. (2016). Optimization of the business processes via automatic integration with the document management system. In 2016 International Conference on Smart Systems and Technologies (SST), pages 117–122. IEEE.

Garrido, A. L., Buey, M. G., Escudero, S., Peiro, A., Ilarri, S., and Mena, E. (2014). The Genie Project - a semantic pipeline for automatic document categorisation. In Proceedings of the 10th International Conference on Web Information Systems and Technologies, pages 161–171. INSTICC, SciTePress.

Garrido, A. L. and Peiró, A. (2018). Recovering damaged documents to improve information retrieval processes. Journal of Integrated OMICS, 8(3):53–55.

Hachey, B. and Grover, C. (2006). Extractive summarisation of legal texts. Artificial Intelligence and Law, 14(4):305–345.

Iacob, M.-E. and Jonkers, H. (2006). Quantitative analysis of enterprise architectures. In Interoperability of Enterprise Software and Applications, pages 239–252. Springer.

Khurana, D., Koli, A., Khatter, K., and Singh, S. (2017). Natural language processing: State of the art, current trends and challenges. arXiv preprint arXiv:1708.05148.

Kiedrowicz, M., Nowicki, T., Waszkowski, R., Wesołowski, Z., and Worwa, K. (2016). Method for assessing software reliability of the document management system using the RFID technology. In MATEC Web of Conferences, volume 76, page 04009. EDP Sciences.

Magnoni, L. (2015). Modern messaging for distributed sytems. In Journal of Physics: Conference Series, volume 608, page 012038. IOP Publishing.

McEwan, R., Melton, G. B., Knoll, B. C., Wang, Y., Hultman, G., Dale, J. L., Meyer, T., and Pakhomov, S. V. (2016). NLP-PIER: a scalable natural language processing, indexing, and searching architecture for clinical notes. AMIA Summits on Translational Science Proceedings, 2016:150.

Oki, B., Pfluegl, M., Siegel, A., and Skeen, D. (1994). The information bus: an architecture for extensible distributed systems. In ACM SIGOPS Operating Systems Review, volume 27, pages 58–68. ACM.

Srihari, S. N. and Shi, Z. (2004). Forensic handwritten document retrieval system. In First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings., pages 188–194. IEEE.

van Noortwijk, K. (2017). Integrated legal information retrieval; new developments and educational challenges. European Journal of Law and Technology, 8(1):1–18.

Yongguo, J., Qiang, L., Changshuai, Q., Jian, S., and Qianqian, L. (2019). Message-oriented middleware: A review. In 2019 5th International Conference on Big Data Computing and Communications (BIGCOM), pages 88–97. IEEE.
