The AIS Project: Boosting Information Extraction from Legal Documents by using Ontologies

María G. Buey, Angel Luis Garrido, Carlos Bobed, Sergio Ilarri

2016

Abstract

In the legal field, management companies process a large number of documents every day in order to extract the data they consider most relevant and store them in their own databases. Despite technological advances, in many organizations the task of examining these usually extensive documents to extract just a few essential data items is still performed manually, which is expensive, time-consuming, and prone to human error. Moreover, legal documents usually follow several conventions in both structure and use of language which, while not completely formal, can be exploited to boost information extraction. In this work, we present an approach for extracting relevant information from these legal documents, based on the use of ontologies to capture and take advantage of such structure and language conventions. We have implemented our approach in a framework that makes it possible to handle different types of documents with minimal effort. Within this framework, we have also addressed a frequent problem in this kind of documentation: the presence of overlapping elements, such as stamps or signatures, which greatly hinders extraction from scanned documents. Experimental results are promising and show the feasibility of our approach.



Paper Citation


in Harvard Style

Buey M., Garrido A., Bobed C. and Ilarri S. (2016). The AIS Project: Boosting Information Extraction from Legal Documents by using Ontologies. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 438-445. DOI: 10.5220/0005757204380445


in Bibtex Style

@conference{icaart16,
author={María G. Buey and Angel Luis Garrido and Carlos Bobed and Sergio Ilarri},
title={The AIS Project: Boosting Information Extraction from Legal Documents by using Ontologies},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={438-445},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005757204380445},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - The AIS Project: Boosting Information Extraction from Legal Documents by using Ontologies
SN - 978-989-758-172-4
AU - Buey M.
AU - Garrido A.
AU - Bobed C.
AU - Ilarri S.
PY - 2016
SP - 438
EP - 445
DO - 10.5220/0005757204380445