Bringing Order to Legal Documents - An Issue-based Recommendation System Via Cluster Association

Qiang Lu, Jack G. Conrad

2012

Abstract

The task of recommending content to professionals (such as attorneys or brokers) differs greatly from the task of recommending news to casual readers. A casual reader may be satisfied with a couple of good recommendations, whereas an attorney conducting legal research will demand precise and comprehensive recommendations drawn from various content sources. Legal documents are intrinsically complex and multi-topical, contain carefully crafted, professional, domain-specific language, and possess a broad and unevenly distributed coverage of issues. Consequently, a high-quality content recommendation system for legal documents requires the ability to detect the significant topics in a document and to recommend high-quality content accordingly. Moreover, a litigation attorney preparing for a case needs to be thoroughly familiar not only with the principal arguments associated with various supporting opinions, but also with the secondary and tertiary arguments. This paper introduces an issue-based content recommendation system with a built-in topic detection/segmentation algorithm for the legal domain. The system leverages existing legal document metadata, such as topical classifications and document citations, together with click-stream data from user behavior databases, to produce an accurate topic detection algorithm. It then links each individual topic to a comprehensive, pre-defined topic (cluster) repository via an association process. A cluster labeling algorithm is designed and applied to provide a precise, meaningful label for each cluster in the repository, where each cluster is also populated with member documents from across different content types. The system has been applied successfully to very large collections of legal documents (on the order of 100M), which include judicial opinions, statutes, regulations, court briefs, and analytical documents.
Extensive evaluations were conducted to determine the efficiency and effectiveness of the algorithms for topic detection, cluster association, and cluster labeling. Subsequent evaluations by legal domain experts have shown that the quality of the resulting recommendations across different content types is close to that of recommendations created by human experts.
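The abstract does not say how the association step scores a detected topic against a repository cluster. Purely as an illustration, the sketch below assumes that each detected topic (issue) and each cluster is a sparse feature vector (for example, weights over classification codes and cited documents), links each topic to its most similar cluster by cosine similarity if that similarity reaches a threshold, and returns the linked cluster's member documents, grouped by content type, as recommendations. The function names `associate` and `recommend`, the 0.3 threshold, the feature naming, and the toy data are all hypothetical; this is not the authors' algorithm.

```python
import math
from typing import Dict, List, Optional

# Hypothetical representation: a sparse feature vector mapping feature names
# (e.g. "cls:<classification code>", "cite:<document id>") to weights.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(topics: Dict[str, Vector], clusters: Dict[str, Vector],
              threshold: float = 0.3) -> Dict[str, Optional[str]]:
    """Link each detected topic to its most similar cluster, or to None if no
    cluster reaches the (hypothetical) similarity threshold."""
    links: Dict[str, Optional[str]] = {}
    for topic_id, topic_vec in topics.items():
        scored = [(cosine(topic_vec, vec), cid) for cid, vec in clusters.items()]
        best_sim, best_id = max(scored, default=(0.0, None))
        links[topic_id] = best_id if best_sim >= threshold else None
    return links


def recommend(links: Dict[str, Optional[str]],
              members: Dict[str, Dict[str, List[str]]]) -> Dict[str, Dict[str, List[str]]]:
    """For each associated topic, return the linked cluster's member documents,
    grouped by content type (e.g. statutes, briefs, opinions)."""
    return {tid: members.get(cid, {}) for tid, cid in links.items() if cid is not None}


# Toy data, invented for illustration only.
topics = {
    "issue-1": {"cls:negligence": 1.0, "cite:doc42": 1.0},
    "issue-2": {"cls:contract-formation": 1.0, "cite:doc7": 1.0},
    "issue-3": {"cls:tax": 1.0},
}
clusters = {
    "C1": {"cls:negligence": 2.0, "cite:doc42": 1.0, "cite:doc99": 1.0},
    "C2": {"cls:contract-formation": 1.0, "cite:doc7": 2.0},
}
members = {
    "C1": {"statute": ["s-1"], "brief": ["b-3"]},
    "C2": {"opinion": ["o-5"]},
}

if __name__ == "__main__":
    links = associate(topics, clusters)
    print(links)  # {'issue-1': 'C1', 'issue-2': 'C2', 'issue-3': None}
    print(recommend(links, members))
```

Because each issue is associated independently, a multi-topical document yields one recommendation set per detected issue rather than a single blended list, and a topic with no sufficiently similar cluster (here `issue-3`) simply produces no recommendations.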



Paper Citation


in Harvard Style

Lu Q. and Conrad J. (2012). Bringing Order to Legal Documents - An Issue-based Recommendation System Via Cluster Association. In Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012) ISBN 978-989-8565-30-3, pages 76-88. DOI: 10.5220/0004136600760088


in Bibtex Style

@conference{keod12,
author={Qiang Lu and Jack G. Conrad},
title={Bringing Order to Legal Documents - An Issue-based Recommendation System Via Cluster Association},
booktitle={Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)},
year={2012},
pages={76-88},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004136600760088},
isbn={978-989-8565-30-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Engineering and Ontology Development - Volume 1: KEOD, (IC3K 2012)
TI - Bringing Order to Legal Documents - An Issue-based Recommendation System Via Cluster Association
SN - 978-989-8565-30-3
AU - Lu Q.
AU - Conrad J.
PY - 2012
SP - 76
EP - 88
DO - 10.5220/0004136600760088