A Part based Modeling Approach for Invoice Parsing

Enes Aslan, Tugrul Karakaya, Ethem Unver, Yusuf Sinan AKGUL

Abstract

Automated invoice processing and information extraction have attracted considerable interest from both business and academic circles. Invoice processing is a critical and costly operation for participation banks because the credit authorization process must be linked to real trade activity via invoices. Classical invoice processing systems first assign each invoice to an invoice class, so any error in the class decision invalidates the subsequent parsing. This paper proposes a new invoice-class-free parsing method with a two-phase structure. The first phase uses individual invoice part detectors, and the second phase employs an efficient part-based modeling approach. In the first phase, we employ different methods such as SVM, maximum entropy, and HOG to produce candidates for the various types of invoice parts. In the second phase, the basic idea is to parse an invoice as a set of parts arranged in a deformable composition, similar to face or human body detection in digital images. The main advantage of the part-based modeling (PBM) approach is that the system can handle any type of invoice, a crucial capability for business processes at participation banks. The proposed system is tested on real invoices, and the experimental results confirm the effectiveness of the proposed approach.
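
To make the two-phase idea concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes that the first phase has already produced scored candidate locations for each part type, and that the second phase can be approximated by a star-shaped pictorial-structure model with a quadratic deformation penalty. The abstract does not specify the model form, so the part names (`invoice_no`, `date`, `total`), the `Candidate` type, the `parse_invoice` function, the expected offsets, and the `weight` parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One phase-1 detection (hypothetical type), in normalized page coordinates."""
    x: float
    y: float
    score: float  # detector confidence, higher is better


def parse_invoice(candidates, root, offsets, weight=1.0):
    """Pick one candidate per part for a star-shaped part model.

    candidates: dict mapping part name -> non-empty list of Candidate
    root:       name of the part every other part is positioned against
    offsets:    dict mapping non-root part name -> expected (dx, dy) from the root
    weight:     strength of the quadratic deformation penalty (assumed form)
    Returns (total_score, {part name: chosen Candidate}).
    """
    best_total, best_layout = float("-inf"), None
    for r in candidates[root]:
        total, layout = r.score, {root: r}
        for part, (dx, dy) in offsets.items():
            # Detector score minus squared deviation from the expected offset.
            def fit(c):
                dev = (c.x - r.x - dx) ** 2 + (c.y - r.y - dy) ** 2
                return c.score - weight * dev

            choice = max(candidates[part], key=fit)
            total += fit(choice)
            layout[part] = choice
        if total > best_total:
            best_total, best_layout = total, layout
    return best_total, best_layout


# Illustrative phase-1 output: each part type has one plausible and one spurious hit.
candidates = {
    "invoice_no": [Candidate(0.8, 0.05, 0.7)],
    "date": [Candidate(0.8, 0.1, 0.9), Candidate(0.2, 0.9, 0.95)],
    "total": [Candidate(0.8, 0.9, 0.8), Candidate(0.1, 0.1, 0.85)],
}
# Assumed layout prior: date just below the invoice number, total near the page bottom.
offsets = {"date": (0.0, 0.05), "total": (0.0, 0.85)}

score, layout = parse_invoice(candidates, "invoice_no", offsets, weight=10.0)
for part, cand in layout.items():
    print(part, cand)
```

In this toy input the spurious `date` and `total` candidates have slightly higher detector scores, but the deformation penalty makes the spatially consistent candidates win. Because the parts are conditionally independent given the root in a star model, the search costs one pass over the candidates of each part per root candidate instead of enumerating every combination.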


Paper Citation


in Harvard Style

Aslan E., Karakaya T., Unver E. and AKGUL Y. (2016). A Part based Modeling Approach for Invoice Parsing. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 390-397. DOI: 10.5220/0005777803900397


in Bibtex Style

@conference{visapp16,
author={Enes Aslan and Tugrul Karakaya and Ethem Unver and Yusuf Sinan AKGUL},
title={A Part based Modeling Approach for Invoice Parsing},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={390-397},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005777803900397},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2016)
TI - A Part based Modeling Approach for Invoice Parsing
SN - 978-989-758-175-5
AU - Aslan E.
AU - Karakaya T.
AU - Unver E.
AU - AKGUL Y.
PY - 2016
SP - 390
EP - 397
DO - 10.5220/0005777803900397