A Part-Based Modeling Approach for Invoice Parsing

Enes Aslan, Tugrul Karakaya, Ethem Unver, Yusuf Sinan AKGUL


Automated invoice processing and information extraction have attracted considerable interest from business and academic circles. Invoice processing is a critical and costly operation for participation banks because the credit authorization process must be linked to real trade activity through invoices. Classical invoice processing systems first assign each invoice to an invoice class, so any error in the class decision invalidates the subsequent parsing. This paper proposes a new invoice-class-free parsing method with a two-phase structure: the first phase uses individual invoice part detectors, and the second phase employs an efficient part-based modeling approach. In the first phase, we employ different methods such as SVM, maximum entropy and HOG to produce candidates for the various types of invoice parts. In the second phase, the basic idea is to parse an invoice as a set of parts arranged in a deformable composition, similar to part-based face or human body detection in digital images. The main advantage of the part-based modeling (PBM) approach is that the system can handle any type of invoice, a crucial capability for business processes at participation banks. The proposed system was tested with real invoices, and the experimental results confirm the effectiveness of the proposed approach.
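To make the two-phase idea concrete, the following is a minimal sketch in the style of a star-shaped deformable part model (pictorial structures), not the authors' implementation. It assumes things the abstract does not specify: a single root (anchor) part, a quadratic deformation cost on the offset of every other part from the root, and at least one phase-one candidate per part. The phase-one detectors (SVM, maximum entropy, HOG-based) are replaced by precomputed candidate lists, and the names `Candidate`, `PartModel`, `parse_invoice` and the part labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A phase-one detection: a hypothesized location of one invoice part."""
    x: float      # horizontal position on the page (assumed normalized to 0..1)
    y: float      # vertical position on the page (assumed normalized to 0..1)
    score: float  # detector confidence, higher is better


@dataclass(frozen=True)
class PartModel:
    """Hypothetical 'spring' linking a part to the root part."""
    dx: float      # expected horizontal offset from the root
    dy: float      # expected vertical offset from the root
    weight: float  # stiffness of the quadratic deformation penalty


def deformation_cost(root: Candidate, cand: Candidate, model: PartModel) -> float:
    """Quadratic penalty for deviating from the expected root-relative offset."""
    ex = cand.x - (root.x + model.dx)
    ey = cand.y - (root.y + model.dy)
    return model.weight * (ex * ex + ey * ey)


def parse_invoice(
    root_candidates: List[Candidate],
    part_candidates: Dict[str, List[Candidate]],
    models: Dict[str, PartModel],
) -> Tuple[float, Candidate, Dict[str, Candidate]]:
    """Choose one candidate per part, maximizing detector scores minus deformation.

    In a star-shaped model, once the root placement is fixed, every other
    part can be optimized independently, so exhaustive search over root
    candidates stays cheap. Each part must have at least one candidate.
    """
    best = None
    for root in root_candidates:
        total = root.score
        chosen: Dict[str, Candidate] = {}
        for name, cands in part_candidates.items():
            model = models[name]
            # Best placement of this part given the current root placement.
            cand = max(cands, key=lambda c: c.score - deformation_cost(root, c, model))
            total += cand.score - deformation_cost(root, cand, model)
            chosen[name] = cand
        if best is None or total > best[0]:
            best = (total, root, chosen)
    if best is None:
        raise ValueError("at least one root candidate is required")
    return best


# Toy example with made-up candidates (not data from the paper).
roots = [Candidate(0.1, 0.1, 2.0), Candidate(0.6, 0.1, 1.5)]
parts = {
    "invoice_number": [Candidate(0.7, 0.12, 1.0), Candidate(0.2, 0.9, 1.2)],
    "total_amount": [Candidate(0.8, 0.85, 0.9)],
}
models = {
    "invoice_number": PartModel(dx=0.6, dy=0.0, weight=10.0),
    "total_amount": PartModel(dx=0.7, dy=0.75, weight=10.0),
}

total, root, chosen = parse_invoice(roots, parts, models)
print(f"{total:.3f}")  # prints 3.896
print(chosen["invoice_number"])
```

In the toy example, the second `invoice_number` candidate has the higher raw detector score, but it sits far from where the model expects that part relative to the root, so the deformation penalty rules it out. This is the behavior the abstract attributes to the part-based model: the layout constraints among parts, rather than a prior invoice-class decision, resolve ambiguous detections.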


