Authors: Taeka Awazu; Manami Fukuo; Masami Takata and Kazuki Joe
        
Affiliation: Nara Women's University, Japan
        
Keyword(s): Character Recognition, Character Clipping, Genetic Programming, Early-modern Japanese Printed Books.
            
Related Ontology Subjects/Areas/Topics: Applications; Character Recognition; Classification; Evolutionary Computation; Pattern Recognition; Software Engineering; Theory and Methods
            
Abstract: The website of the National Diet Library of Japan makes many early-modern (1868-1945) Japanese printed books available to the public as images, but full-text search of them is essentially impossible. Advanced search of this historical literature requires automatic conversion of the images to text. However, ruby characters, the small phonetic annotations printed beside the main text that are peculiar to Japanese books, are a serious obstacle to this conversion. When existing OCR software is applied to early-modern Japanese printed books, the recognition rate is extremely low. To address this problem, we previously proposed a multi-font Kanji character recognition method using the PDC feature and a support vector machine (SVM). In this paper, we propose a ruby character removal method for early-modern Japanese printed books based on genetic programming, and we evaluate our multi-font Kanji character recognition method on 1,000 types of early-modern Japanese printed Kanji characters.