Authors: Sebastian Lindner and Winfried Höhn

Affiliation: University of Würzburg, Germany

Keyword(s): Reference Parsing, Bibliography, Conditional Random Fields (CRFs), Constraint-based Learning, Information Extraction, Information Retrieval, Machine Learning, Sequence Labeling, Semi-supervised Learning.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Data Reduction and Quality Assessment; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems

Abstract:
This paper presents key components of our workflow for handling bibliographic information. To this end, we compare several approaches to parsing bibliographic references with conditional random fields (CRFs). We concentrate on cases where only a few labeled training instances are available. To improve labeling results, we incorporate prior knowledge about the bibliography domain into CRF training through different constraint models. We show that our labeling approach achieves results comparable to, and in some cases better than, other state-of-the-art approaches. Finally, we show how, for about half of our reference strings, the correlation between journal title, volume, and publication year can be used to identify the correct journal even when the journal title abbreviation is ambiguous.
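The journal identification step can be illustrated with a small sketch. It rests on a simplifying assumption that is not taken from the paper: each candidate journal numbers its volumes consecutively from 1, starting in a known first year and publishing a constant number of volumes per year. Under that assumption, a cited volume predicts a publication year. Candidates whose prediction falls within a tolerance of the cited year are kept. The `Journal` class, its fields, `disambiguate`, the `tolerance` parameter, and the journal names are all hypothetical illustrations, not the authors' data model or method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    """Hypothetical candidate journal behind an ambiguous abbreviation."""
    title: str
    first_year: int        # assumed: year in which volume 1 appeared
    volumes_per_year: int  # assumed: constant publication rate

    def expected_year(self, volume: int) -> int:
        # Assumption: volumes are numbered consecutively from 1 at a fixed rate.
        return self.first_year + (volume - 1) // self.volumes_per_year


def disambiguate(candidates, volume, year, tolerance=1):
    """Keep only candidates whose volume/year pattern fits the parsed reference."""
    return [j for j in candidates
            if abs(j.expected_year(volume) - year) <= tolerance]


# Usage: two hypothetical journals that share the same abbreviation.
journal_a = Journal("Hypothetical Journal A", first_year=1950, volumes_per_year=1)
journal_b = Journal("Hypothetical Journal B", first_year=1990, volumes_per_year=2)

# A reference citing volume 45 from 1994 fits only Journal A (predicted 1994),
# because Journal B would have reached volume 45 only around 2012.
print([j.title for j in disambiguate([journal_a, journal_b], volume=45, year=1994)])
```

When the filter leaves exactly one candidate, the abbreviation is resolved. When it leaves none or several, the reference stays ambiguous. This is consistent with the abstract's observation that the correlation helps only for part of the reference strings.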