Authors:
Jinpeng Li, Christian Viard-Gaudin and Harold Mouchere
Affiliation:
Université de Nantes, France
Keyword(s):
Unsupervised graphical symbol learning, Graph mining, Minimum description length principle, Online handwriting.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Concept Mining; Context Discovery; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
Approaches to handwriting recognition generally require prior knowledge of the symbol set and as many ground-truthed samples as possible, so that machine-learning methods can be applied. In this work, we propose to discover the symbol set used in a graphical language produced by on-line handwriting. We consider the case of a two-dimensional graphical language, such as mathematical expressions, where layouts other than left-to-right must also be considered. First, we select relevant graphemes using hierarchical clustering. Second, we build a relational graph between the strokes that define a handwritten expression. Third, we extract the lexicon, a set of graph substructures, using the minimum description length principle. To assess the extracted lexicon, we introduce a hierarchical segmentation task. In our experiments, a recall rate of 84.2% is obtained on the test part of our database produced by 100 writers.
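To make the third step concrete, the following is a minimal sketch of how a minimum description length criterion can select a graph substructure. It is not the authors' implementation. The grapheme labels ("hbar", "vbar"), the relation names ("above", "right", "cross"), the function names, and the cost model (one unit per node and per edge instead of a true bit count) are all assumptions made for illustration. Candidates are restricted to single labeled edges, and a candidate is kept only if adding it to the lexicon and collapsing its instances makes the total description shorter.

```python
def description_length(num_nodes, num_edges):
    # Crude proxy (assumption): one unit per node and per edge, not real bits.
    return num_nodes + num_edges


def find_instances(nodes, edges, pattern):
    """Greedily collect non-overlapping edges matching (label_u, relation, label_v)."""
    used, instances = set(), []
    for u, rel, v in edges:
        if (nodes[u], rel, nodes[v]) == pattern and u not in used and v not in used:
            instances.append((u, v))
            used.update((u, v))
    return instances


def best_substructure(nodes, edges):
    """Return (pattern, original length, compressed length) for the best edge pattern."""
    base = description_length(len(nodes), len(edges))
    candidates = {(nodes[u], rel, nodes[v]) for u, rel, v in edges}
    best, best_dl = None, base
    for pattern in sorted(candidates):
        n = len(find_instances(nodes, edges, pattern))
        # The pattern itself costs 2 nodes + 1 edge. Each instance collapses
        # 2 nodes + 1 edge into 1 node, which saves 2 units.
        dl = description_length(2, 1) + base - 2 * n
        if dl < best_dl:
            best, best_dl = pattern, dl
    return best, base, best_dl


# Hypothetical stroke graph: node id -> grapheme label.
nodes = {0: "hbar", 1: "hbar", 2: "hbar", 3: "vbar",
         4: "hbar", 5: "hbar", 6: "hbar", 7: "hbar"}
# Hypothetical spatial relations between strokes: (source, relation, target).
edges = [(0, "above", 1), (1, "right", 2), (2, "cross", 3), (2, "right", 4),
         (4, "above", 5), (5, "right", 6), (6, "above", 7)]

result = best_substructure(nodes, edges)
print(result)  # (('hbar', 'above', 'hbar'), 15, 12)
```

In this toy graph the pair of stacked horizontal bars (an "=" shape) occurs three times, so treating it as one lexicon entry shortens the description from 15 to 12 units, while patterns that occur less often do not pay for their own cost.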