Analysis approach to the output of Semi-MultiCons
for this dataset is shown in Figure 10, where the
assigned cluster for each task is represented in color.
Using the Jaccard index to compare true classes and
assigned clusters for the 474 tasks, an accuracy of
82% was calculated. It should be noted that these
initial results were obtained without tuning the
parameters of each step of the Semi-MultiCons
approach. In a second step, the Semi-MultiCons
approach was applied to a dataset of 303 064 error
tasks containing all error tasks raised by the Proration
module between January 2019 and September 2019
for a medium-sized airline customer. Due to the size
of the dataset, only partial information was available
for supervised validation of the results. However,
assuming the clustering result is correct, the estimated
proportion of similar tasks is 39.5%. With an estimated
average manual correction duration of more than one
minute per task, identifying similar tasks so that their
anomalies can be corrected simultaneously may save
up to 2 000 hours of manual correction activity for
these 303 064 tasks.
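The two figures above can be illustrated with a minimal sketch. It assumes the pair-counting form of the Jaccard index (the text does not state which variant was used) and a flat cost of one minute per similar task. The function names and the toy labels are hypothetical and are not taken from the Semi-MultiCons implementation or from the Proration data.

```python
from itertools import combinations


def pair_jaccard(true_classes, clusters):
    """Pair-counting Jaccard index between two labelings of the same items.

    Assumption: agreement is measured over pairs of items, as
    a / (a + b + c), where a counts pairs grouped together in both
    labelings and b, c count pairs grouped together in only one of them.
    """
    both = only_class = only_cluster = 0
    for i, j in combinations(range(len(true_classes)), 2):
        same_class = true_classes[i] == true_classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            both += 1
        elif same_class:
            only_class += 1
        elif same_cluster:
            only_cluster += 1
    total = both + only_class + only_cluster
    # If no pair is grouped together in either labeling, treat as full agreement.
    return both / total if total else 1.0


def estimated_hours_saved(n_tasks, similar_rate, minutes_per_task=1.0):
    """Hours saved if every similar task avoids one manual correction.

    Assumption: each similar task saves exactly minutes_per_task minutes.
    """
    return n_tasks * similar_rate * minutes_per_task / 60.0


# Toy labels, made up for illustration only.
toy_true = ["a", "a", "b", "b"]
toy_pred = [0, 0, 1, 0]
toy_score = pair_jaccard(toy_true, toy_pred)

# Figures quoted in the text: 303 064 tasks, 39.5% similar, one minute each.
hours = estimated_hours_saved(303_064, 0.395)
print(f"toy Jaccard = {toy_score:.2f}, estimated saving = {hours:.0f} hours")
```

Under these assumptions the estimate comes to roughly 1 995 hours, which is consistent with the order of magnitude of 2 000 hours quoted above.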
These achievements have also shown the need to
specialize semi-supervised approaches so that they
take into account both the heterogeneous internal and
external information available as input, i.e., data and
prior knowledge, and the application objectives from
the perspective of the classes that are to be
distinguished: the potentially overlapping properties
of classes in the data space, a hierarchical structure of
application classes, the availability of prior
knowledge such as data partially annotated with
application classes, the complex processing of logs of
sequential correction actions requiring deep learning
techniques, etc. Examples of recent applications with
similar considerations in the domains of ontology
matching and document classification can be found in
(Boeva et al., 2018) and (Ippolito and Júnior, 2016).
ACKNOWLEDGMENTS  
This project was carried out as part of the IDEX
UCA JEDI MC2 joint project between Amadeus and the
Université Côte d'Azur. This work has been
supported by the French government, through the
UCA JEDI Investments in the Future project managed
by the National Research Agency (ANR) with the
reference number ANR-15-IDEX-01.
 
 
REFERENCES 
Agovic  A.,  Banerjee  A.  Semi-supervised  Clustering.  In 
Data Clustering: Algorithms and Applications, Chapter 
20, pp. 505-534, 2013, Chapman & Hall. 
Al-Najdi A., Pasquier N., Precioso  F.  Frequent  Closed 
Patterns  Based  Multiple  Consensus  Clustering.  In 
ICAISC'2016 International Conference on Artificial 
Intelligence and Soft Computing, pp. 14-26, June 2016, 
LNCS 9693, Springer. 
Al-Najdi A., Pasquier N., Precioso  F.  Using  Frequent 
Closed Pattern Mining to Solve a Consensus Clustering 
Problem.  In  SEKE'2016 International Conference on 
Software Engineering & Knowledge Engineering, pp. 
454-461,  July  2016,  KSI  Research  Inc.  SEKE'2016 
Third Place Award. 
Al-Najdi A., Pasquier N., Precioso F. Multiple Consensuses 
Clustering by Iterative Merging/Splitting of Clustering 
Patterns. In MLDM'2016 International Conference on 
Machine Learning and Data Mining, pp. 790-804, July 
2016, LNAI 9729, Springer. 
Al-Najdi A., Pasquier N., Precioso  F.  Using  Frequent 
Closed  Itemsets  to  Solve  the  Consensus  Clustering 
Problem.  In  International Journal of Software 
Engineering and Knowledge Engineering, 
26(10):1379-1397, December 2016, World Scientific. 
Boeva V., Angelova M., Lavesson N., Rosander O.,
Tsiporkova E. Evolutionary Clustering Techniques for
Expertise Mining Scenarios. In ICAART'2018
International Conference on Agents and Artificial
Intelligence, pp. 523-530, January 2018, SciTePress.
Boongoen T., Iam-On N. Cluster Ensembles: A Survey of 
Approaches with Recent Extensions and Applications. 
In Computer Science Review, vol. 28, pp. 1-25, 2018. 
Dalton L., Ballarin V., Brun M. Clustering Algorithms: On
Learning, Validation, Performance, and Applications to
Genomics. In Current Genomics, 10(6):430-445, 2009,
Bentham Science Publishers.
Fahad A., Alshatri N., Tari Z., Alamri A., Khalil I., Zomaya 
A.,  Foufou  S.,  Bouras  A.  A  Survey  of  Clustering 
Algorithms  for  Big  Data:  Taxonomy  and  Empirical 
Analysis. In IEEE Transactions on Emerging Topics in 
Computing,  2(3):267-279,  September  2014,  IEEE 
Computer Society. 
Färber I., Günnemann S., Kriegel H.-P., Kröger P., Müller 
E.,  Schubert  E.,  Zimek  A.  On  Using  Class-Labels  in 
Evaluation  of  Clusterings.  In  KDD MultiClust 
International Workshop on Discovering, Summarizing 
and Using Multiple Clusterings, 2010. 
Ghosh J., Acharya A. A Survey of Consensus Clustering.
In Handbook of Cluster Analysis, Chapter 22, pp. 497-518,
2016, Chapman and Hall/CRC.
Grira  I.,  Crucianu  M.,  Boujemaa  N.  Unsupervised and 
Semi-supervised Clustering. A Brief Survey.  In  A 
Review of Machine Learning Techniques for 
Processing Multimedia Content, vol. 1, pp. 9-16, 2005. 
Halkidi M., Batistakis Y., Vazirgiannis M. On Clustering
Validation Techniques. In Journal of Intelligent
Information Systems, vol. 17, pp. 107-145, 2001,
Springer.