6 CONCLUSION
This paper presents a framework for sentiment analysis
of Arabic text in the automobile domain. It describes
the project phases: data collection, annotation, data
cleaning, feature selection, data splitting, graphical
depiction of the data, and classification, together with
the results of the twenty-two machine learning
classifiers adopted in this contribution. The highest
accuracy obtained is 83.79%, achieved by the Ensemble
Hard Vote classifier. The results therefore suggest that
the Ensemble Hard Vote classifier should be adopted for
sentiment analysis of Arabic automobile datasets, given
its high results on the four evaluation measures.
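For readers unfamiliar with hard voting, the following is a minimal sketch of the idea, not the implementation used in this work: each base classifier predicts one label per sample, and the ensemble returns the label predicted most often. The function names, the example labels, and the tie-breaking rule (smallest label in sort order) are assumptions made for illustration only.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label among the base classifiers' predictions.

    Ties are broken by taking the smallest label in sort order; this
    rule is an assumption of the sketch, not taken from the paper.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def ensemble_predict(per_classifier_labels):
    """Combine label lists (one list per classifier) sample by sample."""
    return [hard_vote(votes) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of three base classifiers on four reviews.
clf_a = ["pos", "neg", "neg", "pos"]
clf_b = ["pos", "pos", "neg", "neg"]
clf_c = ["neg", "neg", "neg", "pos"]

combined = ensemble_predict([clf_a, clf_b, clf_c])
print(combined)
```

In this sketch an individual classifier's error is outvoted whenever the other two agree, which is the intuition behind combining several classifiers by majority vote.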
In future work, more experiments and studies will be
conducted to improve accuracy by refining the cleaning
process, incorporating a dictionary as part of a hybrid
approach, and adopting advanced deep learning
algorithms.