
 
3.4  Interpreting the Models
We propose the use of various classification techniques that produce easily interpretable models: Linear Regression, Multilayer Perceptron, Random Forest, IBK and K-star. WEKA, an open-source machine learning workbench, provides several facilities for selecting such models. Finally, these algorithms are executed, validated, evaluated and compared in order to determine which one gives the best result with the highest accuracy.
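As a sketch of this workflow, the fragment below loads a dataset and cross-validates one of the candidate classifiers through WEKA's Java API; the file name forum.arff and the 10-fold setting are placeholders for illustration, not details of our setup:

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ModelComparison {
    public static void main(String[] args) throws Exception {
        // "forum.arff" is a placeholder name for the forum dataset.
        Instances data = new DataSource("forum.arff").getDataSet();
        // Assume the final grade is the last attribute.
        data.setClassIndex(data.numAttributes() - 1);
        // Any of the five candidate learners can be substituted here.
        Classifier model = new LinearRegression();
        // 10-fold cross-validation; the summary statistics are what
        // the comparison between algorithms is based on.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(model, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}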
3.4.1  Linear Regression 
Linear regression is a well-suited prediction model for testing the effect of one or more independent variables (features of the online discussion forum) on one dependent variable (final grade). The initial judgement of a possible relationship between two continuous variables should always be made on the basis of a scatter plot (Schneider et al., 2010). Moreover, the linear regression approach is quite simple and fast to process for large datasets. The time taken to build this model was 0.05 seconds. The resulting linear regression formula is shown below:
 
Grade = 0.4779 * attendance + 0.4614 * posts + 26.2719
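For illustration, a hypothetical student with an attendance value of 80 and 30 posts would be predicted a grade of 0.4779 * 80 + 0.4614 * 30 + 26.2719 = 38.232 + 13.842 + 26.2719 ≈ 78.35.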
3.4.2  Multilayer Perceptron 
Multilayer perceptron is a supervised learning algorithm that uses the concept of a neural network whose nodes interact through weighted connections. Each connection carries a weight that multiplies the input to the node, and the node outputs combine to generate the output prediction. The weights measure the degree of correlation between the activity levels of the neurons they connect (Pal & Mitra, 1992). Generally, the results from a multilayer perceptron are more accurate than those of linear regression, but it requires a longer processing time for large datasets because the algorithm updates the weights for every instance of the data. A further disadvantage of the multilayer perceptron is that it is sensitive to feature scaling (Pedregosa et al., 2011).
There are three hidden nodes, labelled Sigmoid Node 1, 2 and 3. The attributes Posting, ATT and FOD appear to have nearly the same weight and sign across the neurons. The output of the multilayer perceptron, built in 0.11 seconds, is shown below:
 
Sigmoid Node 1 
    Inputs    Weights 
    Threshold    -0.3389339469178622 
    Attrib Posting    0.6356339310638692 
    Attrib Login    -1.971194964716918 
    Attrib Forum    0.1528793652016145 
    Attrib ATT    -2.9824012894200167 
    Attrib FOD     -1.2565096616525258 
Sigmoid Node 2 
    Inputs    Weights 
    Threshold    -0.3319752049637097 
    Attrib Posting    0.8489632795859472 
    Attrib Login    0.8981808286647163 
    Attrib Forum    1.1775792813836161 
    Attrib ATT    -0.2727426863562934 
    Attrib FOD     -1.4842188659857705 
Sigmoid Node 3 
    Inputs    Weights 
    Threshold    -1.4238193757464874 
    Attrib Posting    2.516298013366708 
    Attrib Login    0.7532046884360826 
    Attrib Forum    -0.15476793041226244 
    Attrib ATT    -0.010654173314826458 
    Attrib FOD     2.257937779725289 
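Each sigmoid node computes the logistic function of its threshold (bias) plus the weighted sum of its inputs. The fragment below reproduces this computation for Sigmoid Node 1; the input vector is a hypothetical normalized example, since WEKA normalizes attribute values internally:

public class SigmoidNodeDemo {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        // Threshold (bias) and weights of Sigmoid Node 1 above,
        // in the order Posting, Login, Forum, ATT, FOD.
        double threshold = -0.3389339469178622;
        double[] weights = {0.6356339310638692, -1.971194964716918,
                0.1528793652016145, -2.9824012894200167,
                -1.2565096616525258};
        // Hypothetical normalized input values, for illustration only.
        double[] inputs = {0.8, 0.5, 0.3, 0.9, 0.4};

        double sum = threshold;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        System.out.println("Node 1 output: " + sigmoid(sum));
    }
}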
3.4.3  Random Forest 
Random forest was introduced by Breiman (2001); as implemented in WEKA, it is an ensemble of unpruned classification trees that uses majority voting to perform prediction. The random forest combines the predictions of classification trees built using an algorithm similar to C4.5 (J48 in WEKA) (Khoshgoftaar, 2007).
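A minimal WEKA sketch of building such a forest, reusing the placeholder dataset from above; the ensemble size of 100 trees is an illustrative choice:

import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class RandomForestDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("forum.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest forest = new RandomForest();
        // -I sets the number of trees; 100 is an illustrative choice.
        forest.setOptions(Utils.splitOptions("-I 100"));
        forest.buildClassifier(data);
        System.out.println(forest);
    }
}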
3.4.4  IBK (K-Nearest Neighbour) 
IBK is a k-nearest-neighbour classifier. It is also known as a ‘lazy learning’ technique, because the classifier construction process needs only little effort and most of the work is performed during classification (Khoshgoftaar, 2007). Various search algorithms can be utilized to ease the task of finding the nearest neighbours. Linear search is the most commonly used, but other options are also possible, including KD-trees, ball trees and cover trees (Vijayarani & Muthulakshmi, 2013).
Predictions made by considering more than one neighbour can be weighted based on the distance from the test instance; the distance is then converted into a weight by one of two different formulas (Vijayarani & Muthulakshmi, 2013).
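In WEKA's IBk, these two weighting formulas are exposed as the -I option (weight by the inverse of the distance) and the -F option (weight by 1 minus the distance). A minimal sketch, again reusing the placeholder dataset; k = 3 is an illustrative choice:

import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class IBkDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("forum.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        // Three neighbours, weighted by the inverse of their distance.
        knn.setOptions(Utils.splitOptions("-K 3 -I"));
        knn.buildClassifier(data);
        System.out.println(knn);
    }
}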
3.4.5  K-Star 
The K* algorithm is an instance-based learner that uses entropy to quantify distance. It is considerably