DYNAMIC ANALYSIS OF MALWARE USING DECISION TREES

Ravinder R. Ravula, Chien-Chung Chan, Kathy J. Liszka

2011

Abstract

Detecting new and unknown malware is a major challenge in today's software security profession. Most existing work on malware detection is based on static features of malware. In this work, we applied a reverse engineering process to extract static and behavioural features from malware. Two data sets were created, one based on reversed features and one based on API call features. Essential features are identified by applying Weka’s J48 decision tree classifier to 582 malware and 521 benign software samples collected from the Internet. The performance of the decision tree and Naïve Bayes classifiers is evaluated by 5-fold cross validation with 80-20 splits of training sets. Experimental results show that the Naïve Bayes classifier performs better on the smaller data set with 12 reversed features, while J48 performs better on the API call data set with 141 features.
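The evaluation protocol described in the abstract, 5-fold cross validation in which each fold trains on roughly 80% of the samples and tests on the remaining 20%, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the features are treated as binary indicators, the classifier is a hand-written Bernoulli Naïve Bayes with Laplace smoothing rather than Weka's implementation, the J48 decision tree is omitted, and the data is synthetic toy data produced by a hypothetical `make_toy_data` helper. The resulting accuracy says nothing about the paper's results.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_bernoulli_nb(X, y):
    """Fit a Bernoulli Naive Bayes model on 0/1 features (Laplace smoothing)."""
    n_features = len(X[0])
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        # Smoothed estimate of P(feature_j = 1 | label); never exactly 0 or 1.
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[label] = (math.log(prior), probs)
    return model


def predict(model, x):
    """Return the label with the highest log-posterior score for sample x."""
    def score(label):
        log_prior, probs = model[label]
        return log_prior + sum(math.log(p if v else 1 - p)
                               for v, p in zip(x, probs))
    return max(model, key=score)


def cross_validate(X, y, k=5, seed=0):
    """Mean accuracy over k folds; with k=5 each fold is an 80-20 split."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        model = train_bernoulli_nb(train_X, train_y)
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def make_toy_data(n=200, n_features=12, seed=1):
    """Hypothetical synthetic data: label 1 ("malware") sets each bit more often."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        p = 0.7 if label else 0.3
        X.append([1 if rng.random() < p else 0 for _ in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 5-fold accuracy on toy data: {cross_validate(X, y):.3f}")
```

Swapping in a different classifier only requires replacing `train_bernoulli_nb` and `predict`; the fold construction and accuracy averaging stay the same.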


Paper Citation


in Harvard Style

Ravula R. R., Chan C.-C. and Liszka K. J. (2011). DYNAMIC ANALYSIS OF MALWARE USING DECISION TREES. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 74-83. DOI: 10.5220/0003660200740083


in Bibtex Style

@conference{kdir11,
author={Ravinder R. Ravula and Chien-Chung Chan and Kathy J. Liszka},
title={DYNAMIC ANALYSIS OF MALWARE USING DECISION TREES},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={74-83},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003660200740083},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - DYNAMIC ANALYSIS OF MALWARE USING DECISION TREES
SN - 978-989-8425-79-9
AU - Ravula, R. R.
AU - Chan, C.-C.
AU - Liszka, K. J.
PY - 2011
SP - 74
EP - 83
DO - 10.5220/0003660200740083