Investigation of AWSCTD dataset applicability for malware type classification

Vilnius Gediminas Technical University, Vilnius, Lithuania


Information systems security is now a crucial concern: a single vulnerable system endpoint can lead to severe data loss. Intrusion detection systems (IDS) are used to detect such events. The place of deployment defines the type of IDS: network-based (NIDS) for monitoring network traffic, or host-based (HIDS) for detecting malicious actions at the host level. An IDS can be effective only if the alerts it generates are correctly evaluated and classified, which is typically done by trained staff and requires a lot of time and human resources. While a lot of research has been done on NIDS alert evaluation, HIDS research is lagging behind. Operating system calls reported by a HIDS could be used to define the importance of alarms and steer analysts to the most critical issues. In this article we demonstrate the applicability of the Attack-Caused Windows System Calls Traces Dataset (AWSCTD) that we created, which is currently the most comprehensive dataset of system calls generated by almost all modern malware types, for training different classification methods for malware type recognition and subsequent alert prioritization. The effectiveness of different classification methods is evaluated, and the results are presented. The results achieved so far make it possible to decrease the load on analytical staff dealing with malware classification and related alert prioritization by 92.4%, which makes this approach applicable for practical use.
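
As a rough illustration of what classifying system-call traces by malware type can look like, the sketch below counts adjacent system-call pairs in each trace and assigns a new trace to the type with the most similar aggregate profile. It is not the method evaluated in the article: the nearest-centroid approach, the function names, the system-call IDs, and the labels are all assumptions chosen for illustration, and none of the data comes from AWSCTD.

```python
from collections import Counter
from math import sqrt

# Toy traces: each is a sequence of system-call IDs with a malware-type label.
# All IDs and labels are made up for illustration; they are not AWSCTD data.
TRAIN = [
    ([1, 2, 2, 3, 1, 2], "trojan"),
    ([2, 1, 2, 3, 2, 1], "trojan"),
    ([7, 8, 8, 9, 7, 8], "adware"),
    ([8, 7, 9, 8, 8, 7], "adware"),
]

def bigram_counts(trace):
    """Count adjacent system-call pairs (2-grams) in one trace."""
    return Counter(zip(trace, trace[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(samples):
    """Sum bigram counts per label to form one centroid per malware type."""
    centroids = {}
    for trace, label in samples:
        centroids.setdefault(label, Counter()).update(bigram_counts(trace))
    return centroids

def classify(centroids, trace):
    """Return the label whose centroid is most similar to the trace."""
    features = bigram_counts(trace)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))

centroids = train(TRAIN)
predicted = classify(centroids, [1, 2, 3, 1, 2, 2])
```

In a setting like the one the abstract describes, the predicted type could then be mapped to an alert priority so that analysts see the most critical alerts first; that mapping is left out here because the abstract does not specify it.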



