On the Possibility of Interpretable Rules Generation for the Classification of Malware Samples

  • Institute of Informatics, Slovak Academy of Sciences, Bratislava, Slovakia


Sophisticated models and approaches are now widely used for malware classification and detection. Current trends favor black-box models such as deep neural networks, whose results are often not human-interpretable. In this paper we focus on the well-known EMBER dataset and on interpretable models such as decision trees and decision tables. We were able to generate interpretable classification trees, which can be used in conjunction with concept learning or to support ontology creation.
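To illustrate the kind of interpretable model the abstract refers to, the following is a minimal sketch of fitting a shallow decision tree and printing its rules. The feature names and synthetic data are invented for demonstration only; this is not the authors' pipeline, and real EMBER features are produced by the dataset's own tooling.

```python
# Illustrative sketch: train a small, human-readable decision tree on
# synthetic feature vectors loosely shaped like static PE features.
# Feature names and value distributions are hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["entropy", "num_imports", "has_signature"]

# Synthetic samples: in this toy setup, malware is given higher
# entropy and fewer imports than benign files.
benign = np.column_stack([rng.normal(5.0, 0.5, 200),
                          rng.integers(50, 300, 200),
                          rng.integers(0, 2, 200)])
malware = np.column_stack([rng.normal(7.2, 0.5, 200),
                           rng.integers(0, 60, 200),
                           np.zeros(200)])
X = np.vstack([benign, malware])
y = np.array([0] * 200 + [1] * 200)

# A shallow depth keeps the resulting rule set human-interpretable.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
rules = export_text(clf, feature_names=feature_names)
print(rules)
```

The printed rules form an if-then hierarchy over named features, which is the kind of output that can later be translated into concepts or ontology axioms.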

