Modelling the relationship between saturated oxygen and diatoms’ abundance using weighted pattern trees with algebraic operators
- 1 Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, R. Macedonia
Abstract
Machine learning has been used in many disciplines to reveal important patterns in data. One discipline that benefits from these methods is eco-informatics, a branch of applied computer science that uses computer algorithms to discover the impact of environmental stress factors on organisms’ abundance. Decision tree methods are particularly interesting to computer scientists and ecologists alike, because they produce an easily interpretable structure that can be read without practical knowledge of mathematics or of the inner workings of the algorithm. These methods are not restricted to classical sets: many of them use fuzzy set theory to reduce overfitting, increase robustness to changes in the data and improve prediction accuracy. In this direction, this paper aims to discover the influence of one particular environmental stress factor, saturated oxygen, on the diatoms’ abundance, using the weighted pattern tree (WPT) algorithm on real measured data from Lake Prespa, Macedonia. WPT is a decision tree variant that combines concepts from fuzzy set theory, such as similarity metrics, fuzzy membership functions and aggregation operators, to achieve better prediction accuracy, improved interpretability and greater resistance to overfitting than classical decision trees. In this study, we use algebraic operators for aggregation. One WPT model is presented that relates the saturated oxygen parameter to the diatoms’ abundance and reveals which diatoms can be used to indicate a certain water quality class (WQC). The obtained results are verified against existing knowledge found in the literature.
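As a rough illustration of the aggregation step mentioned in the abstract, the sketch below combines two vectors of fuzzy membership degrees with the standard algebraic operators (algebraic product for AND, algebraic sum for OR) and scores the result against a target class. It is not the authors' implementation: the function names, the membership values and the choice of an RMSE-based similarity are assumptions made for this example only.

```python
import math


def algebraic_and(u, v):
    """Algebraic product t-norm, applied element-wise: a * b."""
    return [a * b for a, b in zip(u, v)]


def algebraic_or(u, v):
    """Algebraic sum t-conorm, applied element-wise: a + b - a * b."""
    return [a + b - a * b for a, b in zip(u, v)]


def rmse_similarity(u, v):
    """Similarity as 1 - RMSE between two membership vectors.

    Assumption: one possible similarity metric, chosen for illustration.
    """
    mse = sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)
    return 1.0 - math.sqrt(mse)


# Hypothetical membership degrees for two diatom taxa over four samples
# (illustrative numbers, not measurements from Lake Prespa).
taxon_a_high = [0.5, 1.0, 0.0, 0.5]
taxon_b_low = [0.5, 0.2, 0.3, 1.0]

# Hypothetical membership of each sample in one water quality class.
target_class = [0.25, 0.2, 0.0, 0.5]

candidate = algebraic_and(taxon_a_high, taxon_b_low)
score = rmse_similarity(candidate, target_class)
```

A tree learner of this kind would compare such scores across candidate aggregations and keep the one most similar to the target class. The selection strategy and the weighting of branches are outside the scope of this sketch.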