MODELLING THE RELATIONSHIP BETWEEN SATURATED OXYGEN AND DIATOMS’ ABUNDANCE USING WEIGHTED PATTERN TREES WITH ALGEBRAIC OPERATORS
Machine learning has been used in many disciplines to reveal important patterns in data. One research discipline that benefits from these methods is eco-informatics. This branch of applied computer science addresses environmental problems, using computer algorithms to discover the impact of environmental stress factors on the abundance of organisms. Decision tree methods are particularly interesting for computer scientists as well as ecologists, because they produce a structure that is easy to interpret without requiring practical knowledge of mathematics or of the inner workings of the algorithm. These methods do not rely only on classical sets; many of them use fuzzy set theory to reduce overfitting, increase robustness to changes in the data and improve prediction accuracy. In this direction, this paper aims to discover the influence of one particular environmental stress factor (Saturated Oxygen) on the diatoms’ abundance, using real measured data from Lake Prespa, Macedonia, and the weighted pattern tree (WPT) algorithm. WPT is a decision tree variant that combines concepts from fuzzy set theory, such as similarity metrics, fuzzy membership functions and aggregation operators, to achieve better prediction accuracy, improved interpretability and greater resistance to overfitting than classical decision trees. In this study, we use algebraic operators for aggregation. One WPT model is presented in this paper that relates the saturated oxygen parameter to the diatoms’ abundance and reveals which diatoms can be used to indicate a certain water quality class (WQC). The obtained results are verified against existing knowledge found in the literature.
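To make the fuzzy building blocks mentioned above more concrete, the following minimal Python sketch shows a triangular membership function and the standard algebraic aggregation pair (algebraic product as the fuzzy AND, algebraic sum as the fuzzy OR). It is an illustration under stated assumptions, not the implementation used in this study: the function names, the choice of a triangular shape and all numeric values are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b.

    Assumes a <= b <= c and a < c. The shape is an illustrative choice only.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def algebraic_and(u, v):
    """Algebraic product t-norm: fuzzy AND of two membership degrees."""
    return u * v


def algebraic_or(u, v):
    """Algebraic sum t-conorm: fuzzy OR of two membership degrees."""
    return u + v - u * v


# Hypothetical usage: two made-up membership degrees are aggregated.
# The values below are illustrative and are not taken from the Lake Prespa data.
degree_a = triangular(2.5, 0.0, 5.0, 10.0)  # rising side of the triangle
degree_b = triangular(5.0, 0.0, 5.0, 10.0)  # exactly at the peak
both = algebraic_and(degree_a, degree_b)
either = algebraic_or(degree_a, degree_b)
```

Both operators map membership degrees in [0, 1] back into [0, 1], and they reduce to the classical AND and OR when the inputs are exactly 0 or 1.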