Classification of protein structures by using fuzzy KNN classifier and protein voxel-based descriptor

Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Skopje, R. Macedonia

Abstract

Protein classification is one of the central themes in bioinformatics because it helps in understanding protein molecules. Classifying protein structures can reveal the evolutionary relations between them. Knowledge of protein structures and their possible functions can be used to regulate processes in organisms, for example by developing medications for different diseases. The literature offers a plethora of methods for protein classification, including manual, automatic, and semi-automatic methods. Manual methods are considered precise, but they are time-consuming, so a large number of protein structures remain unclassified. Researchers therefore work intensively on methods that classify protein structures automatically with acceptable precision. In this paper, we propose an approach for classifying protein structures. Our protein voxel-based descriptor is used to describe the features of protein structures. To classify unclassified protein structures, we use a k-nearest neighbors classifier based on fuzzy logic. For evaluation, we use the known classification of protein structures in the SCOP database. We provide some results from the evaluation of our approach. The results show that the proposed approach provides accurate classification of protein structures at reasonable speed.
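The classification step can be illustrated with a minimal sketch of a fuzzy k-nearest neighbors rule. This is not the implementation evaluated in the paper. It assumes that each protein is already represented by a fixed-length numeric descriptor, that training labels are crisp, that distances are Euclidean, and that the fuzzifier m equals 2. The function name `fuzzy_knn` and all parameter names are hypothetical.

```python
import math
from collections import defaultdict


def fuzzy_knn(train_vectors, train_labels, query, k=3, m=2.0, eps=1e-9):
    """Classify one query descriptor with a fuzzy k-NN rule (illustrative sketch).

    Assumptions (not taken from the paper): crisp training labels,
    Euclidean distance, and fuzzifier m > 1.
    Returns (predicted_label, memberships), where memberships maps each
    class seen among the k neighbors to a value in [0, 1].
    """
    # Distance from the query to every training descriptor.
    dists = [
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    ]
    # Keep the k closest training samples.
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]

    # Each neighbor votes with weight 1 / d^(2 / (m - 1)); eps avoids
    # division by zero when the query coincides with a training sample.
    exponent = 2.0 / (m - 1.0)
    weights = defaultdict(float)
    for d, label in neighbors:
        weights[label] += 1.0 / (max(d, eps) ** exponent)

    # Normalize the votes so that the class memberships sum to 1.
    total = sum(weights.values())
    memberships = {label: w / total for label, w in weights.items()}
    predicted = max(memberships, key=memberships.get)
    return predicted, memberships
```

Unlike a plain majority vote, this rule returns a degree of membership for each candidate class, so a query that lies between two groups of structures can be flagged as uncertain instead of being forced into a single class.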
