• TECHNOLOGICAL BASIS OF “INDUSTRY 4.0”

    ALB-Stanza: A Stanza-based parser for the Albanian Language

    Industry 4.0, Vol. 9 (2024), Issue 6, pp. 203-206

    Part-of-speech (POS) tagging, lemmatization, and dependency parsing are fundamental tasks in Natural Language Processing (NLP); they supply linguistic information that a wide range of NLP applications depend on. POS tagging assigns each word in a sentence its grammatical category. Lemmatization identifies the dictionary form of each word, taking into account its usage in the sentence. Dependency parsing determines the structural relationships between words, producing dependency trees that capture the grammatical organization of a sentence. In this paper, we introduce ALB-Stanza, a neural pipeline parser for Albanian that performs sentence segmentation, tokenization, POS tagging, morphological feature annotation, lemmatization, and dependency parsing. To train the ALB-Stanza model, we used the Stanza neural pipeline and our own corpora, annotated according to the Universal Dependencies schema. The model was evaluated on unseen data, demonstrating its effectiveness in predicting POS tags, morphological features, lemmas, and dependency relations for Albanian text.
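
To make the annotation layers named above concrete, the following is a minimal sketch of how a Universal Dependencies annotation can be read from the CoNLL-U format, which stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). It is not the ALB-Stanza implementation. The helper names (`Token`, `parse_feats`, `parse_conllu_sentence`, `dependency_edges`) are hypothetical, and the sample sentence is an invented English example, not data from the ALB-Stanza corpora.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One syntactic word with the annotation layers discussed above."""
    id: int        # 1-based position in the sentence
    form: str      # surface form
    lemma: str     # dictionary form (lemmatization)
    upos: str      # universal POS tag (POS tagging)
    feats: dict    # morphological features
    head: int      # id of the syntactic head, 0 for the root
    deprel: str    # dependency relation to the head


def parse_feats(field):
    """Turn 'Number=Plur|Tense=Pres' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of Token objects."""
    tokens = []
    for line in block.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(
            id=int(cols[0]), form=cols[1], lemma=cols[2], upos=cols[3],
            feats=parse_feats(cols[5]), head=int(cols[6]), deprel=cols[7],
        ))
    return tokens


def dependency_edges(tokens):
    """Return (head form, relation, dependent form) triples of the tree."""
    forms = {t.id: t.form for t in tokens}
    forms[0] = "ROOT"
    return [(forms[t.head], t.deprel, t.form) for t in tokens]


# Invented English sample (an assumption for illustration only).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "Number=Plur|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
])

if __name__ == "__main__":
    for head, rel, dep in dependency_edges(parse_conllu_sentence(SAMPLE)):
        print(f"{head} --{rel}--> {dep}")
```

Each parsed token carries a lemma, a POS tag, morphological features, and a head-plus-relation pair. These are the quantities the abstract says the model predicts, so gold and predicted files in this format can be compared column by column.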