CROWDSOURCING LANGUAGE RESOURCES FOR SPEECH RECOGNITION

  • 1 Technical University of Košice, Košice, Slovakia

Abstract

A language model is an important part of any speech recognition system. Building a language model requires careful processing of large quantities of textual data. Annotations such as part-of-speech tags, named entities, or semantic roles help make statistical language modeling more precise. The natural language processing methods that produce such annotations are usually trained on annotated text corpora. Annotating text corpora or dictionaries is a difficult process that requires a great deal of human work. Crowdsourcing is a sourcing model in which individuals or organizations use contributions from Internet users to create a specific knowledge base.
