A Fine-Grained Tagset for Bengali Language

Corresponding Author : Md. Abdullah Al Mumin (mumin-cse@sust.edu)

Authors : Arun Krishna Paul (aruncse2007007@gmail.com)

Keywords : Bengali Tagset, Inflectional Language, Fine-Grained Tagset, Coarse-Grained Tagset

Abstract :

Lexical tags, collectively called a tagset, play a significant role in language processing: they provide a large amount of information about a word and its neighbors, indicate how a word is pronounced, and are used in stemming for information retrieval [1]. A standard tagset is therefore necessary for working with a language in any area of computational linguistics. The two major kinds of tagset are the fine-grained tagset, which uses a large number of tags, and the coarse-grained tagset, which uses a small number of tags. The goal of this paper is to propose a fine-grained tagset, containing a total of 1070 tags, for tagging Bengali texts. Being a completely inflectional language, Bengali requires more tags to tag a text than English or other non-inflectional languages. A good example is the proper noun ‘’ (masculine gender) [TABLE 1], which takes 39 forms in Bengali.

Published on January 28th, 2014 in Volume 21, Issue 1, Applied Sciences and Technology