Part-of-Speech Tagging for Grammar Checking of Punjabi

May 2009. Volume 4 Issue 1

Authors
Mandeep Singh Gill & Gurpreet Singh Lehal
Punjabi University, Patiala, India
Shiv Sharma Joshi
Punjabi University, Patiala, India

Bio-Data
Mandeep Singh Gill received his master’s in software engineering from Thapar University, Patiala, India in 2003. His research interests include natural language processing and software testing. He is currently developing grammar-checking software for Punjabi as part of his Ph.D.
Gurpreet Singh Lehal is currently head of the Computer Science Department, Punjabi University, Patiala, and Director of the Advanced Centre for Technical Development of Punjabi Language, Literature and Culture. Dr. Lehal has published more than 30 papers in national and international journals and in the proceedings of leading conferences. His main areas of interest are optical character recognition and natural language processing.

Shiv Sharma Joshi received his master’s and Ph.D. in Linguistics from the University of London, London. Dr. Joshi, a renowned linguist, is a member of various professional bodies and has authored three books, four dictionaries, and approximately one hundred research papers. His specializations include Punjabi phonology and instrumental (acoustic) phonetics. His current research interests are lexicography, computational linguistics, and the teaching of Punjabi as a foreign language.


 

Abstract

Part-of-speech (POS) tagging is one of the major activities performed in a typical natural language processing application. This paper explores part-of-speech tagging for the Punjabi language, a member of the Modern Indo-Aryan family of languages. A tagset for use in grammar checking and other similar applications is proposed. This fine-grained tagset is based entirely on the grammatical categories involved in various types of concord in typical Punjabi sentences. The morpho-syntactic features included in this tagset are largely based on the inflectional morphology of Punjabi words. The motivation for devising this tagset, with its focus on agreement features, is that no tagset is available for Punjabi or other Indian languages, and the tagsets for other languages do not cover all the grammatical features required for agreement checking in Punjabi texts. A rule-based tagger derived from this tagset is also described. This will be the first published POS tagger for Punjabi. The tagset described in this paper is recommended for grammar checking and other similar applications in languages that share grammatical features with Punjabi, more specifically the languages of the Modern Indo-Aryan family.

Key Words: morphology, part-of-speech tagging, tagset, grammar checking, computational linguistics, Punjabi

