Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/17099

Title: An Approach to the POS Tagging Problem Using Genetic Algorithms
Authors: Silva, Ana Paula
Silva, Arlindo
Pimenta Rodrigues, Irene
Editors: Madani, Kurosh
Correia, Dourado Antonio
Rosa, Agostinho
Filipe, Joaquim
Keywords: Part-of-speech Tagging Disambiguation
Evolutionary Algorithms
Natural Language Processing
Issue Date: 2015
Publisher: Springer International Publishing
Citation: Ana Paula Silva, Arlindo Silva, Irene Rodrigues. An Approach to the POS Tagging Problem Using Genetic Algorithms. In: Computational Intelligence, Studies in Computational Intelligence, vol. 577, pp. 3-17. Springer, 2015.
Abstract: Automatic part-of-speech tagging is the process of automatically assigning a part-of-speech (POS) tag to each word of a text. The words of a language are grouped into grammatical categories that represent the function they may have in a sentence. These grammatical classes (or categories) are usually called parts of speech. However, in most languages, a large number of words can be used in different ways and thus have more than one possible part of speech. To choose the right tag for a particular word, a POS tagger must consider the parts of speech of the surrounding words. The neighboring words may also have more than one possible tag. This means that, in order to solve the problem, we need a method to disambiguate each word’s set of possible tags. In this work, we modeled the part-of-speech tagging problem as a combinatorial optimization problem, which we solve using a genetic algorithm. The search for the best combinatorial solution is guided by a set of disambiguation rules that we first discovered using a classification algorithm, which also includes a genetic algorithm. Using rules to disambiguate the tagging, we were able to generalize the context information present in the training tables adopted by approaches based on probabilistic data. We were also able to incorporate other types of information that help to identify a word’s grammatical class. The results obtained on two different corpora are among the best published.
URI: http://dx.doi.org/10.1007/978-3-319-11271-8_1
http://hdl.handle.net/10174/17099
Type: bookPart
Appears in Collections: INF - Publicações - Capítulos de Livros
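
The abstract frames tagging as a combinatorial optimization problem: each ambiguous word contributes a set of candidate tags, and a genetic algorithm searches for the combination that the disambiguation rules score highest. The following is a minimal sketch of that framing only, not the authors' implementation. The lexicon, the tag names, the rule weights in `RULES`, and the GA settings (tournament selection, one-point crossover, elitism) are all hypothetical placeholders; in the paper the rules are discovered by a classification algorithm rather than written by hand.

```python
import random

# Hypothetical toy lexicon: each word maps to the tags it may take.
LEXICON = {
    "the": ["DET"],
    "old": ["ADJ", "NOUN"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["NOUN", "VERB"],
}

# Hypothetical hand-written rules: (previous tag, current tag) -> reward.
# Placeholders standing in for the learned disambiguation rules.
RULES = {
    ("DET", "NOUN"): 2.0,
    ("DET", "ADJ"): 2.0,
    ("ADJ", "NOUN"): 2.0,
    ("NOUN", "VERB"): 1.5,
    ("DET", "VERB"): -2.0,
    ("DET", "AUX"): -2.0,
}


def fitness(tags):
    """Sum the rewards of every rule that fires on an adjacent tag pair."""
    return sum(RULES.get(pair, 0.0) for pair in zip(tags, tags[1:]))


def tag_sentence(words, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search for a high-scoring tag sequence with a simple genetic algorithm."""
    rng = random.Random(seed)
    options = [LEXICON[w] for w in words]

    def random_individual():
        # One candidate tag per word, always drawn from that word's tag set.
        return [rng.choice(opts) for opts in options]

    def tournament(pop):
        # Pick the fittest of three randomly sampled individuals.
        return max(rng.sample(pop, 3), key=fitness)

    # Start from the "first listed tag" baseline plus random individuals.
    population = [[opts[0] for opts in options]]
    population += [random_individual() for _ in range(pop_size - 1)]

    for _ in range(generations):
        # Elitism: the current best individual always survives.
        next_gen = [max(population, key=fitness)]
        while len(next_gen) < pop_size:
            a, b = tournament(population), tournament(population)
            # One-point crossover keeps each tag aligned with its word.
            cut = rng.randrange(1, len(words)) if len(words) > 1 else 0
            child = a[:cut] + b[cut:]
            # Mutation: occasionally redraw a tag from the word's own tag set.
            child = [
                rng.choice(opts) if rng.random() < mutation_rate else tag
                for tag, opts in zip(child, options)
            ]
            next_gen.append(child)
        population = next_gen

    return max(population, key=fitness)


words = ["the", "old", "can", "rusts"]
best_tags = tag_sentence(words)
print(list(zip(words, best_tags)))
```

Because crossover and mutation only ever pick tags from a word's own candidate set, every individual is a valid tagging, and elitism ensures the returned sequence scores at least as well as the baseline it started from.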

Files in This Item:

File: t.txt
Size: 1.46 kB
Format: Text
Access: Restricted. A copy can be requested.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
