Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/2565
Title: | Analysing part-of-speech for Portuguese text classification |
Authors: | Gonçalves, Teresa; Quaresma, Paulo |
Keywords: | machine learning; Text classification |
Issue Date: | 2006 |
Publisher: | Springer-Verlag |
Abstract: | This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks.
Our proposals were applied to two different datasets written in the Portuguese language: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing for a strong reduction of the number of features needed in the text classification. |
URI: | http://hdl.handle.net/10174/2565 |
Type: | article |
Appears in Collections: | INF - Artigos em Livros de Actas/Proceedings |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.