Repositório Digital de Publicações Científicas: Analysing part-of-speech for Portuguese text classification

Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/2565

Title:	Analysing part-of-speech for Portuguese text classification
Authors:	Gonçalves, Teresa; Quaresma, Paulo
Keywords:	machine learning; text classification
Issue Date:	2006
Publisher:	Springer-Verlag
Abstract:	This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating the selection of terms based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two different datasets written in the Portuguese language: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General’s Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing for a strong reduction of the number of features needed in the text classification.
URI:	http://hdl.handle.net/10174/2565
Type:	article
Appears in Collections:	INF - Artigos em Livros de Actas/Proceedings

Files in This Item:

File	Description	Size	Format
tcg06-analysingPOS.pdf	Artigo	136.18 kB	Adobe PDF