Repositório Digital de Publicações Científicas: Is linguistic information relevant for the classification of legal texts?

Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/2561

Title:	Is linguistic information relevant for the classification of legal texts?
Authors:	Gonçalves, Teresa; Quaresma, Paulo
Keywords:	Text classification
Issue Date:	2005
Publisher:	ACM
Abstract:	Text classification is an important task in the legal domain. Most legal information is stored as largely unstructured text, so it is important to be able to classify these texts automatically into a predefined set of concepts. Support Vector Machines (SVM), a machine learning algorithm, have been shown to be good classifiers for text collections [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts (the Portuguese Attorney General’s Office Decisions), and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated. The results show that some linguistic information (lemmatisation and part-of-speech tags) can be used successfully to improve the classification results and, at the same time, to reduce the number of features needed by the learning algorithm.
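The abstract does not spell out the experimental pipeline, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes a hypothetical hand-written lexicon (`LEXICON`) standing in for a real Portuguese lemmatiser and part-of-speech tagger, invented toy documents (`DOCS`), and an assumed set of content-word tags (`CONTENT_POS`). The classifier is a from-scratch linear SVM trained with the Pegasos stochastic sub-gradient method, swapped in for illustration; it is not necessarily the SVM implementation used in the paper. The sketch shows, in miniature, how mapping words to lemmas and keeping only selected POS tags shrinks the bag-of-words feature set.

```python
import random
from collections import Counter

# HYPOTHETICAL toy lexicon mapping surface forms to (lemma, POS tag).
# It stands in for a real Portuguese lemmatiser/POS tagger; entries are illustrative only.
LEXICON = {
    "pensões": ("pensão", "N"), "pensão": ("pensão", "N"),
    "militares": ("militar", "ADJ"), "militar": ("militar", "ADJ"),
    "concedidas": ("conceder", "V"), "concede": ("conceder", "V"),
    "impostos": ("imposto", "N"), "imposto": ("imposto", "N"),
    "cobrados": ("cobrar", "V"), "cobra": ("cobrar", "V"),
    "a": ("a", "DET"), "o": ("o", "DET"), "os": ("o", "DET"),
}

# HYPOTHETICAL labelled toy documents: +1 = pensions topic, -1 = taxes topic.
DOCS = [
    ("pensões militares concedidas", 1),
    ("a pensão militar concede", 1),
    ("os impostos cobrados", -1),
    ("o imposto cobra", -1),
]

CONTENT_POS = {"N", "ADJ", "V"}  # assumed set of content-word tags to keep


def features(text, lemmatise=False, keep_pos=None):
    """Bag-of-words counts, optionally lemmatised and filtered by POS tag."""
    bag = Counter()
    for token in text.lower().split():
        lemma, pos = LEXICON.get(token, (token, "UNK"))  # unknown words get tag "UNK"
        if keep_pos is not None and pos not in keep_pos:
            continue  # drop tokens whose tag is not in keep_pos
        bag[lemma if lemmatise else token] += 1
    return bag


def vocabulary_size(bags):
    """Number of distinct features across a collection of bags."""
    return len(set().union(*bags))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (no bias term) trained with Pegasos stochastic sub-gradient descent."""
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is computed with the current weights, before the update.
            margin = y[i] * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            for f in w:  # gradient step for the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss active: step towards the example
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, bag):
    """Sign of the linear score (ties go to +1)."""
    score = sum(w.get(f, 0.0) * v for f, v in bag.items())
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    y = [label for _, label in DOCS]
    settings = [("words", False, None), ("lemmas", True, None),
                ("lemmas + POS filter", True, CONTENT_POS)]
    for name, lem, pos in settings:
        X = [features(text, lem, pos) for text, _ in DOCS]
        w = train_linear_svm(X, y)
        test = features("pensões concedidas", lem, pos)
        print(f"{name}: {vocabulary_size(X)} features, prediction {predict(w, test)}")
```

On these four toy documents the vocabulary falls from 13 word forms to 7 lemmas, and to 5 once determiners are filtered out, while the unseen text "pensões concedidas" is still assigned to the +1 class in every setting. A real experiment would use an actual tagger, an established SVM package and held-out evaluation; the toy only illustrates the mechanics of the feature reduction.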
URI:	http://hdl.handle.net/10174/2561
ISBN:	1-59593-081-7
Type:	article
Appears in Collections:	INF - Artigos em Livros de Actas/Proceedings (Articles in Proceedings)

Files in This Item:

File	Description	Size	Format
tcg05b-linguistic.pdf	Article	195.51 kB	Adobe PDF