Repositório Digital de Publicações Científicas: Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents

Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/2556

Title:	Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents
Authors:	Gonçalves, Teresa; Quaresma, Paulo
Keywords:	machine learning; named entity recognition
Issue Date:	2010
Publisher:	Springer-Verlag
Abstract:	Information extraction from legal documents is an important and open problem. This paper describes a mixed approach that combines linguistic information with machine learning techniques. In this approach, top-level legal concepts are identified and used for document classification with Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to, and evaluated on, a corpus of legal documents from the EUR-Lex site. The results obtained were quite good and indicate that this may be a promising approach to the legal information extraction problem.
URI:	http://hdl.handle.net/10174/2556
ISBN:	978-3-642-12836-3
Type:	article
Appears in Collections:	INF - Artigos em Livros de Actas/Proceedings (Articles in Proceedings)

Files in This Item:

File	Description	Size	Format
tcg10b-usingLing.pdf	Article	254.22 kB	Adobe PDF