Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/4418
Title: | Named Entity Recognition using Machine Learning techniques |
Authors: | Miranda, Nuno; Raminhos, Ricardo; Seabra, Pedro; Sequeira, João; Gonçalves, Teresa; Quaresma, Paulo |
Keywords: | named entities recognition |
Issue Date: | Oct-2011 |
Publisher: | EPIA |
Citation: | N. Miranda, R. Raminhos, P. Seabra, J. Sequeira, T. Gonçalves, and P. Quaresma. Named entity recognition using machine learning techniques. In EPIA-11, 15th Portuguese Conference on Artificial Intelligence, Lisbon, PT, pages 818-83, October 2011. |
Abstract: | Knowledge extraction through keywords, and the creation of relations between contents that share keywords, are important assets in any content management system. Nevertheless, it is impossible to perform this kind of information extraction manually, given the growing amount of textual content of varying quality made available by multiple creators and distributors of information.
This paper presents and evaluates a prototype developed for the recognition of named entities using orthographic and morphologic word attributes as input and Support Vector Machines as the machine learning technique for identifying those entities in new documents.
Since the documents are written in Portuguese and no part-of-speech tagger
was freely available, a model for this language was also
developed using SVMTool, a simple and effective generator of sequential
taggers based on Support Vector Machines. This implied adapting the
Bosque 8.0 corpus by adding a POS tag to every word, since originally
several words were joined into one token with a unique tag and others
were split giving rise to more than one tag. |
URI: | http://hdl.handle.net/10174/4418 |
ISBN: | 978-989-95618-4-7 |
Type: | article |
Appears in Collections: | INF - Artigos em Livros de Actas/Proceedings
|