Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/4418

Title: Named Entity Recognition using Machine Learning techniques
Authors: Miranda, Nuno
Raminhos, Ricardo
Seabra, Pedro
Sequeira, João
Gonçalves, Teresa
Quaresma, Paulo
Keywords: named entity recognition
Issue Date: Oct-2011
Publisher: EPIA
Citation: N. Miranda, R. Raminhos, P. Seabra, J. Sequeira, T. Gonçalves, and P. Quaresma. Named entity recognition using machine learning techniques. In EPIA-11, 15th Portuguese Conference on Artificial Intelligence, Lisbon, PT, pages 818-83, October 2011.
Abstract: Extracting knowledge through keywords, and creating relations between contents that share keywords, is an important asset in any content management system. However, performing this kind of information extraction manually is infeasible given the growing amount of textual content of varying quality made available by multiple creators and distributors of information. This paper presents and evaluates a prototype for recognizing named entities that uses orthographic and morphological word attributes as input and Support Vector Machines as the machine learning technique for identifying those entities in new documents. Because the documents are written in Portuguese and no part-of-speech tagger was freely available, a model for this language was also developed using SVMTool, a simple and effective generator of sequential taggers based on Support Vector Machines. This required adapting the Bosque 8.0 corpus so that every word carries a POS tag, since originally several words were joined into one token with a single tag while others were split, giving rise to more than one tag.
URI: http://hdl.handle.net/10174/4418
ISBN: 978-989-95618-4-7
Type: article
Appears in Collections: INF - Artigos em Livros de Actas/Proceedings
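
The abstract mentions orthographic word attributes as input to the classifier but does not list them. As a rough illustration only, the sketch below computes a few surface-form features per token of the kind commonly used for named entity recognition. The function names and the feature set are assumptions made for this example, not the attributes used in the paper, and the Support Vector Machine training step is left out.

```python
def orthographic_features(token):
    """Return simple surface-form features for one token.

    Hypothetical feature set chosen for illustration; the paper's
    actual attributes are not given in this record.
    """
    return {
        "is_capitalized": token[:1].isupper(),   # first character is upper case
        "is_all_caps": token.isupper(),          # e.g. acronyms such as "EPIA"
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "length": len(token),
        "suffix3": token[-3:].lower(),           # crude morphological cue
    }


def sentence_features(tokens):
    """Add simple positional context to each token's features."""
    rows = []
    for i, tok in enumerate(tokens):
        row = orthographic_features(tok)
        row["is_first"] = i == 0
        # Capitalization of the previous token hints at multi-word names.
        row["prev_capitalized"] = i > 0 and tokens[i - 1][:1].isupper()
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in sentence_features(["Paulo", "Quaresma", "vive", "em", "Évora"]):
        print(row)
```

Feature dictionaries like these would typically be converted to numeric vectors before being passed to an SVM; how the authors encoded their attributes is not stated here.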

Files in This Item:

File: epia2011b.pdf
Size: 257.38 kB
Format: Adobe PDF
Access: Restricted; a copy can be requested.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
