Please use this identifier to cite or link to this item:
http://hdl.handle.net/10174/37887
Title: | Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains |
Authors: | Nunes, Rafael Oleques; Santos, Joaquim; Spritzer, André; Balreira, Dennis G.; Freitas, Carla M. Dal Sasso; Olival, Fernanda; Cameron, Helena Freire; Vieira, Renata |
Editors: | Paes, Aline; Verri, Filipe A. N. |
Keywords: | Digital Humanities; Natural Language Processing; Named Entity Recognition; Portuguese Variants; Large Language Models |
Issue Date: | 2025 |
Publisher: | Springer, Cham |
Citation: | Nunes, Rafael Oleques; Santos, Joaquim; Spritzer, André; Balreira, Dennis G.; Freitas, Carla M. Dal Sasso; Olival, Fernanda; Cameron, Helena Freire; Vieira, Renata (2025). «Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains». In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science, vol 15412. Springer, Cham, 2025, pp. 215–230. ISBN: 978-3-031-79029-4. https://doi.org/10.1007/978-3-031-79029-4_15 |
Abstract: | This paper discusses the impact of Portuguese language variants in Large Language Models for the task of named entity recognition (NER) in specialised domains. The tests were carried out on a Brazilian Portuguese legal corpus and a European Portuguese historical corpus. The models considered are BERTimbau (PT-BR), Albertina (PT-PT and PT-BR), and XLM-R (multilingual). The impact was more evident in the European Portuguese historical corpus, which yielded higher F1 measures than previous works that did not consider the same language variant. Additionally, the study underscores the impact of model architecture on performance, highlighting the critical role of both linguistic alignment and model size in enhancing NER in specialised domains. |
URI: | http://hdl.handle.net/10174/37887 |
ISBN: | 978-3-031-79029-4 |
Type: | bookPart |
Appears in Collections: | CIDEHUS - Publicações - Capítulos de Livros (Publications - Book Chapters) |