Title: Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains
Authors: Nunes, Rafael Oleques
Santos, Joaquim
Spritzer, André
Balreira, Dennis G.
Freitas, Carla M. Dal Sasso
Olival, Fernanda
Cameron, Helena Freire
Vieira, Renata
Editors: Paes, Aline
Verri, Filipe A. N.
Keywords: Digital Humanities
Natural Language Processing
Named Entity Recognition
Portuguese Variants
Large Language Models
Issue Date: 2025
Publisher: Springer, Cham
Citation: Nunes, Rafael Oleques; Santos, Joaquim; Spritzer, André; Balreira, Dennis G.; Freitas, Carla M. Dal Sasso; Olival, Fernanda; Cameron, Helena Freire; Vieira, Renata (2025). «Assessing European and Brazilian Portuguese LLMs for NER in Specialised Domains». In: Paes, A., Verri, F.A.N. (eds) Intelligent Systems. BRACIS 2024. Lecture Notes in Computer Science, vol 15412. s.l., Springer, Cham, 2025, pp 215–230. ISBN: 978-3-031-79029-4.
Abstract: This paper discusses the impact of Portuguese variants in Large Language Models for the task of named entity recognition (NER) in specialised domains. The tests were made on a Brazilian Portuguese legal corpus and a European Portuguese historical corpus. The models taken into account are BERTimbau (PT-BR), Albertina (PT-PT and PT-BR), and XLM-R (multilingual). The impact was more evident in the European Portuguese historical corpus, which resulted in higher F1 measures compared to previous works that did not consider the same language variant. Additionally, the study underscores the impact of model architecture on performance, highlighting the critical role of both linguistic alignment and model size in enhancing NER in specialised domains.
ISBN: 978-3-031-79029-4
Type: bookPart
