<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>DSpace Collection:</title>
  <link rel="alternate" href="http://hdl.handle.net/10174/38446" />
  <subtitle />
  <id>http://hdl.handle.net/10174/38446</id>
  <updated>2026-04-06T18:38:57Z</updated>
  <dc:date>2026-04-06T18:38:57Z</dc:date>
  <entry>
    <title>Performance Evaluation of NLP Models for European Portuguese: Multi-GPU/Multi-node Configurations and Optimization Techniques</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41453" />
    <author>
      <name>Santos, Daniel</name>
    </author>
    <author>
      <name>Miquelina, Nuno</name>
    </author>
    <author>
      <name>Schmidt, Daniela</name>
    </author>
    <author>
      <name>Quaresma, Paulo</name>
    </author>
    <author>
      <name>Nogueira, Vítor Beires</name>
    </author>
    <id>http://hdl.handle.net/10174/41453</id>
    <updated>2026-02-25T10:42:34Z</updated>
    <published>2025-02-17T00:00:00Z</published>
    <summary type="text">Title: Performance Evaluation of NLP Models for European Portuguese: Multi-GPU/Multi-node Configurations and Optimization Techniques
Authors: Santos, Daniel; Miquelina, Nuno; Schmidt, Daniela; Quaresma, Paulo; Nogueira, Vítor Beires
Abstract: Natural Language Processing (NLP) research has predominantly focused on the English language, leading to a wealth of resources and advancements tailored to English. However, there is a growing need to extend these capabilities to other languages, such as European Portuguese, to ensure the inclusivity and accessibility of NLP technologies. In this study, we explore the evaluation of NLP models in the European Portuguese language using a multi-GPU/multi-node machine. We utilized various tools such as PyTorch, Accelerate, Transformers, and DeepSpeed with ZeRO Stage 3 to handle the computational demands of large-scale model training. We detail the key aspects of our methodology for evaluating various models on translated GLUE tasks. Additionally, we introduce AiBERTa, a base model with 110 million parameters, developed and pre-trained on a corpus tailored for European Portuguese. This research highlights the effectiveness of advanced tools and distributed computing in scaling NLP model training, providing a foundation for future enhancements in European Portuguese language processing.</summary>
    <dc:date>2025-02-17T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>A Galician-Portuguese Generative Model</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41452" />
    <author>
      <name>Gamallo, Pablo</name>
    </author>
    <author>
      <name>Rodríguez, Pablo</name>
    </author>
    <author>
      <name>Sotelo, Susana</name>
    </author>
    <author>
      <name>Miquelina, Nuno</name>
    </author>
    <author>
      <name>Paniagua, Silvia</name>
    </author>
    <author>
      <name>Schmidt, Daniela</name>
    </author>
    <author>
      <name>de-Dios-Flores, Iria</name>
    </author>
    <author>
      <name>Quaresma, Paulo</name>
    </author>
    <author>
      <name>Bardanca, Daniel</name>
    </author>
    <author>
      <name>Pichel, José Ramom</name>
    </author>
    <author>
      <name>Nogueira, Vítor</name>
    </author>
    <author>
      <name>Barro, Senén</name>
    </author>
    <id>http://hdl.handle.net/10174/41452</id>
    <updated>2026-02-25T10:42:22Z</updated>
    <published>2024-11-16T00:00:00Z</published>
    <summary type="text">Title: A Galician-Portuguese Generative Model
Authors: Gamallo, Pablo; Rodríguez, Pablo; Sotelo, Susana; Miquelina, Nuno; Paniagua, Silvia; Schmidt, Daniela; de-Dios-Flores, Iria; Quaresma, Paulo; Bardanca, Daniel; Pichel, José Ramom; Nogueira, Vítor; Barro, Senén
Abstract: Large language models (LLMs) have revolutionized natural language processing, but their predominant focus on English has resulted in biases and performance differences across various languages. This situation is maintained in generative multilingual models, where English continues to be the predominant language. In these models, the presence of European Portuguese is marginal and that of the Galician variety is almost residual. In this work, we describe an open-source Galician-Portuguese generative model, Carvalho_pt-gl, focused precisely on these two language variants, which are very close lexically and syntactically. The model was trained using a GPT architecture with 1.3 billion parameters on more than 6B words, balanced between the two varieties. The strategy of continual pretraining was used to adapt a pre-existing LLM that was trained on a trilingual dataset with related languages, thereby overcoming the data limitations that would be faced if the training was started from scratch. Evaluation results involving task-based datasets from standardized benchmarks indicate a promising performance. These findings highlight the critical importance of supporting linguistic diversity in generative models.</summary>
    <dc:date>2024-11-16T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>Parameter Efficient Fine-Tuning of LLMs: Application to Machine Translation from English to Portuguese</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41401" />
    <author>
      <name>Santos, Daniel</name>
    </author>
    <author>
      <name>Nogueira, Vitor</name>
    </author>
    <author>
      <name>Quaresma, Paulo</name>
    </author>
    <id>http://hdl.handle.net/10174/41401</id>
    <updated>2026-02-23T11:42:01Z</updated>
    <published>2025-01-01T00:00:00Z</published>
    <summary type="text">Title: Parameter Efficient Fine-Tuning of LLMs: Application to Machine Translation from English to Portuguese
Authors: Santos, Daniel; Nogueira, Vitor; Quaresma, Paulo
Abstract: Fine-tuning Large Language Models (LLMs) for specific tasks, such as machine translation, is a computationally expensive process that often requires substantial hardware resources. Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), offer a resource-efficient alternative by significantly reducing the number of trainable parameters and memory requirements. In this work, we compare the performance and memory efficiency of LoRA and QLoRA on English-Portuguese translation tasks, utilizing two cutting-edge LLMs, Meta LLaMA 3.1 8B and Mistral 7B. Our experiments demonstrate that both LoRA and QLoRA achieve substantial memory savings. Moreover, this work underscores the practical advantages of LoRA and QLoRA in resource-constrained environments, providing a foundation for further optimization and experimentation in machine translation using large language models.</summary>
    <dc:date>2025-01-01T00:00:00Z</dc:date>
  </entry>
  <entry>
    <title>GuideBP: Guided Backpropagation in Multi-output Neural Networks by Channeling Gradients Through Weaker Logits</title>
    <link rel="alternate" href="http://hdl.handle.net/10174/41303" />
    <author>
      <name>Ghosh, Swarnendu</name>
    </author>
    <author>
      <name>Mandal, Bodhisatwa</name>
    </author>
    <author>
      <name>Gonçalves, Teresa</name>
    </author>
    <author>
      <name>Quaresma, Paulo</name>
    </author>
    <author>
      <name>Nasipuri, Mita</name>
    </author>
    <author>
      <name>Das, Nibaran</name>
    </author>
    <id>http://hdl.handle.net/10174/41303</id>
    <updated>2026-02-19T11:12:41Z</updated>
    <published>2024-01-01T00:00:00Z</published>
    <summary type="text">Title: GuideBP: Guided Backpropagation in Multi-output Neural Networks by Channeling Gradients Through Weaker Logits
Authors: Ghosh, Swarnendu; Mandal, Bodhisatwa; Gonçalves, Teresa; Quaresma, Paulo; Nasipuri, Mita; Das, Nibaran
Abstract: Convolutional neural networks often generate multiple logits from multiple networks. In most cases, we use simple techniques like addition or column averaging for loss computation, but this allows gradients to be distributed equally among all paths. The proposed approach attempts to guide the gradients of backpropagation along the weakest branches of the neural network. A weakness score is proposed that defines the class-specific performance of individual logits. This is then used to create a new output distribution that would guide gradients along the weakest pathways. The proposed approach has been shown to perform better than traditional column merging techniques and can be used in several application scenarios. Not only can the proposed model be used as an efficient technique for training multiple instances of a model in parallel, but CNNs with multiple output branches have also been shown to perform better with the proposed upgrade. Various experiments establish the flexibility of the learning technique, which is simple yet effective in various multi-objective scenarios, both empirically and statistically.</summary>
    <dc:date>2024-01-01T00:00:00Z</dc:date>
  </entry>
</feed>

