DSpace Community: http://hdl.handle.net/10174/27
Updated: 2024-03-04T10:50:53Z

URI: http://hdl.handle.net/10174/35708
Updated: 2023-11-22T11:09:03Z
Published: 2023-10-28T23:00:00Z
Title: Detecting Persuasion Attempts on Social Networks: Unearthing the Potential of Loss Functions and Text Pre-Processing in Imbalanced Data Settings
Authors: Teimas, Rúben; Saias, José
Abstract: The rise of social networks and the increasing amount of time people spend on them
have created a perfect place for the dissemination of false narratives, propaganda, and manipulated
content. In order to prevent the spread of disinformation, content moderation is needed. However,
manual moderation is unfeasible due to the large amount of daily posts. This paper studies the
impact of using different loss functions on a multi-label classification problem with an imbalanced
dataset, consisting of 20 persuasion techniques and only 950 samples, provided by SemEval’s 2021
Task 6. We used machine learning models, such as Naive Bayes and Decision Trees, and a custom
deep learning architecture, based on DistilBERT and convolutional layers. Overall, the machine
learning models achieved far worse results than the deep learning model trained with Binary Cross Entropy,
which we considered our baseline deep learning model. To address the class imbalance problem, we
trained our model using different loss functions, such as Focal Loss and Asymmetric Loss. The latter
provided the best results, particularly for the least represented classes.

URI: http://hdl.handle.net/10174/35196
Updated: 2023-05-18T10:00:41Z
Published: 2022-04-30T23:00:00Z
Title: App BL-SLAM
Authors: Javier, Leon
Abstract: BL-SLAM processes 3D data from a Velodyne lidar mounted on a vehicle. From these data, it
progressively builds a map and estimates the trajectory of the vehicle using simultaneous localization and
mapping (SLAM), whose main processes are odometry and graph optimization.

URI: http://hdl.handle.net/10174/35195
Updated: 2023-05-18T09:59:54Z
Published: 2022-08-31T23:00:00Z
Title: Vision Documentation
Authors: Javier, Leon; Pedro, Salgueiro
Abstract: Documentation on how to use the Vision supercomputer, including information on how to submit
and manage jobs in the best possible way.

URI: http://hdl.handle.net/10174/34695
Updated: 2023-02-24T12:58:23Z
Published: 2021-06-25T23:00:00Z
Title: An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing
Authors: Carnaz, Gonçalo; Antunes, Mário; Nogueira, Vitor Beires
Abstract: Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. There exists a wide set of dedicated tools, but they have a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained with the classification of the annotated named-entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
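
Note on the reported means: the abstract does not state how precision, recall, and F1-score were averaged. The sketch below assumes macro-averaging (an unweighted mean over entity types), which is one common choice; the entity-type names and counts are hypothetical and are not taken from the corpus evaluation. Under this assumption the mean F1 need not equal the harmonic mean of the mean precision and mean recall, which may be why the reported F1 of 0.733 is lower than the roughly 0.763 that the reported precision and recall would give if combined directly.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Counts:
    """Per-entity-type confusion counts (hypothetical illustration data)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def prf(c: Counts) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class; 0.0 where undefined."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(per_class: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall, and F1."""
    scores = [prf(c) for c in per_class.values()]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Hypothetical counts, NOT taken from the corpus evaluation.
example = {
    "PERSON": Counts(tp=90, fp=10, fn=10),
    "LOCATION": Counts(tp=40, fp=10, fn=40),
    "LICENSE_PLATE": Counts(tp=9, fp=1, fn=21),
}

macro_p, macro_r, macro_f1 = macro_average(example)

# Harmonic mean of the averaged precision and recall, for comparison only.
harmonic_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"macro precision: {macro_p:.3f}")
print(f"macro recall:    {macro_r:.3f}")
print(f"macro F1:        {macro_f1:.3f}")
print(f"harmonic mean of macro P and R: {harmonic_of_means:.3f}")
```

With these hypothetical counts, the macro F1 comes out below the harmonic mean of the macro precision and recall, because classes with low recall pull their own F1 down more than they pull down the averaged recall.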