Title: De-identification of Clinical Notes Using Contextualized Language Models and a Token Classifier
Authors: Santos, Joaquim
Santos, Henrique
Tabalipa, Fabio
Vieira, Renata
Keywords: Electronic health records
Named entity recognition
Issue Date: Nov-2021
Publisher: Springer
Citation: Santos J., dos Santos H.D.P., Tabalipa F., Vieira R. (2021) De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In: Britto A., Valdivia Delgado K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham.
Abstract: The de-identification of clinical notes is crucial for the reuse of electronic clinical data and is a common Named Entity Recognition (NER) task. Neural language models have substantially improved Natural Language Processing (NLP) tasks such as NER when integrated with neural network methods. This paper evaluates current state-of-the-art deep learning methods (Bi-LSTM-CRF) on the task of identifying patient names in clinical notes for de-identification purposes. We used two corpora and three language models to evaluate which combination delivers the best performance. In our experiments, the corpus built specifically for the de-identification of clinical notes, used with a contextualized embedding plus word embeddings, achieved the best result: an F-measure of 0.94.
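The abstract frames de-identification as a token classification problem: a model tags the tokens that belong to patient names, and those tokens are then hidden. The record does not describe the paper's tag scheme or how predictions are applied, so the following is only a minimal sketch of that final masking step, assuming BIO-style tags with a hypothetical `NAME` label and a hypothetical `[NAME]` placeholder. It does not reproduce the Bi-LSTM-CRF model itself.

```python
from typing import List, Sequence


def mask_names(
    tokens: Sequence[str],
    tags: Sequence[str],
    label: str = "NAME",          # hypothetical entity label
    placeholder: str = "[NAME]",  # hypothetical replacement string
) -> List[str]:
    """Collapse each predicted name span into a single placeholder token.

    Assumes BIO tags: "B-<label>" opens a span, "I-<label>" continues it,
    and any other tag (e.g. "O") leaves the token unchanged.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    masked: List[str] = []
    inside = False  # True while inside a span that was already replaced
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}" or (tag == f"I-{label}" and not inside):
            # Start of a new span (a stray I- tag is treated as a start).
            masked.append(placeholder)
            inside = True
        elif tag == f"I-{label}":
            # Continuation: the placeholder already covers this token.
            continue
        else:
            masked.append(token)
            inside = False
    return masked


if __name__ == "__main__":
    tokens = ["Patient", "Maria", "Silva", "was", "admitted", "."]
    tags = ["O", "B-NAME", "I-NAME", "O", "O", "O"]
    print(" ".join(mask_names(tokens, tags)))
    # prints: Patient [NAME] was admitted .
```

Collapsing a multi-token name into one placeholder also hides how many tokens the name had. Replacing each tagged token individually is an equally reasonable design choice when token alignment must be preserved.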
Type: article
Appears in Collections: CIDEHUS - Articles in Books of Proceedings

Files in This Item:
BRACIS___Anony.pdf (216.19 kB, Adobe PDF; restricted access)