De-identifying Spanish medical texts - named entity recognition applied to radiology reports

Pérez Díez, Irene; Pérez Moraga, Raúl; López Cerdán, Adolfo; Salinas Serrano, José María; Iglesia Vayá, María de la

Please use this identifier to cite or link to this item: http://hdl.handle.net/10637/13662

Full metadata record

DC Field	Value	Language
dc.creator	Pérez Díez, Irene	-
dc.creator	Pérez Moraga, Raúl	-
dc.creator	López Cerdán, Adolfo	-
dc.creator	Salinas Serrano, José María	-
dc.creator	Iglesia Vayá, María de la	-
dc.date	2021	-
dc.date.accessioned	2022-04-14T04:00:29Z	-
dc.date.available	2022-04-14T04:00:29Z	-
dc.date.issued	2021-03-29	-
dc.identifier.citation	Pérez-Díez, I., Pérez-Moraga, R., López-Cerdán, A., Salinas-Serrano, J.M. & Vayá, M.I. (2021). De-identifying Spanish medical texts - named entity recognition applied to radiology reports. Journal of Biomedical Semantics, vol. 12, art. 6 (29 Mar.). DOI: https://doi.org/10.1186/s13326-021-00236-2	-
dc.identifier.issn	2041-1480 (Electronic)	-
dc.identifier.uri	http://hdl.handle.net/10637/13662	-
dc.description	This article is available at the following URL: https://jbiomedsem.biomedcentral.com/track/pdf/10.1186/s13326-021-00236-2.pdf	-
dc.description.abstract	Background: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records.	-
dc.format	application/pdf	-
dc.language.iso	en	-
dc.language.iso	es	-
dc.publisher	BioMed Central	-
dc.relation	This research article describes work carried out in the context of the DeepHealth project, "Deep-Learning and HPC to Boost Biomedical Applications for Health", which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 825111.	-
dc.relation.ispartof	Journal of Biomedical Semantics, vol. 12	-
dc.rights	http://creativecommons.org/licenses/by/4.0/deed.es	-
dc.subject	Natural language processing.	-
dc.subject	Radiology.	-
dc.subject	Data protection.	-
dc.subject	Diagnosis, Radioscopic.	-
dc.title	De-identifying Spanish medical texts - named entity recognition applied to radiology reports	-
dc.type	Article	-
dc.identifier.doi	https://doi.org/10.1186/s13326-021-00236-2	-
dc.local.notes	UCH. ESI International Chair@CEU-UCH	-
dc.local.notes	UCH Scientific Output 2021	-
dc.local.notes	UCH. Departamento de Matemáticas, Física y Ciencias Tecnológicas	-
Appears in collections:	Dpto. Matemáticas, Física y Ciencias Tecnológicas
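
The abstract in this record describes replacing each detected entity with a new one from the same category. The following is a minimal Python sketch of that substitution step only, not the authors' implementation. The surrogate lists, the category labels (NAME, DATE, LOCATION), the helper names `span` and `randomize_entities`, and the sample sentence are all hypothetical, and the entity spans are labelled by hand here where the paper would use the output of a named entity recognition model.

```python
import random

# Hypothetical surrogate pools, one per entity category (assumed labels).
SURROGATES = {
    "NAME": ["Ana Ruiz", "Luis Ortega", "Marta Gil"],
    "DATE": ["03/02/2019", "17/11/2020"],
    "LOCATION": ["Hospital Norte", "Clínica Sur"],
}


def span(text, phrase, label):
    """Return a (start, end, label) tuple for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def randomize_entities(text, entities, rng):
    """Replace every (start, end, label) span with a random surrogate
    drawn from the pool of the same label. Spans must not overlap."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])                  # untouched text
        pieces.append(rng.choice(SURROGATES[label]))       # same-category substitute
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented example sentence; in practice the spans would come from the NER model.
report = "Paciente Juan Pérez, explorado el 12/05/2018 en Hospital Central."
entities = [
    span(report, "Juan Pérez", "NAME"),
    span(report, "12/05/2018", "DATE"),
    span(report, "Hospital Central", "LOCATION"),
]

synthetic = randomize_entities(report, entities, random.Random(0))
print(synthetic)
```

Because each substitute has the same category as the entity it replaces, the output keeps the shape of a real report, which is the property the abstract attributes to the randomization step. Any entity the recognizer misses stays in the text unchanged, which is why the abstract reports recall as the headline metric.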
