Please use this identifier to cite or link to this item: http://hdl.handle.net/10637/13662
Full metadata record
DC Field | Value | Language
dc.creator | Pérez Díez, Irene | -
dc.creator | Pérez Moraga, Raúl | -
dc.creator | López Cerdán, Adolfo | -
dc.creator | Salinas Serrano, José María | -
dc.creator | Iglesia Vayá, María de la | -
dc.date | 2021 | -
dc.date.accessioned | 2022-04-14T04:00:29Z | -
dc.date.available | 2022-04-14T04:00:29Z | -
dc.date.issued | 2021-03-29 | -
dc.identifier.citation | Pérez-Díez, I., Pérez-Moraga, R., López-Cerdán, A., Salinas-Serrano, J.M. & Vayá, M.I. (2021). De-identifying Spanish medical texts - named entity recognition applied to radiology reports. Journal of Biomedical Semantics, vol. 12, art. 6 (29 Mar.). DOI: https://doi.org/10.1186/s13326-021-00236-2 | -
dc.identifier.issn | 2041-1480 (Electronic) | -
dc.identifier.uri | http://hdl.handle.net/10637/13662 | -
dc.description | This article is available at the following URL: https://jbiomedsem.biomedcentral.com/track/pdf/10.1186/s13326-021-00236-2.pdf | -
dc.description.abstract | Background: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records. | -
dc.format | application/pdf | -
dc.language.iso | en | -
dc.language.iso | es | -
dc.publisher | BioMed Central | -
dc.relation | This research article describes work carried out in the context of the DeepHealth project, “Deep-Learning and HPC to Boost Biomedical Applications for Health”, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 825111. | -
dc.relation.ispartof | Journal of Biomedical Semantics, vol. 12 | -
dc.rights | http://creativecommons.org/licenses/by/4.0/deed.es | -
dc.subject | Natural language processing. | -
dc.subject | Radiology. | -
dc.subject | Data protection. | -
dc.subject | Radiological diagnosis. | -
dc.subject | Diagnosis, Radioscopic. | -
dc.subject | Personal data protection. | -
dc.title | De-identifying Spanish medical texts - named entity recognition applied to radiology reports | -
dc.type | Article | -
dc.identifier.doi | https://doi.org/10.1186/s13326-021-00236-2 | -
dc.local.notes | UCH. ESI International Chair@CEU-UCH | -
dc.local.notes | UCH Scientific Output 2021 | -
dc.local.notes | UCH. Departamento de Matemáticas, Física y Ciencias Tecnológicas | -
Appears in collections: Dpto. Matemáticas, Física y Ciencias Tecnológicas

DSpace items are protected by copyright, with all rights reserved, unless otherwise indicated.