Ana Iglesias, Elena Castro, Rebeca Pérez, Leonardo Castaño and Paloma Martínez (Universidad Carlos III de Madrid)
Ana Mª Iglesias Maqueda <aiglesia@inf.uc3m.es>
The MOSTAS system (MOrpho-Semantic Tagger, Anonymizer and SpellChecker for biomedical texts) preprocesses clinical reports to facilitate information retrieval tasks (clinical concepts, abbreviations, entities, etc.). MOSTAS annotates clinical reports with morpho-semantic information, converts abbreviations and acronyms, and detects biomedical concepts using specialized biomedical resources (databases, thesauri, a multilingual terminology server, etc.). In addition, MOSTAS can anonymize clinical reports and correct their spelling.
MOSTAS preprocesses semi-structured information from clinical reports: it tags the clinical texts with specialized information, removes sensitive patient information and detects spelling errors.
The system is implemented in Java and can be installed on Linux and Windows platforms.
Currently, STILUS is used as the morpho-semantic tagger. Biomedical resources are also required, such as the SNOMED thesaurus and the list of abbreviations and acronyms provided by the Spanish Ministry of Health and Consumer Affairs, among others. Java is also required to compile and run the system.
The system is divided into five main modules. Three of them handle the pre-processing of the clinical reports: the Morpho-semantic Analyzer, the Acronym/Abbreviation Finder and the Biomedical Concept Finder. The other two handle the post-processing of the text: a domain-specific Spell-checker and an Anonymizing module, where sensitive patient information is removed from the clinical texts.
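The modular organization described above can be illustrated with a minimal Java sketch, in which each module is a stage that takes a report text and returns a transformed text. This is not the MOSTAS code: the class name, the method names, the in-memory abbreviation list, the explicit list of sensitive terms and the "[ANON]" placeholder are all assumptions made for illustration, and only two of the five modules are mimicked, in a deliberately naive way.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: every name in this class is hypothetical.
final class MostasPipelineSketch {

    // Stand-in for the Acronym/Abbreviation Finder: replaces each known
    // abbreviation, matched as a whole token, with its long form.
    static UnaryOperator<String> abbreviationExpander(Map<String, String> abbreviations) {
        return text -> {
            String result = text;
            for (Map.Entry<String, String> entry : abbreviations.entrySet()) {
                String pattern = "\\b" + Pattern.quote(entry.getKey()) + "\\b";
                result = result.replaceAll(pattern, Matcher.quoteReplacement(entry.getValue()));
            }
            return result;
        };
    }

    // Stand-in for the Anonymizing module: masks terms from an explicit list.
    // How the real module finds sensitive information is not described here.
    static UnaryOperator<String> anonymizer(List<String> sensitiveTerms) {
        return text -> {
            String result = text;
            for (String term : sensitiveTerms) {
                result = result.replace(term, "[ANON]");
            }
            return result;
        };
    }

    // Applies the stages in order, feeding each stage the previous output.
    static String run(String report, List<UnaryOperator<String>> stages) {
        String current = report;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> abbreviations = new LinkedHashMap<>();
        abbreviations.put("DM", "diabetes mellitus");
        abbreviations.put("IAM", "infarto agudo de miocardio");

        String report = "Paciente Marta Ruiz con DM e IAM previo.";
        String processed = run(report, List.of(
                abbreviationExpander(abbreviations),   // pre-processing stage
                anonymizer(List.of("Marta Ruiz"))));   // post-processing stage
        System.out.println(processed);
    }
}
```

Modelling every module as a text-to-text function keeps the stages independent, so a stage can be replaced or reordered without touching the others. The morpho-semantic analysis, concept detection and spell-checking stages are omitted because they depend on external resources (STILUS, SNOMED) whose interfaces are not given here.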
MOSTAS is a complete system for tagging clinical texts, anonymizing them, and detecting and correcting spelling errors. It works with semi-structured texts in Spanish. Most current research in this area targets English, so MOSTAS is the first complete system for Spanish. Moreover, MOSTAS implements a biomedical resource similar to the English MetaMap for UMLS, but for the Spanish SNOMED thesaurus.
MOSTAS was developed during the ISSE project, and the results of several PhD projects at LABDA, such as the SPINDEL system, were taken into account.
Ana Iglesias, Elena Castro, Rebeca Pérez, Leonardo Castaño, Paloma Martínez, José Manuel Gómez Pérez, Sandra Kohler and Ricardo Melero. MOSTAS: Un Etiquetador Morfo-Semántico, Anonimizador y Corrector de Historiales Clínicos. XXIV edición del Congreso Anual de la Sociedad Española para el Procesamiento del Lenguaje Natural 2008 (SEPLN'08). Vol. 41, pp. 299-300.