DrugNerAr corpus

Authors:	Isabel Segura-Bedmar, Mario Crespo, Paloma Martínez, César de Pablo-Sánchez
URL:	http://labda.inf.uc3m.es/
Contact:	Isabel Segura-Bedmar <isegurainf.uc3m.es>, Paloma Martínez <pmfinf.uc3m.es>

Description

There is no other corpus dedicated to resolving the anaphoric expressions that occur in drug interaction descriptions in pharmacological documents. A collection of 49 unstructured plain-text documents was taken at random from the 'interactions' field of the DrugBank database. On average, documents have 40 sentences, 716 words and 331 anaphoric expressions. The documents were downloaded with an automated robot built with the free tool openKapow. Each document was then preprocessed by MMTx and the DrugNer system. Finally, a linguist, assisted by a pharmaceutical expert, annotated the corpus manually on top of the MMTx and DrugNer output.

Functionality

Technology

DrugNerAr is an XML database, that is, a store that keeps data in XML format so that it can be queried, exported and serialized into the desired format. An XML database defines a logical model from an XML document and stores and retrieves information according to that model. It does not use SQL as its query language, but it supports at least one query syntax. Nearly all XML databases support at least XPath for querying documents or collections of documents; XPath provides a simple path notation that lets users identify the nodes matching a given set of criteria. The database also supports XSLT, a declarative language written in an XML grammar, for transforming documents or query results retrieved from the database. The data are organized primarily by the ID attribute of each XML element, and every element or attribute can be queried with a query language.
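As a rough illustration of ID-based XPath querying, the sketch below uses Python's standard xml.etree.ElementTree module, which implements a limited subset of XPath. The element names (document, sentence, entity, anaphor), the antecedent attribute and the placeholder drug names are assumptions made up for this example; they are not the actual schema or content of the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attribute names and texts are
# illustrative assumptions, not the real corpus schema or data.
SAMPLE = """
<document id="d1">
  <sentence id="d1.s1" text="DrugA may increase the effect of DrugB.">
    <entity id="d1.s1.e1" type="drug" text="DrugA"/>
    <entity id="d1.s1.e2" type="drug" text="DrugB"/>
  </sentence>
  <sentence id="d1.s2" text="This drug should be used with caution.">
    <anaphor id="d1.s2.a1" text="This drug" antecedent="d1.s1.e2"/>
  </sentence>
</document>
"""

root = ET.fromstring(SAMPLE)


def find_by_id(tree_root, element_id):
    """Return the first descendant whose id attribute equals element_id, or None."""
    return tree_root.find(f".//*[@id='{element_id}']")


def resolve_anaphors(tree_root):
    """Pair each anaphor's text with the text of the element its antecedent attribute names."""
    pairs = []
    for anaphor in tree_root.iterfind(".//anaphor"):
        target = find_by_id(tree_root, anaphor.get("antecedent"))
        target_text = target.get("text") if target is not None else None
        pairs.append((anaphor.get("text"), target_text))
    return pairs


# XPath predicate on an attribute: select every entity typed as a drug.
drug_names = [e.get("text") for e in root.findall(".//entity[@type='drug']")]
pairs = resolve_anaphors(root)
```

Because every element carries an ID in this sketch, following a link from an anaphor to its antecedent reduces to a single attribute lookup. XSLT transformations are not covered here, since the Python standard library does not provide an XSLT processor.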

Technical requirements

Modules

Innovation

No other corpus is dedicated to resolving the anaphoric expressions that occur in drug interaction descriptions in pharmacological documents: the DrugNerAr corpus is the only annotated resource for drug anaphoric expressions built to date. The corpus is free for academic research and is available at http://labda.inf.uc3m.es/DrugDDI/.

Development

This corpus was built as part of the thesis “Application of information extraction techniques to pharmacological domain: extracting drug-drug interactions” by Isabel Segura-Bedmar (advisor: Paloma Martínez). The thesis was recently awarded the Extraordinary PhD Award 2011. This work has been partially supported by the Spanish research projects MA2VICMR (S2009/TIC-1542, www.mavir.net), a network of excellence funded by the Madrid Regional Government, and TIN2007-67407-C03-01 (BRAVO: Advanced Multimodal and Multilingual Question Answering).

Publications

Isabel Segura-Bedmar, Mario Crespo, César de Pablo-Sánchez, Paloma Martínez (2010). Resolving anaphoras for the extraction of drug-drug interactions in pharmacological documents. BMC Bioinformatics, April 2010, ISSN: 1471-2105, Volume: 11, Issue: Suppl 2.
Isabel Segura-Bedmar, Mario Crespo, César de Pablo-Sánchez, Paloma Martínez (2010). Score-based approach for Anaphora Resolution in Drug-Drug Interactions Documents. Natural Language Processing and Information Systems, Springer Berlin / Heidelberg, April 2010, ISBN: 978-3-642-125, ISSN: 0302-9743, Volume: 5723/2010, Pages: 91-102.
Sergio Aparicio, Isabel Segura-Bedmar (2009). Resolución de expresiones anafóricas en textos biomédicos. III Jornadas PLN-TIMM, Colmenarejo, Spain, February 2009, Pages: 47-48.

Red Temática en Tratamiento de la Información Multilingüe y Multimodal (TIMM)
