AnCora-CO-Ca

Authors:	M. Antònia Martí, Mariona Taulé, Marta Recasens, Lluís Màrquez and Manuel Bertran (CLiC-UB)
URL:	http://clic.ub.edu/ancora
Contact:	Mariona Taulé <mtaule@ub.edu>

Description

AnCora-CO-Ca is a 400,000-word subset of the multilevel annotated Catalan corpus AnCora-Ca, enriched with coreference information: all noun phrases (NPs), whether pronominal or with a nominal head, that point to the same entity are linked.

Functionality

AnCora-CO-Ca can be a useful resource for training and evaluating coreference resolution systems for Catalan. From a linguistic point of view, the annotated corpus can be used as a workbench to test and validate hypotheses on coreferential expressions in Catalan. This corpus will be used in the SemEval-2010 coreference resolution task.

Technology

The data are stored in XML format.
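
As a rough illustration of how coreference links stored in XML could be read, the sketch below groups mentions by a shared entity identifier. The element names (`sn`, `w`) and the attribute names (`entity`, `wd`) are assumptions made for this example and are not taken from the AnCora-CO-Ca documentation; the sample sentence is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the tags "sn"/"w" and the attributes "entity"/"wd"
# are assumptions for illustration, not the documented corpus schema.
SAMPLE = """
<sentence>
  <sn entity="e1"><w wd="La"/><w wd="Maria"/></sn>
  <w wd="diu"/>
  <w wd="que"/>
  <sn entity="e1"><w wd="ella"/></sn>
  <w wd="llegirà"/>
  <sn entity="e2"><w wd="el"/><w wd="llibre"/></sn>
</sentence>
"""


def coreference_chains(xml_text):
    """Group mention strings by their (assumed) entity identifier."""
    root = ET.fromstring(xml_text.strip())
    chains = defaultdict(list)
    for node in root.iter():
        entity_id = node.get("entity")
        if entity_id is None:
            continue  # not a mention, skip
        # Join the word forms of every token inside this mention.
        words = [w.get("wd") for w in node.iter("w") if w.get("wd")]
        chains[entity_id].append(" ".join(words))
    return dict(chains)


chains = coreference_chains(SAMPLE)
for entity_id, mentions in chains.items():
    print(entity_id, mentions)
```

Mentions that share an identifier form one coreference chain, so in this invented sample the full NP and the pronoun end up in the same group. The real corpus files should be inspected before adapting the tag and attribute names.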

Technical requirements

Modules

Innovation

At present, AnCora-CO-Ca is the largest freely available Catalan corpus annotated with coreference.

Development

The development of AnCora-CO-Ca has been funded by the following projects: PRAXEM (HUM2006-27378-E) and Lang2World (TIN2006-15265-C06-06), both from the Spanish Ministry of Education and Science.

Publications

Recasens, M., M. A. Martí and M. Taulé (2008) First-mention Definites: More than Exceptional Cases. In S. Featherston & S. Winkler (eds), Fruits: Process and Product in Empirical Linguistics. Berlin: de Gruyter.
Recasens, M. (2008) Towards Coreference Resolution for Catalan and Spanish. Master's thesis. Universitat de Barcelona.
Recasens, M., M. A. Martí and M. Taulé (2007) Where Anaphora and Coreference Meet. Annotation in the CESS-ECE Corpus. Recent Advances in Natural Language Processing. Borovets, Bulgaria.

Red Temática en Tratamiento de la Información Multilingüe y Multimodal (TIMM)
