Authors: M. Antònia Martí, Mariona Taulé, Marta Recasens, Lluís Màrquez and Manuel Bertran (CLiC-UB)
URL: http://clic.ub.edu/ancora
Contact: Mariona Taulé <mtaule
Description
AnCora-CO-Ca is a subset of the multilevel annotated corpus AnCora-Ca (for Catalan), consisting of 400,000 words, enriched with coreference information, where all noun phrases (NPs) –pronominal or with a nominal head– pointing to the same entity are linked.
Functionality
AnCora-CO-Es can be a useful resource for training and evaluating coreference resolution systems for Spanish. From a linguistic point of view, the annotated corpus can be used as a workbench to test and validate hypotheses on coreferential expressions for Spanish. This corpus will be used in the SemEval 2010 coreference resolution task.
Technology
Data are stored in XML format.
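Because the data are stored in XML, coreference chains can in principle be recovered with any standard XML parser. The following is only a minimal sketch using Python's standard library. The element name `sn`, the attribute names `entity` and `wd`, and the sample sentences are assumptions made for illustration; they are not taken from the corpus documentation, so check the actual files before relying on them.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the element name "sn" and the attributes "entity" and
# "wd" are assumptions for illustration, not the documented corpus schema.
SAMPLE = """
<article>
  <sentence>
    <sn entity="e1"><w wd="La"/><w wd="Maria"/></sn>
    <v><w wd="diu"/></v>
    <sn entity="e2"><w wd="la"/><w wd="veritat"/></sn>
  </sentence>
  <sentence>
    <sn entity="e1"><w wd="Ella"/></sn>
    <v><w wd="riu"/></v>
  </sentence>
</article>
"""


def coreference_chains(xml_text, entity_attr="entity", word_attr="wd"):
    """Group mention strings by the value of their (assumed) entity attribute."""
    root = ET.fromstring(xml_text.strip())
    chains = defaultdict(list)
    for node in root.iter():
        entity_id = node.get(entity_attr)
        if entity_id is None:
            continue  # this element is not marked as a mention
        # Collect the word forms found on the mention's descendants.
        words = [w.get(word_attr) for w in node.iter() if w.get(word_attr)]
        chains[entity_id].append(" ".join(words))
    return dict(chains)


chains = coreference_chains(SAMPLE)
for entity_id, mentions in chains.items():
    print(entity_id, mentions)
```

Mentions that share an identifier end up in the same list, which matches the description above of linking all noun phrases that point to the same entity. The attribute names are passed as parameters so they can be changed to match the real markup.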
Technical requirements
Modules
Innovation
At present, AnCora-CO-Ca is the largest freely available Catalan corpus annotated with coreference.
Development
The development of AnCora-CO-Es has been funded by the following projects: PRAXEM (HUM2006-27378-E) and Lang2World (TIN2006-15265-C06-06) from the Spanish Ministry of Education and Science.
Publications
- Recasens, M., M. A. Martí and M. Taulé (2008) 'First-mention Definites: More than Exceptional Cases'. In S. Featherston & S. Winkler (eds), Fruits: Process and Product in Empirical Linguistics. Berlin: de Gruyter.
- Recasens, M. (2008) Towards Coreference Resolution for Catalan and Spanish. Master's thesis. Universitat de Barcelona.
- Recasens, M., M. A. Martí and M. Taulé (2007) 'Where Anaphora and Coreference Meet. Annotation in the CESS-ECE Corpus'. Recent Advances in Natural Language Processing. Borovets, Bulgaria.