Authors: | Luis Alberto Barrón Cedeño (Ph.D. student) and Paolo Rosso. |
URL: | http://www.dsic.upv.es/grupos/nle/downloads.html |
Contact: | Luis Alberto Barrón Cedeño <lbarrondsic.upv.es> |
Description
The CLiPA corpus was created as a resource for designing and testing methods for the automatic detection of cross-lingual plagiarism. It contains a set of original text fragments in English and around twelve different plagiarised versions of them in Spanish (Italian will be added soon). The plagiarised fragments were produced both by human plagiarisers and by machine translation systems. To create a realistic plagiarism detection setting, the corpus also includes a set of text fragments on the same topic that were originally written in Spanish.
Functionality
Every text fragment in the corpus is labelled as original or plagiarised, and each plagiarised fragment is linked to its actual source. The corpus can therefore be used to develop and test cross-lingual plagiarism detection methods.
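Because the source links act as a gold standard, the output of a detector can be scored against them. The following is a minimal sketch of that idea in Python, not part of the corpus distribution: the function `evaluate`, the dictionary layout, and all fragment identifiers are hypothetical and used only for illustration.

```python
def evaluate(gold_links, predicted_links):
    """Score predicted (suspicious_id -> source_id) pairs against gold links.

    gold_links is assumed to hold only plagiarised fragments. Fragments
    originally written in Spanish are absent from it, so any prediction
    made for them counts against precision.
    """
    correct = sum(
        1
        for frag_id, source_id in predicted_links.items()
        if gold_links.get(frag_id) == source_id
    )
    precision = correct / len(predicted_links) if predicted_links else 0.0
    recall = correct / len(gold_links) if gold_links else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
gold = {"es-1": "en-1", "es-2": "en-1", "es-3": "en-2"}
predicted = {"es-1": "en-1", "es-3": "en-1", "es-9": "en-2"}

precision, recall = evaluate(gold, predicted)
```

In this toy case only one of the three predictions matches its gold source, so precision and recall both come out at one third.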
Technology
The corpus is encoded in XML.
Technical requirements
There are no special requirements for using the corpus. It can be read with any XML parser.
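As a rough illustration, the sketch below reads a corpus-like XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`corpus`, `fragment`, `id`, `lang`, `type`, `source`) and the sample content are assumptions made for this example; the actual CLiPA schema is not described here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real CLiPA element and attribute names may differ.
SAMPLE_XML = """<corpus>
  <fragment id="en-1" lang="en" type="original">An original English passage.</fragment>
  <fragment id="es-1" lang="es" type="plagiarised" source="en-1">Un pasaje plagiado.</fragment>
  <fragment id="es-2" lang="es" type="original">Un pasaje escrito en español.</fragment>
</corpus>"""


def load_fragments(xml_text):
    """Return one dict per <fragment> element (assumed layout)."""
    root = ET.fromstring(xml_text)
    fragments = []
    for node in root.iter("fragment"):
        fragments.append({
            "id": node.get("id"),
            "lang": node.get("lang"),
            "type": node.get("type"),
            "source": node.get("source"),  # None when the attribute is absent
            "text": (node.text or "").strip(),
        })
    return fragments


fragments = load_fragments(SAMPLE_XML)
plagiarised = [f for f in fragments if f["type"] == "plagiarised"]
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.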
Modules
Innovation
To our knowledge, the CLiPA corpus is the only freely available corpus for cross-lingual plagiarism analysis.
Development
Developed as part of the MiDES CICYT TIN2006-15265-C06-04 research project.
Publications
- Barrón-Cedeño, A., Rosso, P., Pinto, D. and Juan, A. On cross-lingual plagiarism analysis using a statistical model. In: Proceedings of the ECAI’08 PAN Workshop: Uncovering Plagiarism, Authorship and Social Software Misuse, pp. 9-13. Patras, Greece, 2008.
- Pinto, D., Civera, J., Juan, A., Rosso, P. and Barrón-Cedeño, A. A statistical approach to crosslingual natural language tasks. In: Proceedings of the 4th Latin American Workshop on Non-Monotonic Reasoning, LANMR-2008. Puebla, Mexico, October 22-24, 2008.
- Pinto, D., Civera, J., Barrón-Cedeño, A., Juan, A. and Rosso, P. A statistical approach to crosslingual natural language tasks (selected and enhanced version; accepted and to be published). In: Journal of Algorithms in Cognition, Informatics and Logic, 2009.