Authors: | Mikhail Alexandrov and Alexander Gelbukh (Instituto Politécnico Nacional, Mexico). Pre-processed by David Pinto; Héctor Jiménez (Universidad Autónoma Metropolitana, Mexico) |
URL: | http://www.dsic.upv.es/grupos/nle/downloads.html |
Contact: | David Eduardo Pinto Avendaño <dpintocs.buap.mx> |
Description
This is a pre-processed version of 48 scientific abstracts from the CICLing 2002 conference (computational linguistics).
Functionality
The aim of this corpus is to support experiments with supervised and unsupervised classifiers on narrow-domain short texts.
Technology
The corpus (raw text) and the gold standard are provided.
Technical requirements
No special requirements are needed to use the corpus.
Modules
Innovation
A very small collection that can be used to manually verify results obtained when clustering narrow-domain short texts.
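As an illustration of checking a clustering result against a gold standard, the following minimal sketch computes purity, one common external measure. It is not the evaluation procedure used in the publications listed here, and it does not read the corpus files, whose format is not described in this entry. The function name `purity`, the class labels, and the cluster assignments are all hypothetical toy values.

```python
from collections import Counter, defaultdict


def purity(predicted, gold):
    """Return the fraction of documents that belong to the majority
    gold class of the cluster they were assigned to."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one document is required")

    # Group the gold labels by predicted cluster id.
    clusters = defaultdict(list)
    for cluster_id, gold_label in zip(predicted, gold):
        clusters[cluster_id].append(gold_label)

    # For each cluster, count the documents in its most frequent gold class.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority_total / len(gold)


# Hypothetical toy data: six documents, three gold classes.
gold = ["parsing", "parsing", "semantics", "semantics", "retrieval", "retrieval"]
predicted = [0, 0, 0, 1, 2, 2]

print(f"purity = {purity(predicted, gold):.3f}")
```

With a collection of only 48 abstracts, a score like this can be cross-checked by reading the documents of each cluster by hand.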
Development
Developed as part of David Pinto's Ph.D. research and the MiDES CICYT TIN2006-15265-C06-04 research project.
Publications
- David Pinto, Alfons Juan, Paolo Rosso: A Comparative Study of Clustering Algorithms on Narrow-Domain Abstracts. Procesamiento del Lenguaje Natural 37(1): 43-49, 2006.
- Héctor Jiménez-Salazar, David Pinto, Paolo Rosso: Uso del Punto de Transición en la Selección de Términos Índice para Agrupamiento de Textos Cortos. Procesamiento del Lenguaje Natural 35(1): 114-118, 2005.
- Diego Ingaramo, David Pinto, Paolo Rosso, Marcelo Errecalde: Evaluation of Internal Validity Measures in Short-Text Corpora. CICLing 2008. Lecture Notes in Computer Science 4919, Springer-Verlag: 555-567, 2008.
- David Pinto, Paolo Rosso: On the Relative Hardness of Clustering Corpora. TSD 2007. Lecture Notes in Artificial Intelligence 4629, Springer-Verlag: 155-161, 2007.
- David Pinto, José-Miguel Benedí, Paolo Rosso: Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance. CICLing 2007. Lecture Notes in Computer Science 4394, Springer-Verlag: 611-622, 2007.
- David Pinto, Héctor Jiménez-Salazar, Paolo Rosso: Clustering Abstracts of Scientific Texts Using the Transition Point Technique. CICLing 2006. Lecture Notes in Computer Science 3878, Springer-Verlag: 536-546, 2006.