Blogs Analysis corpus

Author:	Antonio Reyes
URL:	http://users.dsic.upv.es/grupos/nle/?file=kop4.php
Contact:	Paolo Rosso <prosso@dsic.upv.es>

Description

The corpus comprises 8 sets. Each set contains 2,400 documents automatically retrieved from LiveJournal and Wikipedia. The corpus is organised as follows: i) The [mfs] versions contain the documents labelled with POS tags and the most frequent sense according to WordNet. ii) The [xml] versions contain the sets converted into Senseval-2 formatted XML. The corpus has been designed for analysing humour features in the Blogosphere.
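
As a rough illustration of how an [xml] set might be read, the following minimal Python sketch parses a Senseval-2 lexical-sample style document with the standard library. The element and attribute names (corpus, lexelt, instance, context, head, item, id) follow the general Senseval-2 layout and are assumptions here; the sample text is invented for the example and is not taken from the corpus, whose actual files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in a Senseval-2 lexical-sample style layout.
# Element names, attribute names and sentences are assumptions made
# for illustration; they are not taken from the corpus itself.
SAMPLE = """<corpus lang="en">
  <lexelt item="joke.n">
    <instance id="doc1.s1">
      <context>That was a terrible <head>joke</head> about cats .</context>
    </instance>
    <instance id="doc2.s1">
      <context>Every <head>joke</head> needs a punchline .</context>
    </instance>
  </lexelt>
</corpus>"""


def read_instances(xml_text):
    """Return (item, instance id, target word, context text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for lexelt in root.iter("lexelt"):
        item = lexelt.get("item")
        for inst in lexelt.iter("instance"):
            context = inst.find("context")
            if context is None:
                continue  # skip instances without a context element
            head = context.find("head")
            target = head.text if head is not None else None
            # itertext() yields the text around and inside <head>;
            # split/join normalises the whitespace.
            text = " ".join("".join(context.itertext()).split())
            rows.append((item, inst.get("id"), target, text))
    return rows


instances = read_instances(SAMPLE)
for row in instances:
    print(row)
```

For a real file, ET.parse(path).getroot() would replace ET.fromstring; the tag names should be checked against the distributed files before relying on this sketch.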

Functionality

The corpus supports experiments on Automatic Humour Recognition.

Technology

Technical requirements

Modules

Innovation

Documents retrieved from LiveJournal and Wikipedia for Automatic Humour Recognition.

Development

MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). Developed as part of the Ph.D. Thesis of Antonio Reyes (writing-up phase).

Publications

Reyes A., Rosso P., Buscaldi D. Finding Humour in the Blogosphere: The Role of WordNet Resources. In: Proc. 5th Global WordNet Int. Conf., GWN-2010, Bombay, India, January 31-February 4, 2010
Reyes A., Rosso P., Buscaldi D. Affect-based Features for Humour Recognition. In: Proc. 7th Int. Conf. on Natural Language Processing, ICON-2009, Hyderabad, India, December 15-17, pp. 364-369, 2009
Reyes A., Rosso P. Linking Humour to Blogs Analysis: Affective Traits in Posts. In: Proc. 1st Workshop on Opinion Mining and Sentiment Analysis (WOMSA), CAEPIA-TTIA Conference, Seville, Spain, November 13, pp. 205-212, 2009

Red Temática en Tratamiento de la Información Multilingüe y Multimodal (TIMM)
