OCA Corpus | OCA is an Arabic corpus of movie reviews. It was built from Arabic-language reviews collected from the various web pages shown... |
Opinion analysis corpus | The corpus contains 3,000 opinions on the domain of tourism. These opinions have been obtained from the TripAdvisor blog. |
SENSEM Corpus | This corpus includes Spanish journalistic texts; more precisely, it is a collection of news items extracted from El Periódico de Catalunya. It has been manually annotated at a... |
SENSEM Verbal DB | The lexical database contains the 250 most frequent Spanish verbs, with a total of 1000 senses. These senses are described from a syntactic and semantic perspective: semantic roles,... |
SINAI SA Corpus | This corpus was prepared by the SINAI group in December 2008. SINAI SA (Sentiment Analysis) was created by crawling the Amazon website. Nearly 2000... |
Single-label hep-ex Clustering Corpus | This corpus is a pre-processed version of hep-ex [1], a collection of scientific abstracts compiled by the University of Jaén, Spain. |
Social-ODP-2k9 | Social-ODP-2k9 is a dataset created during December 2008 and January 2009 with data retrieved from the social bookmarking sites Delicious and StumbleUpon, the Open Directory... |
SoCo corpus | This corpus belongs to the international SOCO competition on source code reuse detection, held at the international forum FIRE2014. It consists of... |
Spanish QC | This resource consists of 6305 questions in Spanish, labeled for question classification in Question Answering, following the taxonomy defined in the paper “X. Li and D. Roth.... |
Spanish WordNet 3.0 | An open-source lexical and semantic resource for Spanish that has been created from the latest version of the English WordNet (3.0) and connected with it through the ID and the... |
Taxonomy-Based Opinion Dataset | This dataset contains annotated reviews for three different domains: cars, headphones and hotels. Opinions are annotated at the feature level, with the following... |
The Arabic Wikipedia XML corpus | The 30 most frequent categories of the Arabic Wikipedia XML corpus gathered by Ludovic Denoyer and Patrick Gallinari were selected in order to provide a testbed for the... |
The KnCr clustering corpus | This is a new narrow-domain short text corpus in the medicine domain which was constructed by downloading the last sample of documents provided in MEDLINE and selecting only... |
Twitter Hash tags Corpus | Corpus containing 50,000 texts extracted from Twitter. Each text contains a hashtag corresponding to its topic: #humor, #irony, #politics, #technology, #education |
Volem | VOLEM (Verbs: Multilingual Lexical Organization) is a multilingual lexical database covering a subset of Spanish, Catalan and French verbs. In this multilingual resource,... |
Wiki10+ | Wiki10+ is a dataset created during April 2009 with data retrieved from the social bookmarking site Delicious and Wikipedia. It is made up of 20,764 articles of the English... |