Corpora, Databases and Other Linguistic Resources


EmotiCorpus

This resource is an annotated corpus of quotes from the Italian Wikiquote collection.

English-Spanish dictionary of weighted morphological forms

This dictionary contains an exhaustive list of forms weighted according to the distributions of corresponding grammar classes in reference corpora.

Enriched List of Questions in Arabic

Set of TREC and CLEF questions in Arabic enriched with a query expansion process. These questions have been expanded using an Arabic WordNet-based semantic Query Expansion...

EPEC-DEP

EPEC is a corpus of standard written Basque that has been manually tagged at different levels (morphology, surface syntax, phrases) and is currently being hand tagged at deep...

EPEC-Eusemcor

Eusemcor is a hand-annotated corpus for Basque (the Basque Semcor). This joint development allows for better motivated sense distinctions, and a tighter coupling between both...

eSOL

eSOL is a domain-dependent list of opinion-indicator words in Spanish. The domain of the word set is movie reviews.

For the...

EuroWordNet

EuroWordNet is a multilingual database with wordnets for several European languages (Dutch, Italian, Spanish, German, French, Czech and Estonian). The wordnets are structured...

EuskalWordnet

The Basque WordNet follows the EuroWordNet framework and, basically, it is produced using a semi-automatic method that links Basque words to the English WordNet. We have found...

EVOCA Corpus

EVOCA (English Version of OCA) is an English corpus generated by translating the Arabic OCA corpus. This corpus contains movie reviews and is...

Features Inventory

File containing the elements used to represent all the dimensions of the signature features (emoticons, counter-factuality items, temporal compression items).

Geo-WordNet

This is a semi-automatically generated mapping from WordNet 2.0 to geographical coordinates.

Geo-WordNet 3.0

Geo-WordNet 3.0 connects WordNet synsets with their geographical coordinates (latitude and longitude). In the new 3.0 version, the source of geographical data was Geonames (

GeoSemCor2.0

This resource is the SemCor corpus labeled with WordNet 2.0 synsets, enriched with the addition of labels for synsets that are related to geographical entities.

Ironic Quotes

This corpus has been manually created on the basis of the irony tag that users employ in their posts in blogs on the Web. It contains comments related to the irony...

iSOL

iSOL is a domain-independent list of opinion-indicator words in Spanish.

To build the resource, the starting point was the list of words that...

Lexicon of Prototypical Discourse Markers

This is the seminal discourse marker lexicon used in the thesis Representing discourse for automatic text summarization via shallow NLP techniques. The discourse markers listed...

LibiXaml

It is a framework for creating, browsing and editing linguistic annotations generated by a set of different linguistic processing tools.

MCE Corpus

The MuchoCine corpus in English (MCE) is the translated version of the MuchoCine corpus (

MCR: Multilingual Central Repository

The Multilingual Central Repository (MCR) follows the model proposed by the EuroWordNet project. EuroWordNet (Vossen, 1998) is a multilingual lexical database with wordnets for...

ML-SentiCon: A Layered, Multilingual Sentiment Lexicon

This resource consists of several lists of positive and negative lemmas for English, Spanish, Catalan, Galician and Basque. Each lemma comes with a numerical estimate of its...