ADQA (Arabic Definition Question Answering) corpus | This corpus consists of a list of 50 definition questions (ArabicListDefQuest), a set of 50 files... |
Amazon Data Sets | This corpus was created to study figurative language, especially irony, sarcasm and humour, in a context focused on sentiment analysis. It contains approx.... |
AnCora-Ca | AnCora-Ca is a multilevel annotated corpus of Catalan, consisting of 500,000 words mostly from newspaper articles. AnCora-Ca is annotated with morphological (PoS), syntactic... |
AnCora-CO-Ca | AnCora-CO-Ca is a subset of the multilevel annotated corpus AnCora-Ca (for Catalan), consisting of 400,000 words, enriched with coreference information, where... |
AnCora-CO-Es | AnCora-CO-Es is a subset of the multilevel annotated corpus AnCora-Es (for Spanish), consisting of 400,000 words, enriched with coreference information, where... |
AnCora-DEP-Ca | AnCora-DEP-Ca is the AnCora-Ca multilevel annotated corpus of Catalan in dependency-based representation, consisting of approximately 500,000 words. |
AnCora-DEP-Es | AnCora-DEP-Es is the AnCora-Es multilevel annotated corpus of Spanish in dependency-based representation, consisting of approximately 500,000 words. |
AnCora-Es | AnCora-Es is a multilevel annotated corpus of Spanish, consisting of 500,000 words mostly from newspaper articles. AnCora-Es is annotated with morphological (PoS), syntactic... |
AnCora-Verb-Ca | AnCora-Verb-Ca is a verbal lexicon containing 2,141 different verbs. In the AnCora-Verb-Ca lexicon, the mapping between syntactic functions, arguments and thematic roles of each... |
AnCora-Verb-Es | AnCora-Verb-Es is a verbal lexicon containing 2,603 different verbs. In the AnCora-Verb-Es lexicon, the mapping between syntactic functions, arguments and thematic roles of each... |
ANERcorp | ANERcorp is an Arabic NER corpus which consists of 150,000 tokens (which go up to 200,000 tokens after segmentation). |
ANERgazet | ANERgazet is a set of 3 Arabic gazetteers (people, locations and organizations) intended mainly for the Arabic NER task, but which can also be used for other Arabic NLP... |
Arabic QA | This corpus includes Spanish journalistic texts, more precisely, it is a collection of news extracted from El Periódico de Catalunya. It has been manually annotated at a... |
Arabic WordNet | The Arabic WordNet (AWN) is a lexical database of the Arabic language following the development process of Princeton English WordNet and EuroWordNet. It utilizes the Suggested... |
Author Profiling @ PAN-2013 | This corpus consists of documents written in both English and Spanish. With regard to age, we will consider posts of three classes: 10s (13-17), 20s (23-27), and 30s (33-47).... |
Author Profiling @ PAN-2014 | Twitter tweets and social media texts written in both English and Spanish as well as hotel reviews written in English. With regard to age, we will consider the following... |
Blogs Analysis corpus | The corpus comprises 8 sets. Every set contains 2,400 documents automatically retrieved from LiveJournal and Wikipedia. The corpus is organised as follows: i) The [mfs]... |
Blogs Clustering Corpus | This is a set of corpora made up of discussion lines extracted from two blog websites: boing-boing and slashdot. |
CESCA | CESCA is a Catalan corpus consisting of school writing produced by 2,400 schoolchildren between the ages of five and sixteen. Each informant has written different types of... |
CICLing-2002 Clustering Corpus | This is a pre-processed version of 48 scientific abstracts from the CICLing 2002 conference (computational linguistics). |