Authors: | IXA group |
URL: | http://ixa.si.ehu.es/Ixa/resources/Treebank |
Contact: | Iñaki Alegria |
Description
EPEC (Reference Corpus for the Processing of Basque) is a corpus of standard written Basque that has been manually tagged at several levels (morphology, surface syntax, phrases) and is currently being hand-tagged at the deep syntax level following the Dependency Structure-based Scheme. It is intended to serve as a reference corpus for developing and improving NLP tools for Basque, and it has already been used to build tools such as a morphological analyser, a lemmatiser, and a shallow syntactic analyser. EPEC-DEP is the EPEC corpus manually tagged with dependency relations. Part of this work was carried out in the CESS-ECE project (HUM2004-21127). Because that project chose a constituent-based syntactic formalism for querying the corpora of all its languages, the dependency annotation had to be converted to constituents. As a result, the EPEC corpus is available tagged with either dependency relations or constituent relations.
Functionality
Technology
Technical requirements
Modules
Innovation
First Basque corpus manually tagged at different levels (morphology, surface syntax, phrases).
Development
Publications
- Aduriz I., Aranzabe M., Arriola J., Atutxa A., Díaz de Ilarraza A., Ezeiza N., Gojenola K., Oronoz M., Soroa A., Urizar R. 2003. Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing. Proceedings of Corpus Linguistics 2003. Lancaster.
- Aldezabal I., Aranzabe M.J., Arriola J.M., Díaz de Ilarraza A., Estarrona A., Fernandez K., Uria L., Quintian M. 2007. EPEC (Euskararen Prozesamendurako Erreferentzia Corpusa) dependentziekin etiketatzeko eskuliburua. UPV/EHU / LSI / TR 12-2007.