Authors: | Grzegorz Chrumpala, Ana Fernández, Elisabet Comelles, Marta Coll-Florit, Glòria Vàzquez, Nerea Achutegui, Irene Castellón, Mercè Coll, Marta Prim. |
URL: | http://grial.uab.es/ |
Contact: | A. M. Fernández Montraveta <ana.fernandezuab.es> |
Description
This corpus was developed as part of a teaching innovation project that aimed to improve the teaching and learning of technical English by moving away from the traditional language class and building a parallel corpus. The corpus is a collection of texts available in three languages (English, Catalan and Spanish). It contains approximately 2,257,498 words and illustrates language use in a technical register, specifically the computer science domain. It can serve as a dynamic classroom resource for both the teacher and the student. The teacher can use it as a source of material for exercises, class texts and examples, and as a guide for developing the class syllabus. Because the corpus is annotated morphologically and syntactically, it supports searches not only for collocations but also for words of a given category and for lemmas.
Functionality
The interface offers a simple search (by form) and an advanced search (by form, lemma and POS). The output is shown in the three languages.
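The difference between the two search types can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Token` class, the `SEGMENTS` toy data, the POS labels and the `search` function are all assumptions made for illustration. Each segment holds the same sentence in the three languages, so a hit on one side returns the aligned text of all three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str   # surface word form
    lemma: str  # dictionary form
    pos: str    # part-of-speech label (hypothetical tagset)


# Toy aligned data: one dict per segment, keyed by language code (assumed layout).
SEGMENTS = [
    {
        "en": [Token("The", "the", "DET"), Token("files", "file", "NOUN"),
               Token("were", "be", "VERB"), Token("deleted", "delete", "VERB")],
        "es": [Token("Los", "el", "DET"), Token("archivos", "archivo", "NOUN"),
               Token("fueron", "ser", "VERB"), Token("borrados", "borrar", "VERB")],
        "ca": [Token("Els", "el", "DET"), Token("fitxers", "fitxer", "NOUN"),
               Token("van", "anar", "VERB"), Token("ser", "ser", "VERB"),
               Token("esborrats", "esborrar", "VERB")],
    },
    {
        "en": [Token("Delete", "delete", "VERB"), Token("the", "the", "DET"),
               Token("file", "file", "NOUN")],
        "es": [Token("Borre", "borrar", "VERB"), Token("el", "el", "DET"),
               Token("archivo", "archivo", "NOUN")],
        "ca": [Token("Esborreu", "esborrar", "VERB"), Token("el", "el", "DET"),
               Token("fitxer", "fitxer", "NOUN")],
    },
]


def matches(token, form=None, lemma=None, pos=None):
    """A criterion left as None is ignored; the others must all hold."""
    return ((form is None or token.form.lower() == form.lower())
            and (lemma is None or token.lemma == lemma)
            and (pos is None or token.pos == pos))


def search(segments, lang, form=None, lemma=None, pos=None):
    """Return the three-language text of every segment with a match in `lang`."""
    hits = []
    for seg in segments:
        if any(matches(tok, form, lemma, pos) for tok in seg[lang]):
            hits.append({code: " ".join(t.form for t in toks)
                         for code, toks in seg.items()})
    return hits


# Simple search: by word form only.
simple_hits = search(SEGMENTS, "en", form="files")
# Advanced search: by lemma and POS, which also finds other inflected forms.
advanced_hits = search(SEGMENTS, "en", lemma="delete", pos="VERB")
```

In this sketch the simple search finds only the segment containing the exact form, while the advanced search finds both segments because "deleted" and "Delete" share a lemma.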
Technology
- A wrapper program (a Java class) that uses external annotation software (FreeLing and Connexor) to annotate the aligned text while preserving the alignment. It then formats the annotated segments and stores them in a specialized CPG XML format.
- An eXist database that stores the XML-encoded text.
- XQuery scripts that implement specialized search functions. There are two basic search modes: quicksearch, for searching by sequences of word forms, and fullsearch, for searching by sequences of token specifications that include the token form, lemma and morphological annotation.
- The web user interface, implemented with all the major web technologies: XQuery, XSLT, CSS, XHTML and ECMAScript. This component is built on the MVC Cocoon web application framework bundled with eXist.
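The two search modes can be sketched over XML-encoded aligned segments. The original system implements them as XQuery scripts against eXist; the Python version below is only an illustration of the matching logic. The element and attribute names (`corpus`, `segment`, `text`, `tok`, `form`, `lemma`, `pos`) are invented for this sketch and are not the actual CPG XML format, which is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: each <segment> groups the aligned text of one sentence.
SAMPLE = """
<corpus>
  <segment id="s1">
    <text lang="en">
      <tok form="Save" lemma="save" pos="VERB"/>
      <tok form="the" lemma="the" pos="DET"/>
      <tok form="files" lemma="file" pos="NOUN"/>
    </text>
    <text lang="es">
      <tok form="Guarde" lemma="guardar" pos="VERB"/>
      <tok form="los" lemma="el" pos="DET"/>
      <tok form="archivos" lemma="archivo" pos="NOUN"/>
    </text>
  </segment>
  <segment id="s2">
    <text lang="en">
      <tok form="The" lemma="the" pos="DET"/>
      <tok form="file" lemma="file" pos="NOUN"/>
      <tok form="was" lemma="be" pos="VERB"/>
      <tok form="saved" lemma="save" pos="VERB"/>
    </text>
    <text lang="es">
      <tok form="El" lemma="el" pos="DET"/>
      <tok form="archivo" lemma="archivo" pos="NOUN"/>
      <tok form="fue" lemma="ser" pos="VERB"/>
      <tok form="guardado" lemma="guardar" pos="VERB"/>
    </text>
  </segment>
</corpus>
"""


def sequence_matches(tokens, specs):
    """True if some contiguous run of tokens satisfies the specs in order."""
    n = len(specs)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(all(tok.get(key) == value for key, value in spec.items())
               for tok, spec in zip(window, specs)):
            return True
    return False


def fullsearch(root, lang, specs):
    """Ids of segments whose `lang` text matches a sequence of token specs.

    Each spec is a dict that may constrain form, lemma and/or pos.
    """
    hits = []
    for seg in root.iter("segment"):
        tokens = seg.findall(f"./text[@lang='{lang}']/tok")
        if sequence_matches(tokens, specs):
            hits.append(seg.get("id"))
    return hits


def quicksearch(root, lang, forms):
    """Search by a sequence of word forms only (case-sensitive in this sketch)."""
    return fullsearch(root, lang, [{"form": form} for form in forms])


root = ET.fromstring(SAMPLE.strip())
by_forms = quicksearch(root, "en", ["the", "files"])
by_specs = fullsearch(root, "en", [{"pos": "DET"}, {"lemma": "file"}])
```

Treating quicksearch as a special case of fullsearch, where every token specification constrains only the form, is a design choice of this sketch and not a claim about how the XQuery scripts are organized. Because the segment id is shared across languages, a hit in one language can be used to fetch the aligned text in the others.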
Technical requirements
Modules
Innovation
Use of computational linguistics techniques to produce materials for second language learning.
Development
The trilingual corpus was developed within the project Millora de la qualitat docent de la Generalitat de Catalunya (194 MQD 2002).
Publications
- Castellón, I., A. Fernández, G. Vázquez (2005) “Creación de un recurso textual para el aprendizaje del inglés”, NOVATICA Revista de la Asociación de técnicos de informática, 177, pp. 51-54. ISSN: 0211-2124