Eihera is a system for named entity recognition and classification (NERC) in written Basque. It was designed in four steps: first, the development of a recognizer based on linguistic information represented as finite-state transducers; second, the generation of semi-automatically annotated corpora from the output of these transducers; third, the training of different machine learning (ML) techniques on these corpora to obtain the best possible recognizer; and finally, the combination of the different recognizers obtained.
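The final step, combining several recognizers, can be illustrated with a minimal sketch. The description above does not say how Eihera combines its recognizers, so the per-token majority vote used here is an assumption, and the function name `combine_by_vote`, the B/I/O tag set, and the sample tag sequences are all hypothetical.

```python
from collections import Counter


def combine_by_vote(predictions):
    """Combine per-token tag sequences from several recognizers by majority vote.

    predictions: list of tag sequences, one per recognizer, all the same length.
    Ties are resolved in favour of the earliest recognizer in the list.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all recognizers must tag the same number of tokens")
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Pick the first tag, in recognizer order, that reaches the top count.
        combined.append(next(t for t in token_tags if counts[t] == best))
    return combined


# Hypothetical outputs of one rule-based and two ML recognizers on four tokens.
rule_based = ["B", "I", "O", "O"]
ml_model_a = ["B", "O", "O", "O"]
ml_model_b = ["B", "I", "O", "B"]
print(combine_by_vote([rule_based, ml_model_a, ml_model_b]))
```

Listing the rule-based recognizer first means it wins ties; that ordering is a design choice of this sketch, not something stated for Eihera.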
Included in Zatiak
Finite-state techniques and machine learning.
Recognition and classification are each performed both by rules and by ML. Eustagger is run as a preprocessing step.
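The split between recognition (finding entity boundaries) and classification (assigning a category) can be sketched as a two-stage pipeline. This is only an illustration under stated assumptions: the B/I/O tag scheme, the gazetteer lookup, the function names `spans_from_bio` and `classify`, and the sample lemmas are hypothetical and do not reflect Eihera's actual rules or Eustagger's output format.

```python
def spans_from_bio(tags):
    """Return (start, end) token spans, end exclusive, from a B/I/O tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity starts here; close any entity still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def classify(lemmas, gazetteer):
    """Toy rule-based classifier: look the lemma sequence up in a gazetteer."""
    return gazetteer.get(" ".join(lemmas), "UNKNOWN")


# Hypothetical lemmatized input, standing in for the output of a prior tagging step.
lemmas = ["Euskal", "Herriko", "Unibertsitate", "Bilbo", "egon"]
tags = ["B", "I", "I", "B", "O"]  # hypothetical output of the recognition stage
gazetteer = {"Euskal Herriko Unibertsitate": "ORGANIZATION", "Bilbo": "LOCATION"}

entities = [(s, e, classify(lemmas[s:e], gazetteer)) for s, e in spans_from_bio(tags)]
print(entities)
```

Keeping the two stages separate lets a rule-based and an ML component be swapped in independently at either stage.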
It is the first NERC system for Basque.
Funded through several projects by the Basque Government and the Spanish R&D agency.
- Alegria I., Arregi O., Ezeiza N., Fernandez I., Urizar R. Design and Development of a Named Entity Recognizer for an Agglutinative Language. First International Joint Conference on NLP (IJCNLP-04). Workshop on Named Entity Recognition. 2004.
- Alegria I., Ezeiza N., Fernandez I., Urizar R. Named Entity Recognition and Classification for texts in Basque. II Jornadas de Tratamiento y Recuperación de Información, JOTRI, Madrid. ISBN 84-89315-33-7. 2003.