Authors: | Yassine Benajiba (Ph.D. student) and Paolo Rosso |
URL: | http://www.dsic.upv.es/grupos/nle/ |
Contact: | Yassine Benajiba <benajibayassine |
Description
A Named Entity Recognition (NER) model for Arabic, trained with an SVM-based approach on a training file of 125,000 Arabic tokens.
Functionality
The model lets the user extract named entities from open-domain text and classify them into four categories: person, location, organization and miscellaneous. To improve performance, the model was trained on ATB-segmented data, which reduces the sparseness of Arabic data.
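As an illustration of what extracting and classifying entities means in practice, the following minimal sketch groups per-token tags into labeled entity spans. It assumes an IOB-style tagging scheme and the label names PER, LOC, ORG and MISC; neither is stated in this description, so both are hypothetical, as are the function name and the sample tokens.

```python
from typing import List, Tuple

# Hypothetical label names for the four categories (person, location,
# organization, miscellaneous); the model's real tag set may differ.
LABELS = {"PER", "LOC", "ORG", "MISC"}


def group_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, tag) pairs into (entity text, label) spans.

    Assumes IOB-style tags such as B-PER / I-PER / O. An I- tag that does
    not continue an open entity of the same label is treated like O.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush() -> None:
        # Close the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in tagged:
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label in LABELS:
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I" and current_tokens and label == current_label:
            current_tokens.append(token)
        else:
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


# Sample Romanized tokens, invented for illustration only.
sample = [("mHmd", "B-PER"), ("Ely", "I-PER"), ("fy", "O"), ("bgdAd", "B-LOC")]
print(group_entities(sample))
```

With the sample above, the function returns one person span covering the first two tokens and one location span for the last token.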
Technology
The model is trained with Support Vector Machines (SVMs) using the Yamcha toolkit (http://chasen.org/~taku/software/yamcha/).
Technical requirements
The input file must be ATB-segmented and transliterated into Romanized characters. Yamcha must also be installed on the machine.
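The input requirements can be sketched as a small preparation step. The example below assumes the text has already been ATB-segmented and Romanized elsewhere, and that the model reads Yamcha's usual column format (one token per line, an empty line between sentences). The exact columns the released model expects are not stated here, so the single-column layout and the function name are assumptions.

```python
from typing import Iterable, List


def to_column_format(sentences: Iterable[List[str]]) -> str:
    """Write pre-segmented, Romanized sentences as one token per line.

    An empty line closes each sentence. A token that is empty, contains
    whitespace, or contains non-ASCII characters is rejected, since that
    suggests the transliteration step was skipped.
    """
    lines: List[str] = []
    for sentence in sentences:
        for token in sentence:
            if not token or not token.isascii() or any(c.isspace() for c in token):
                raise ValueError(f"token is not a Romanized, whitespace-free string: {token!r}")
            lines.append(token)
        lines.append("")  # empty line marks the sentence boundary
    return "\n".join(lines) + "\n"


# Sample tokens invented for illustration only.
print(to_column_format([["mHmd", "fy", "bgdAd"], ["w+", "qAl"]]))
```

Rejecting non-ASCII tokens early is a design choice for this sketch: it surfaces untransliterated Arabic script before the file reaches the decoder.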
Modules
A single module that performs basic decoding of the data provided by the user.
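A decoding run could be wrapped roughly as follows. This is a sketch under assumptions: it relies on Yamcha's documented command-line form `yamcha -m MODEL < input`, assumes the `yamcha` executable is on the PATH, and assumes the predicted tag is appended as the last column of each output line. The model file name is hypothetical.

```python
import subprocess
from typing import List, Tuple


def yamcha_command(model_path: str) -> List[str]:
    """Build the decoder command; -m selects the trained model file."""
    return ["yamcha", "-m", model_path]


def decode(model_path: str, column_text: str) -> str:
    """Pipe column-formatted text through Yamcha and return its raw output."""
    result = subprocess.run(
        yamcha_command(model_path),
        input=column_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the decoder fails
    )
    return result.stdout


def parse_output(output: str) -> List[Tuple[str, str]]:
    """Return (token, predicted tag) pairs.

    Assumes the first column is the token and the last column is the
    prediction; empty sentence-boundary lines are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    for line in output.splitlines():
        columns = line.split()
        if len(columns) >= 2:
            pairs.append((columns[0], columns[-1]))
    return pairs


# Hypothetical usage; "arabic_ner.model" is a placeholder file name:
# tagged = parse_output(decode("arabic_ner.model", "mHmd\nfy\nbgdAd\n\n"))
```

Keeping the command construction in its own function makes it possible to check the invocation without having Yamcha installed.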
Innovation
To our knowledge, no Arabic NER system is freely available to the research community. The model has been tested, and the results were presented at the EMNLP and ACIT conferences.
Development
Developed as part of Yassine Benajiba’s AECI Ph.D. and the MiDES CICYT TIN2006-15265-C06-04 research project, co-funded by the AECI-PCI A01031707 project.
Publications
- Benajiba Y., Diab M., Rosso P. Arabic Named Entity Recognition using Optimized Feature Sets. In: Proc. Int. Conf. on Empirical Methods in Natural Language Processing, EMNLP-2008, Waikiki, Honolulu, U.S.A., October, 2008.
- Benajiba Y., Diab M., Rosso P. Arabic Named Entity Recognition: An SVM-based approach. In: Proc. Int. Arab Conf. on Information Technology, ACIT-2008, Hammamet, Tunisia, December, 2008.