Authors: | Yassine Benajiba (Ph.D. student) and Paolo Rosso. |
URL: | http://www.dsic.upv.es/grupos/nle/ |
Contact: | Yassine Benajiba <benajibayassine |
Description
ANERgazet is a set of three Arabic gazetteers (people, locations and organizations). It is intended mainly for the Arabic named entity recognition (NER) task, but it can also be used for other Arabic NLP tasks.
Functionality
Each gazetteer contains a list of Arabic names belonging to the corresponding class.
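As a rough illustration of how such lists might be used, the sketch below turns each gazetteer into a set and flags the tokens of a sentence that appear in one of them, which is a common way to derive lookup features for an NER system. The file layout is not specified here, so the sketch assumes plain UTF-8 text with one name per line. The function names, the class labels `PER` and `LOC`, and the sample entries are hypothetical and not taken from the distributed files.

```python
from pathlib import Path


def parse_gazetteer(text):
    """Build a set of names from text with one name per line (assumed format)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_gazetteer(path):
    """Read a gazetteer file; UTF-8 encoding is an assumption."""
    return parse_gazetteer(Path(path).read_text(encoding="utf-8"))


def gazetteer_features(tokens, gazetteers):
    """For each token, return the sorted class labels whose gazetteer contains it."""
    return [
        sorted(label for label, names in gazetteers.items() if token in names)
        for token in tokens
    ]


# Illustrative in-memory data; real use would call load_gazetteer() on each file.
sample = {
    "PER": parse_gazetteer("محمد\nأحمد\n"),
    "LOC": parse_gazetteer("القاهرة\n"),
}
print(gazetteer_features(["زار", "محمد", "القاهرة"], sample))
```

This lookup matches single tokens only. Names spanning several words would need an n-gram or longest-match search over the token sequence, and some normalization of the Arabic script may be needed before matching.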
Technology
The gazetteers were extracted automatically from Arabic Wikipedia and other Web resources, and then manually filtered.
Technical requirements
None.
Modules
Innovation
To our knowledge, ANERgazet is the only set of Arabic gazetteers that is freely available to the research community.
Development
Developed as part of Yassine Benajiba’s AECI Ph.D. and the MiDES CICYT TIN2006-15265-C06-04 research project, co-funded by the AECI-PCI A01031707 and A706706 projects.
Publications
- Benajiba Y., Diab M., Rosso P. Arabic Named Entity Recognition using Optimized Feature Sets. In: Proc. Int. Conf. on Empirical Methods in Natural Language Processing, EMNLP-2008, Waikiki, Honolulu, U.S.A., October, 2008.
- Benajiba Y., Rosso P. Arabic Named Entity Recognition using Conditional Random Fields. In: Proc. Workshop on HLT & NLP within the Arabic world. Arabic Language and local languages processing: Status Updates and Prospects, 6th Int. Conf. on Language Resources and Evaluation, LREC-2008, Marrakech, Morocco, May 26-31, 2008.
- Benajiba Y., Diab M., Rosso P. Arabic Named Entity Recognition: An SVM-based approach. In: Proc. Int. Arab Conf. on Information Technology, ACIT-2008, Hammamet, Tunisia, December, 2008.
- Benajiba Y., Rosso P., Benedí J.M. ANERsys: An Arabic Named Entity Recognition system based on Maximum Entropy. In: Proc. 8th Int. Conf. on Comput. Linguistics and Intelligent Text Processing, CICLing-2007, Springer-Verlag, LNCS(4394), pp. 143-153, 2008.