María Nieves Fernández Formoso
Fco. Mario Barcala Rodríguez
Jorge Graña Gil
LIBNAFDA is a C library for managing large dictionaries of many kinds efficiently while minimizing memory consumption. For this purpose we use numbered acyclic deterministic finite-state automata, and in this context we understand a dictionary to be any structure that links dictionary entries (words) with their information.
LIBNAFDA is a C library for managing large dictionaries of many kinds efficiently and with minimal memory consumption. For this purpose it uses numbered acyclic deterministic finite-state automata. In this work, a dictionary is any structure that associates its entries (words) with some kind of information. In developing this library we have followed two maxims:
- To minimize the memory needed to store the dictionaries.
- To minimize running time, so that the dictionaries can be used in systems that require high performance.
The library consists of two parts: one is used to build the compiler, which compiles (compresses) the word dictionaries, and the other provides access to these compiled dictionaries.
The compiler takes a list of words and the information associated with them. From this input it generates a compiled (compressed) dictionary in binary format, which any independent program can access through the second part of the library.
The key features that differentiate this library from other existing proposals are:
- It can store any type of information associated with words. Strictly speaking, dictionaries store integers and/or floats, but since these integers can be interpreted as indexes into any structure external to the library, they may reference data of any type, including strings.
- It can manage multiple dictionaries simultaneously, with references between them. Because the integer data can be interpreted in different ways, one particular case is to treat them as indexes into other dictionaries, which allows several dictionaries to be connected.
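The two features above can be illustrated with a minimal sketch. It does not use LIBNAFDA's API: the types `Entry` and `Dict`, the functions `dict_find` and `gloss_of_form`, and the sample data are all hypothetical, and a sorted array searched with `bsearch` stands in for the compiled automaton. The point is only how a stored integer can be read as an index into another dictionary, and from there into an external string table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compiled dictionary: entries sorted by word,
   each carrying one integer. This is NOT the LIBNAFDA data layout. */
typedef struct {
    const char *word;
    int value;
} Entry;

typedef struct {
    const Entry *entries;
    size_t size;
} Dict;

/* Comparison callback for bsearch: key is a C string, elem is an Entry. */
static int cmp_entry(const void *key, const void *elem) {
    return strcmp((const char *)key, ((const Entry *)elem)->word);
}

/* Look up a word; returns its entry, or NULL if the word is absent. */
const Entry *dict_find(const Dict *d, const char *word) {
    return bsearch(word, d->entries, d->size, sizeof(Entry), cmp_entry);
}

/* Sample data (hypothetical). Lemma values index the external GLOSSES
   table; form values index the entries of the lemma dictionary. */
static const char *const GLOSSES[] = { "to exist", "to move" };
static const Entry LEMMA_ENTRIES[] = { {"be", 0}, {"go", 1} };
static const Entry FORM_ENTRIES[] = {
    {"goes", 1}, {"is", 0}, {"was", 0}, {"went", 1}
};
static const Dict LEMMAS = { LEMMA_ENTRIES, 2 };
static const Dict FORMS = { FORM_ENTRIES, 4 };

/* Follow form -> lemma entry -> external string.
   Returns NULL if the form is unknown or an index is out of range. */
const char *gloss_of_form(const Dict *forms, const Dict *lemmas,
                          const char *const *glosses, size_t n_glosses,
                          const char *form) {
    const Entry *f = dict_find(forms, form);
    if (f == NULL || f->value < 0 || (size_t)f->value >= lemmas->size)
        return NULL;
    const Entry *l = &lemmas->entries[f->value];
    if (l->value < 0 || (size_t)l->value >= n_glosses)
        return NULL;
    return glosses[l->value];
}
```

With this sample data, `gloss_of_form(&FORMS, &LEMMAS, GLOSSES, 2, "went")` follows the integer stored for "went" to the "go" entry of the second dictionary, and that entry's integer to the string "to move". Neither dictionary stores a string as its associated data.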
LIBNAFDA is implemented in C.
For word storage, the library uses a numbered acyclic deterministic finite-state automaton, which the compiler builds using the algorithm proposed by Jan Daciuk in the article Incremental Construction of Minimal Acyclic Finite-State Automata. The automaton is therefore built incrementally and kept minimal, which optimizes both memory use and word recognition speed.
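To make the idea of numbering concrete, here is a minimal sketch under simplifying assumptions. It builds a plain trie (an acyclic deterministic automaton that is not minimized, so Daciuk's incremental minimization is not shown), restricts the alphabet to lowercase a-z, and stores in each state the number of words accepted from it. Those counts map each word to its position in alphabetical order. All names (`State`, `insert_word`, `number_states`, `word_to_index`) are hypothetical, and LIBNAFDA's actual structures may differ.

```c
#include <stdlib.h>

#define ALPHABET 26  /* assumption: lowercase a-z only, for brevity */

typedef struct State {
    struct State *next[ALPHABET];
    int is_final;  /* 1 if some word ends in this state */
    int count;     /* number of words accepted starting from this state */
} State;

/* Allocate a zero-initialized state; returns NULL on allocation failure. */
State *state_new(void) {
    return calloc(1, sizeof(State));
}

/* Insert a lowercase word. Returns 0 on success, -1 on an invalid
   character or allocation failure. */
int insert_word(State *root, const char *word) {
    State *s = root;
    for (; *word; word++) {
        if (*word < 'a' || *word > 'z') return -1;
        int c = *word - 'a';
        if (!s->next[c] && !(s->next[c] = state_new())) return -1;
        s = s->next[c];
    }
    s->is_final = 1;
    return 0;
}

/* Numbering pass: fill in count for every state; returns the total
   number of words reachable from s. Call after all insertions. */
int number_states(State *s) {
    int total = s->is_final;
    for (int c = 0; c < ALPHABET; c++)
        if (s->next[c]) total += number_states(s->next[c]);
    s->count = total;
    return total;
}

/* Return the 0-based alphabetical index of word, or -1 if it is not in
   the dictionary. Requires number_states() to have been called. */
int word_to_index(const State *root, const char *word) {
    const State *s = root;
    int index = 0;
    for (; *word; word++) {
        if (*word < 'a' || *word > 'z') return -1;
        int c = *word - 'a';
        if (!s->next[c]) return -1;
        if (s->is_final) index++;  /* a shorter word ending here sorts first */
        for (int d = 0; d < c; d++)  /* skip words under smaller labels */
            if (s->next[d]) index += s->next[d]->count;
        s = s->next[c];
    }
    return s->is_final ? index : -1;
}

/* Release the whole automaton. */
void free_states(State *s) {
    if (!s) return;
    for (int c = 0; c < ALPHABET; c++) free_states(s->next[c]);
    free(s);
}
```

After inserting "dog", "car", "do" and "cat" and calling `number_states`, `word_to_index` yields 0 for "car", 1 for "cat", 2 for "do" and 3 for "dog". Such an index can address an external array that holds the data associated with each word, so the automaton itself only has to recognize words.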
To compile the library sources, the following packages must be installed:
- libxml2 2.6 or higher
- libxml2-dev 2.6 or higher
- libglib 2.16 or higher
We have combined the principles for building minimal automata proposed by Jan Daciuk and others in the paper Incremental Construction of Minimal Acyclic Finite-State Automata with the ideas put forward by Jorge Graña Gil and others for managing the information associated with words in Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries. The result is a useful library for environments that need very efficient access to information associated with words.
Jorge Graña, Fco. Mario Barcala, and Miguel A. Alonso, Compilation Methods of Minimal Acyclic Finite-State Automata for Large Dictionaries, in Bruce W. Watson and Derick Wood (eds.), Implementation and Application of Automata, volume 2494 of Lecture Notes in Computer Science, pp. 135-148, Springer-Verlag, Berlin-Heidelberg-New York, 2002. ISSN 0302-9743 / ISBN 3-540-00400-9.
Alejandro Sobrino, Santiago Fernández, and Jorge Graña, Access to a large dictionary of Spanish synonyms: a tool for fuzzy information retrieval, in Enrique Herrera-Viedma, Gabriella Pasi and Fabio Crestani (eds.), Soft computing in web information retrieval: models and applications, volume 197 of Studies in Fuzziness and Soft Computing, pp. 299-316, Springer-Verlag, Berlin-Heidelberg-New York, 2006. ISSN 1434-9922 / ISBN 3-540-31588-8.
Santiago Fernández, Jorge Graña, and Alejandro Sobrino, Introducing FDSA (Fuzzy Dictionary of Synonyms and Antonyms): Applications on Information Retrieval and Stand-Alone Use, Mathware & Soft Computing, 10(2-3):57-70, 2003. ISSN 1134-5632.