The Edlin toolkit provides a machine learning framework for linear models, designed to be easy to read and understand. The main goal is to provide easy-to-edit, working implementations of popular learning algorithms. The toolkit is very compact, consisting of 40 Java classes with a total of about 2200 lines of code, of which about 20% are I/O and driver classes for examples. A version of Edlin has been integrated as a processing resource for the GATE architecture and has been used for gene tagging, gene name normalization, named entity recognition in Bulgarian, and biomedical relation extraction.
Edlin is open source and released under the terms of the GNU General Public License. If you use it, please acknowledge the toolkit by citing Edlin.
Education: Easy to understand, hands-on experience with working implementations.
Research: Easy to modify, easy to run experiments with modified learning algorithms.
Industry: State-of-the-art learning, GATE integration, limited dependence on libraries.
Edlin is designed with the programmer in mind. It is not a replacement for toolkits such as Mallet, Weka, NLTK, or LingPipe, which mostly target end users of learning algorithms.
Edlin does not have a GUI: the programmer is expected to read and edit the code. A user who wants to try some learning algorithms on their data without programming would be better served by a toolkit such as Weka.
Edlin does not have a feature generation pipeline similar to Mallet's. The user has to implement feature extraction.
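To give a sense of what implementing feature extraction by hand can involve, here is a minimal sketch. It does not use Edlin's API: the class `FeatureSketch`, its methods, and the feature names are all hypothetical and chosen only for illustration. It maps string-named binary features of a token to integer ids, which is one common input format for linear models.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of Edlin.
class FeatureSketch {
    // Maps each feature name to a dense integer id.
    private final Map<String, Integer> index = new HashMap<>();

    /** Returns the id of a feature name, assigning the next free id on first sight. */
    public int idOf(String name) {
        Integer id = index.get(name);
        if (id == null) {
            id = index.size();
            index.put(name, id);
        }
        return id;
    }

    /** Number of distinct features seen so far. */
    public int size() {
        return index.size();
    }

    /** Returns the ids of the binary features that fire for the token at position i. */
    public List<Integer> extract(String[] tokens, int i) {
        String tok = tokens[i];
        List<Integer> active = new ArrayList<>();
        // Word identity, lowercased.
        active.add(idOf("word=" + tok.toLowerCase()));
        // Capitalization, often useful for named entity recognition.
        if (!tok.isEmpty() && Character.isUpperCase(tok.charAt(0))) {
            active.add(idOf("capitalized"));
        }
        // Previous word, with a sentence-start marker at position 0.
        String prev = i > 0 ? tokens[i - 1].toLowerCase() : "<s>";
        active.add(idOf("prev=" + prev));
        return active;
    }
}
```

Calling `extract` once per token yields a sparse feature vector for each position; the same `FeatureSketch` instance should be reused for training and test data so that feature ids stay consistent.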
As an example, we have an implementation of the Perceptron algorithm, with and without averaging (the training procedure is 13 lines of code). See our implementation in the cross-referenced source. The number of lines of code for some learning algorithm implementations is shown below.
Learning Algorithm | Edlin LOC |
Perceptron | 43 |
Naive Bayes | 61 |
Maximum Entropy | 97 |
MIRA | 79 |
AdaBoost | 141 |
CRF | 182+68 |
struct. Perceptron | 37+68 |
struct. MIRA | 77+12+68 |
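For readers who want a feel for how short a Perceptron training procedure can be, the following is a minimal sketch of a binary Perceptron with optional averaging. It is not Edlin's code: the class `PerceptronSketch`, its method signatures, the dense `double[]` feature representation, and the toy data are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration; not Edlin's implementation.
class PerceptronSketch {

    /** Returns +1 if the dot product w . x is positive, otherwise -1. */
    public static int predict(double[] w, double[] x) {
        double score = 0.0;
        for (int j = 0; j < w.length; j++) {
            score += w[j] * x[j];
        }
        return score > 0 ? 1 : -1;
    }

    /**
     * Trains a binary perceptron; labels in ys must be +1 or -1.
     * If averaged is true, returns the mean of the weight vector
     * over all training steps instead of the final weights.
     */
    public static double[] train(double[][] xs, int[] ys, int epochs, boolean averaged) {
        int dim = xs[0].length;
        double[] w = new double[dim];    // current weights
        double[] sum = new double[dim];  // running sum of weights for averaging
        long steps = 0;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < xs.length; i++) {
                // Update only on a mistake: move w toward (or away from) x.
                if (predict(w, xs[i]) != ys[i]) {
                    for (int j = 0; j < dim; j++) {
                        w[j] += ys[i] * xs[i][j];
                    }
                }
                for (int j = 0; j < dim; j++) {
                    sum[j] += w[j];
                }
                steps++;
            }
        }
        if (!averaged) {
            return w;
        }
        double[] avg = new double[dim];
        for (int j = 0; j < dim; j++) {
            avg[j] = sum[j] / steps;
        }
        return avg;
    }

    public static void main(String[] args) {
        // Toy linearly separable data; the last column is a constant bias feature.
        double[][] xs = {{2, 1, 1}, {1, 3, 1}, {-1, -2, 1}, {-3, -1, 1}};
        int[] ys = {1, 1, -1, -1};
        double[] w = train(xs, ys, 10, true);
        System.out.println("averaged weights: " + Arrays.toString(w));
        for (int i = 0; i < xs.length; i++) {
            System.out.println("label " + ys[i] + ", predicted " + predict(w, xs[i]));
        }
    }
}
```

Averaging tends to make the final classifier less sensitive to the order of the last few updates. The sketch accumulates the sum naively at every step for clarity; practical implementations usually use a lazy update to avoid touching every weight on each example.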