Edlin: an easy-to-read linear learning framework
Machine learning with linear models made legible

The Edlin toolkit provides a machine learning framework for linear models, designed to be easy to read and understand. Its main goal is to provide easy-to-edit, working example implementations of popular learning algorithms. The toolkit is compact: 40 Java classes totaling about 2200 lines of code, of which about 20% are I/O and driver classes for the examples. A version of Edlin has been integrated as a processing resource for the GATE architecture, and has been used for gene tagging, gene name normalization, named entity recognition in Bulgarian, and biomedical relation extraction.
Edlin is open source and released under the terms of the GNU General Public License. If you use it, an acknowledgement is always appreciated: cite Edlin.

Highlights

Education: Easy to understand, hands-on experience with working implementations.
Research: Easy to modify, easy to run experiments with modified learning algorithms.
Industry: State-of-the-art learning, GATE integration, limited dependence on libraries.

Truth in Advertising

Edlin is designed with a programmer in mind. It is not a replacement for other toolkits such as Mallet, Weka, NLTK, or LingPipe, which mostly target end users of learning algorithms.

  • No graphical user interface;

Edlin has no GUI: the programmer is expected to read and edit the code. A user who wants to try some learning algorithms on their data without programming would be better served by a toolkit such as Weka.

  • I/O and feature construction are left to the user;

Edlin does not have a feature generation pipeline similar to Mallet's. The user has to implement feature extraction.
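To illustrate the kind of code a user might write for this step, here is a minimal sketch of a hand-written feature extractor. It is not part of Edlin and does not use Edlin's classes: the class name FeatureExtractionSketch, the method extract, and the three feature templates are all hypothetical, and tokens are assumed to be non-empty strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, not Edlin code: maps feature names to integer ids. */
class FeatureExtractionSketch {

    // Feature "alphabet": feature name -> dense integer id.
    private final Map<String, Integer> index = new HashMap<>();

    /**
     * Returns the ids of the binary features that fire for tokens[position].
     * When grow is true (training), unseen feature names get new ids;
     * when grow is false (testing), unseen feature names are skipped.
     */
    int[] extract(String[] tokens, int position, boolean grow) {
        String word = tokens[position];
        String previous = position > 0 ? tokens[position - 1] : "<s>";

        // Illustrative feature templates; real tasks need their own.
        List<String> names = new ArrayList<>();
        names.add("word=" + word.toLowerCase(Locale.ROOT));
        names.add("capitalized=" + Character.isUpperCase(word.charAt(0)));
        names.add("prev=" + previous.toLowerCase(Locale.ROOT));

        List<Integer> ids = new ArrayList<>();
        for (String name : names) {
            Integer id = index.get(name);
            if (id == null && grow) {
                id = index.size();
                index.put(name, id);
            }
            if (id != null) {
                ids.add(id);
            }
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Number of distinct features seen so far. */
    int size() {
        return index.size();
    }
}
```

The resulting integer ids can serve as indices into a weight vector, which is all a linear model needs from the feature extraction step.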

Brief, Legible Implementation

As an example, Edlin includes an implementation of the Perceptron algorithm, with and without averaging; its training procedure is 13 lines of code. Look at our implementation in the cross-referenced source here. The number of lines of code for several learning algorithm implementations is shown below.

Learning Algorithm    Edlin LOC
Perceptron            43
Naive Bayes           61
Maximum Entropy       97
MIRA                  79
AdaBoost              141
CRF                   182+68
struct. Perceptron    37+68
struct. MIRA          77+12+68
Counts of lines of code (excluding comments and blank lines) for implementations of different learning algorithms in Edlin. Click on the Learning Algorithm name to get the cross-referenced source.
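To give a sense of why a perceptron trainer can fit in so few lines, here is a minimal, self-contained sketch of binary perceptron training with optional averaging. It is not Edlin's implementation: the class name PerceptronSketch, the method signatures, and the dense double[] representation are assumptions made for illustration. Labels are assumed to be +1 or -1, and the training set is assumed to be non-empty.

```java
/** Hypothetical sketch, not Edlin code: binary perceptron, plain or averaged. */
class PerceptronSketch {

    /**
     * Trains on dense feature vectors x with labels y in {+1, -1}.
     * If averaged is true, returns the mean of the weight vectors held
     * after each training example; otherwise returns the final weights.
     */
    static double[] train(double[][] x, int[] y, int epochs, boolean averaged) {
        int dim = x[0].length;
        double[] w = new double[dim];
        double[] sum = new double[dim];
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < x.length; i++) {
                // Mistake-driven update: move w toward the correct label.
                if (y[i] * dot(w, x[i]) <= 0) {
                    for (int j = 0; j < dim; j++) {
                        w[j] += y[i] * x[i][j];
                    }
                }
                // Accumulate the current weights for averaging.
                for (int j = 0; j < dim; j++) {
                    sum[j] += w[j];
                }
            }
        }
        if (!averaged) {
            return w;
        }
        double steps = (double) epochs * x.length;
        for (int j = 0; j < dim; j++) {
            sum[j] /= steps;
        }
        return sum;
    }

    static double dot(double[] a, double[] b) {
        double total = 0.0;
        for (int j = 0; j < a.length; j++) {
            total += a[j] * b[j];
        }
        return total;
    }

    static int predict(double[] w, double[] x) {
        return dot(w, x) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Toy, linearly separable data; the last feature is a constant bias term.
        double[][] x = {{2, 1, 1}, {1, 3, 1}, {-1, -2, 1}, {-3, -1, 1}};
        int[] y = {1, 1, -1, -1};
        double[] w = train(x, y, 10, true);
        for (int i = 0; i < x.length; i++) {
            System.out.println("gold=" + y[i] + " predicted=" + predict(w, x[i]));
        }
    }
}
```

Averaging tends to make the learned weights less sensitive to the order of the last few updates, which is why the two variants are often implemented side by side.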