BoolLearnLib
Learning Boolean Functions from Positive Examples for Python
This library implements a greedy algorithm for learning boolean functions from positive examples.
The algorithm takes a sample S = {s_1,..., s_m} of observed behaviors and wants to come up with an approximate characterization of the possible behaviors of the system.
The sample S consists of positive examples which are uniformly selected from the target function 'f', i.e., S = {s : f (s) = 1, s ∈ B^n}. The target function 'f', which is unknown, is a boolean function f : B^n → {0, 1}.
A learning algorithm (the learner) wants to approximate 'f' by a good hypothesis function 'h', which is represented by k boolean terms in Disjunctive Normal Form (DNF). 'h' and k are the outputs of the learning algorithm. This library is based on the work presented in [1].
[1] [Toward Learning Boolean Functions from Positive Examples; Nicolas Basset, Oded Maler and Irini-Eleftheria Mens, in Proceedings of LearnAut 2018.〈preprint〉] (url_pending)
Installation
This library requires Python 2.7.9. The dependencies are listed in requirements.txt. If you have the python tool 'pip' installed, then you can run the following command for installing the depencencies:
$ pip install -r requirements.txt
Afterwards you need to run:
$ python setup.py build
$ python setup.py install
for installing the library. In order to run all the tests, you must execute:
$ python setup.py test
or
$ cd Tests
$ pytest
from the root of the project folder.
For users that don’t have write permission to the global site-packages directory or do not want to install our library into it, Python enables the selection of the target installation folder with a simple option:
$ pip install -r requirements.txt --user
$ python setup.py build
$ python setup.py install --user
In Unix/Mac OS X environments:, the structure of the installation folders will be the following:
Type of file | Installation directory |
---|---|
modules | userbase/lib/pythonX.Y/site-packages |
scripts | userbase/bin |
data | userbase |
C headers | userbase/include/pythonX.Y/distname |
And here are the values used on Windows:
Type of file | Installation directory |
---|---|
modules | userbase\PythonXY\site-packages |
scripts | userbase\Scripts |
data | userbase |
C headers | userbase\PythonXY\Include\distname |
The advantage of using this scheme compared to the other ones described in [2] is that the user site-packages directory is under normal conditions always included in sys.path, which means that there is no additional step to perform after running the setup.py script to finalize the installation.
[2] Installing Python Modules (Legacy version) (https://docs.python.org/2/install/)
Running
Definition of the Sample
The learning algorithm requires a sample S of positive examples from the target function 'f' for inferring the hypothesis function 'h' that will approximate it.
In machine learning contexts, the sample S is usually partitioned into two parts. The first part, called the training sample, is used for training (learning a hypothesis), and the second part, called the testing sample is used for evaluation of the hypothesis.
Often, training part and a testing part consists of the 30% and 70% of the sample points, respectively. The learning algorithm is then applied several times on the same sample but on different partitions of it, in order to guaranty a smaller error.
The package BoolLearnLib.Samples includes a set of functions for generating terms, composing them in a DNF and, later on, feeding them to the learning function for training and testing.
Running the Learning Function
The core of the library is the algorithm learning an approximation of the target function 'f'. The learning algorithm is implemented inside the BoolLearnLib.Table class. The learning algorithm stops when any of these two conditions are met:
- the prediction error of the learnt function 'h' (percentage of false negatives) is below an epsilon,
- the 'h' representation reaches k terms.
A set of running examples can be found in Test/test_*.py.
from BoolLearnLib.Table import *
from BoolLearnLib.Samples import *
# Define the Sample
# Sample contains 5 terms that are summarized by the expression '11*0**', where * is a wild card representing
# either 0 or 1.
s = sample_from_rectangle('11*0**', 8)
training_sample = s[1:3]
testing_sample = s[4:8]
# Initialize the learning function
T = Table(training_sample, cost_function='dist')
# The learning function stops when 'h' the number of falses positives using the testing_sample is below an error 'e'
T.elearn(testing_sample, e=0.1)
# Result
print T.terms
# Initialize the learning function
T = Table(training_sample, cost_function='dist')
# The learning function stops when 'h' reaches k = 1 terms
T.klearn(1)
# Result
print T.terms