Train a Logic Classifier

Max Kleiner
6 min read · Feb 16, 2021

The model below this article depicts the architecture of a multilayer perceptron network designed specifically to solve the XOR problem, but also all of the remaining 15 logic problems. This is what I want to show with the FANN Library implementation, in 3 steps: Build, Train and Test the NN.

A detailed report of this Classifier is available at: http://www.softwareschule.ch/download/maxbox_starter56.pdf

The Fast Artificial Neural Network (FANN) Library is a free, open-source neural network library which implements multilayer artificial neural networks in C, with bindings for Pascal, Python and other languages, and supports both fully connected and sparsely connected networks.

Once the FANN network is trained, the output unit should predict the output you would expect if you were to pass the input values through an XOR logic gate. That is, if the two input values are not equal (1,0 or 0,1), the output should be 1; otherwise the output should be 0. As we know, XOR is just one of sixteen possible logic functions, see below.
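For reference, this is the XOR truth table the output unit has to reproduce:

X  Y  |  X XOR Y
0  0  |     0
0  1  |     1
1  0  |     1
1  1  |     0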

By the way, reinforcement learning is all about making decisions sequentially: in simple words, the output depends on the state of the current input, and the next input depends on the output of the previous one. Our full logic table for training has the following entries, the most important for us being 07 XOR:

All Boolean Functions
00000000   01 False
00001000   02 AND
00000010   03 Inhibit
00001010   04 Prepend
00000100   05 Praesect
00001100   06 Postpend
00000110   07 XOR
00001110   08 OR
11110001   09 NOR
11111001   10 Aequival
11110011   11 NegY
11111011   12 ImplicatY
11110101   13 NegX
11111101   14 ImplicatX
11110111   15 NAND
11111111   16 True

Before we build the NN with 3 layers, here is the implementation of the boolean logic in a procedure that forms the basis of the training set:

procedure AllBooleanPattern(aX, aY: integer);
begin
  Writeln(#13#10+'************** All Boolean Functions **************');
  PrintF('%-26s 01 False',    [inttobinbyte(0)]);
  PrintF('%-26s 02 AND',      [inttobinbyte(aX AND aY)]);
  PrintF('%-26s 03 Inhibit',  [inttobinbyte(aX AND NOT aY)]);
  PrintF('%-26s 04 Prepend',  [inttobinbyte(aX)]);
  PrintF('%-26s 05 Praesect', [inttobinbyte(NOT aX AND aY)]);
  PrintF('%-26s 06 Postpend', [inttobinbyte(aY)]);
  PrintF('%-26s 07 XOR',      [inttobinbyte(aX XOR aY)]);
  PrintF('%-26s 08 OR',       [inttobinbyte(aX OR aY)]);
  PrintF('%-26s 09 NOR',      [inttobinbyte(NOT(aX OR aY))]);
  PrintF('%-26s 10 Aequival', [inttobinbyte((NOT aX OR aY) AND (NOT aY OR aX))]);
  PrintF('%-26s 11 NegY',     [inttobinbyte(NOT aY)]);
  PrintF('%-26s 12 ImplicatY',[inttobinbyte(aX OR NOT aY)]);
  PrintF('%-26s 13 NegX',     [inttobinbyte(NOT aX)]);
  PrintF('%-26s 14 ImplicatX',[inttobinbyte(NOT aX OR aY)]);
  PrintF('%-26s 15 NAND',     [inttobinbyte(NOT(aX AND aY))]);
  PrintF('%-26s 16 True',     [inttobinbyte(NOT 0)]);
end;
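A small usage sketch (the argument values are my own choice, but they reproduce exactly the bit patterns listed above, since 00001010 and 00001100 are the Prepend and Postpend rows):

// 10 = 00001010 (the Prepend row), 12 = 00001100 (the Postpend row)
AllBooleanPattern(10, 12);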

Now we build the NN and its architecture. This configuration is a manual process, based on experience, and it should be noted that the settings below are for one specific model and dataset; the ideal hyper-parameters for other models and datasets will differ. Since XOR is not linearly separable, at least one hidden layer is required:

NN := TFannNetwork.Create(self);
with NN do begin
  {Layers.Strings := ('2', '3', '1')}
  Layers.Add('2');   // 2 input units
  Layers.Add('3');   // 3 hidden units
  Layers.Add('1');   // 1 output unit
  LearningRate := 0.699999988079071100;
  ConnectionRate := 1.000000000000000000;
  TrainingAlgorithm := taFANN_TRAIN_RPROP;
  ActivationFunctionHidden := afFANN_SIGMOID;
  ActivationFunctionOutput := afFANN_SIGMOID;
  //Left := 192; Top := 40;
end;

The learning rate is a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient, as a difference or delta parameter. The lower the value, the slower we travel along the downward slope. The learning rate is applied every time the weights are updated via the learning rule; thus, if the learning rate changed during training (which we don't do here), the network's path toward its final form would immediately be altered. A connection rate of 1 simply means the neurons are fully connected. Then we train the model on our logic table for 5000 epochs:

procedure TForm1.btnTrainClick(Sender: TObject);
var
  inputs: array[0..1] of single;
  outpts: array[0..0] of single;
  //outputs: array of single;
  e, i, j: integer;
  mse: single;
begin
  //Train the neural network for TRAIN_EPOCHS epochs
  MemoXOR.Lines.Clear;
  for e := 1 to TRAIN_EPOCHS do begin
    //Train n epochs
    epochs.Position := e div 1;
    for i := 0 to 1 do begin
      for j := 0 to 1 do begin
        inputs[0] := i; inputs[1] := j;
        case JStrUpper(trim(edtrule.text)) of
          'AND' : outpts[0] := (i And j);
          'NAND': outpts[0] := 1-(i And j);
          'OR'  : outpts[0] := (i Or j);
          'XOR' : outpts[0] := (i XOr j);
          'NOR' : outpts[0] := 1-(i Or j);
          'IMP' : outpts[0] := (i Or (1-j));
          'AEQ' : outpts[0] := 1-(i XOr j);
          'TAU' : outpts[0] := 1;
          else    outpts[0] := (i XOr j);
        end;
        Mse := NN.Train(inputs, outpts);
        if e mod 4 = 0 then lblMse.Caption := Format('%.4f',[Mse]);
        Application.ProcessMessages;
      end;
    end;
    if e mod 10 = 0 then
      MemoXor.Lines.Add(Format('%d error log = %.4f',[e, MSE]));
  end;
  ShowMessage('Network Epoch '+itoa(TRAIN_EPOCHS)+' Training Ends...');
  Writeln('Network Epoch '+itoa(TRAIN_EPOCHS)+' Training Ends...');
end;

As you can see, XOR is one of the eight logic functions handled in the case statement above (for the sake of overview we don't cover all 16 cases). The first thing we'll explore is how the learning rate affects model training. In each run the same model is trained from scratch, varying only the optimizer and the learning rate. We measure that with the MSE returned by Mse := NN.Train(inputs, outpts);

In statistics, the Mean Squared Error (MSE) is defined as the mean (average) of the squared differences between actual and estimated values. In each run, the network is trained until it achieves at least 96% training accuracy with a low MSE.
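As a minimal sketch of that calculation (illustrative only, not FANN's internal code; MeanSquaredError is a hypothetical helper name):

// Mean of the squared differences between actual and estimated values.
function MeanSquaredError(const actual, estimated: array of single): single;
var
  i: integer;
  sum: single;
begin
  sum := 0.0;
  for i := 0 to High(actual) do
    sum := sum + Sqr(actual[i] - estimated[i]);
  Result := sum / (High(actual) + 1);
end;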

Tuning the weights is not a manual process, but rather something that occurs automatically through a process known as backward propagation. Backward propagation takes the error of the prediction made by forward propagation and propagates it backwards through the network, adjusting the weights to values that slightly improve prediction accuracy.
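As a rough sketch of the idea (plain gradient descent with a hypothetical UpdateWeight helper; the RPROP algorithm we configured instead adapts an individual step size per weight from the sign of the gradient, but the principle of stepping down the error slope is the same):

// Sketch only: one gradient-descent update for a single weight.
// gradient is the partial derivative of the error with respect to this weight.
function UpdateWeight(weight, gradient, learningRate: single): single;
begin
  Result := weight - learningRate * gradient;
end;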

https://www.mlopt.com/?tag=xor

Forward and backward propagation are repeated for each training example in the dataset many times, until the weights of the network are tuned to values that let forward propagation produce accurate output, which we then check in the following test routine (Run here means the test/predict call):

procedure TForm1.btnRunClick(Sender: TObject);
var
  i, j, perfcnt: integer;
  perf1: single;
  inputs: array[0..1] of single;
  //output: array of fann_type;
  aoutput: TFann_Type_Array3;
begin
  MemoXOR.Lines.Clear;
  setlength(aoutput, 4);
  perf1 := 0.0; perfcnt := 0;
  //NN.Run(inputs, aoutput);
  for i := 0 to 1 do begin
    for j := 0 to 1 do begin
      inputs[0] := i; inputs[1] := j;
      // test and predict
      NN.Run4(inputs, aoutput);
      MemoXor.Lines.Add(Format('%d '+JStrUpper(trim(edtrule.text))
                        +' %d = %.3f',[i, j, aOutput[0]]));
      //writeln(floattostr(nn.learningmometum))
      if aOutput[0] > PERF_GATE then begin
        perf1 := perf1 + aOutput[0];
        inc(perfcnt);
      end;
    end;
  end;
  writeln('Test Score: '+floattostr(perf1 / perfcnt));
  MemoXor.Lines.Add(Format('TScore: %.5f',[perf1 / perfcnt]));
end;

So the score is a simple accuracy based on a threshold defined as a constant: PERF_GATE = 0.85. Any output of 0.85 or higher is deemed to predict 1, while anything lower is deemed to predict 0 (a small helper sketch follows after the layer listing below). This is compared to the actual expected output, and the proportion of correct predictions (the prediction accuracy) is reported on the console. The three layer entries of the multilayer perceptron define the dimensions of the network: in this case two input units, three hidden units and one output unit, as required for this architecture.

Build NN with 3 layers:
NN 0: with 2
NN 1: with 3
NN 2: with 1
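That helper could look like this (a sketch only; ThresholdAccuracy is a hypothetical function, not part of the listings above): threshold each output at PERF_GATE and count the proportion of correct 0/1 predictions.

// Sketch only: proportion of correct predictions after thresholding
// the network outputs at PERF_GATE (0.85).
function ThresholdAccuracy(const outputs, targets: array of single): single;
var
  k, correct, predicted: integer;
begin
  correct := 0;
  for k := 0 to High(outputs) do begin
    if outputs[k] >= PERF_GATE then predicted := 1 else predicted := 0;
    if predicted = Round(targets[k]) then inc(correct);
  end;
  Result := correct / (High(outputs) + 1);
end;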

It is also possible to call the functions directly via the DLL, for example:

Const TRAIN_EPOCHS = 5000; 
PERF_GATE = 0.85;
function fann_get_total_neurons: longint;
external 'fann_get_total_neurons@fannfloat.dll stdcall';
function fann_print_parameters: Longint;
external 'fann_print_parameters@fannfloat.dll stdcall';

The script can be found at:

http://www.softwareschule.ch/examples/fanndemo.txt

As the earlier results show, it's crucial for model training to make a good choice of optimizer and learning rate. Manually choosing these hyper-parameters is time-consuming and error-prone.

More about FANN: http://leenissen.dk/fann/wp/

Originally published at http://maxbox4.wordpress.com on February 16, 2021.


Max Kleiner

Max Kleiner's professional environment is in the areas of OOP, UML and coding - among other things as a trainer, developer and consultant.