Train a Logic Classifier

The model below depicts the architecture of a multilayer perceptron network designed to solve the XOR problem, and in fact all of the remaining 15 logic problems as well. I want to demonstrate this with a FANN Library implementation in three steps: build, train and test the NN.

A detailed report of this Classifier is available at: http://www.softwareschule.ch/download/maxbox_starter56.pdf

The Fast Artificial Neural Network (FANN) Library is a free open-source neural network library which implements multilayer artificial neural networks in C, with bindings for Pascal, Python and other languages, and supports both fully connected and sparsely connected networks.

Once the FANN network is trained, the output unit should predict the value you would expect if you passed the input values through an XOR logic gate. That is, if the two input values are not equal (1,0 or 0,1), the output should be 1; otherwise the output should be 0. As we know, XOR is just one of sixteen possible logic functions, see below.

By the way, reinforcement learning is all about making decisions sequentially: in simple words, the output depends on the state of the current input, and the next input depends on the output of the previous step. Now, our complete logic table for training has the following inputs, with XOR (07) being the most important:

  • All Boolean Functions
  • 01. 00000000 01 False
  • 02. 00001000 02 AND
  • 03. 00000010 03 Inhibit
  • 04. 00001010 04 Prepend
  • 05. 00000100 05 Praesect
  • 06. 00001100 06 Postpend
  • 07. 00000110 07 XOR
  • 08. 00001110 08 OR
  • 09. 11110001 09 NOR
  • 10. 11111001 10 Aequival
  • 11. 11110011 11 NegY
  • 12. 11111011 12 ImplicatY
  • 13. 11110101 13 NegX
  • 14. 11111101 14 ImplicatX
  • 15. 11110111 15 NAND
  • 16. 11111111 16 True

Before we build the NN with 3 layers, here is the implementation of the boolean logic in a procedure which forms the base of the training set; in other words, it defines the behavior the network has to learn:
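What follows is a minimal sketch of such a procedure (my reconstruction, not the original tutorial code). Assuming the low four bits of each pattern above list the outputs for the input pairs (1,1), (1,0), (0,1), (0,0) from left to right, which matches the symmetric functions AND, XOR, OR, NOR and NAND, a single table lookup covers all 16 functions:

const
  // low nibble of each pattern from the table, functions 01..16
  TruthTab: array[1..16] of Byte =
    ($0,$8,$2,$A,$4,$C,$6,$E,$1,$9,$3,$B,$5,$D,$7,$F);

function BoolFunc(idx: Integer; a, b: Byte): Byte;
begin
  // bit 0 holds the output for (0,0), bit 1 for (0,1),
  // bit 2 for (1,0) and bit 3 for (1,1)
  Result := (TruthTab[idx] shr (a*2 + b)) and 1;
end;

BoolFunc(7, 1, 0) then yields 1, the XOR of 1 and 0; looping idx over 1..16 and (a,b) over the four input pairs generates the complete training table.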

Now we want to build the NN and its architecture. This configuration is a manual process based on experience, and it should be noted that the results below are for one specific model and dataset; the ideal hyper-parameters for other models and datasets will differ:
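In Pascal this looks roughly like the sketch below. It assumes the TFannNetwork wrapper component shipped with the FANN distribution; apart from Build and Train, which appear in this text, the property names and the learning-rate value are assumptions:

NN := TFannNetwork.Create(nil);
NN.Layers.Clear;
NN.Layers.Add('2');        // input units
NN.Layers.Add('3');        // hidden units
NN.Layers.Add('1');        // output unit
NN.LearningRate := 0.7;    // hyper-parameter, discussed below
NN.ConnectionRate := 1;    // 1 = fully connected
NN.Build;                  // allocates the 2-3-1 network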

The learning rate is the step size between updates: a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient, as a difference or delta parameter. The lower the value, the slower we travel along the downward slope. The learning rate is applied every time the weights are updated via the learning rule; thus, if the learning rate changed during training (which we don't do here), the network's evolutionary path toward its final form would immediately be altered. A connection rate of 1 simply means the neurons are fully connected. Then we train the model on our logic table for 5000 epochs:
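A sketch of that training loop (variable names are assumptions): each call to Train performs one forward and one backward pass on a single example and returns the current mean square error.

for epoch := 1 to 5000 do
  for i := 0 to High(trainIn) do
    Mse := NN.Train(trainIn[i], trainOut[i]);
writeln('final MSE: ' + FloatToStr(Mse));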

As you can see, XOR is one of the 8 logic functions shown (for the sake of overview we don't show all 16 cases here). The first thing we'll explore is how the learning rate affects model training. In each run the same model is trained from scratch, varying only the optimizer and learning rate. We track this with the MSE returned by Mse := NN.Train(inputs, outputs);

In statistics, the Mean Square Error (MSE) is defined as the mean or average of the squares of the differences between actual and estimated values. In each run, the network is trained until it achieves at least 96% train accuracy with a low MSE.
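In symbols, for n samples with actual values y_i and network estimates p_i:

MSE = (1/n) * sum_{i=1..n} (y_i - p_i)^2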

This configuration is not a manual process, but rather something that occurs automatically through a process known as backward propagation. Backward propagation takes the error of the prediction made by forward propagation and passes it backwards through the network, adjusting the weights to values that slightly improve prediction accuracy.
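In the plain gradient-descent form of this rule, each weight w moves one step against its error gradient, scaled by the learning rate discussed above:

w := w - LearningRate * dE/dw

(FANN's default training algorithm, RPROP, adapts the step size per weight internally, but the principle is the same.)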

https://www.mlopt.com/?tag=xor

Forward and backward propagation are repeated for each training example in the dataset many times, until the weights of the network are tuned to values that let forward propagation produce accurate output in the following test routine (Run is the test routine).

So the score is a simple accuracy based on a threshold with a constant: PERF_GATE = 0.85; any value of 0.85 or higher is deemed to predict 1, while anything lower is deemed to predict 0. This is compared to the actual expected output, and the proportion of correct predictions (the prediction accuracy) is printed to the console. The first three parameters of the multilayer perceptron constructor define the dimensions of the network; in this case we have defined two input units, three hidden units and one output unit, as required for this architecture.
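A sketch of that scoring loop (my reconstruction; the Run call and its signature are assumptions based on the wrapper):

const PERF_GATE = 0.85;
var i, pred, correct: Integer;
    outBuf: array[0..0] of Single;
begin
  correct := 0;
  for i := 0 to High(testIn) do begin
    NN.Run(testIn[i], outBuf);        // forward pass only
    if outBuf[0] >= PERF_GATE then pred := 1 else pred := 0;
    if pred = Round(testOut[i][0]) then Inc(correct);
  end;
  writeln('accuracy: ' + FloatToStr(correct / Length(testIn)));
end;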

Build NN with 3 layers:
NN 0: with 2
NN 1: with 3
NN 2: with 1

It is also possible to call the functions directly via the DLL, for example:
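For instance, in Delphi/Free Pascal style (a sketch: the fann_* names below belong to the documented FANN C API, but the DLL file name, the calling convention and the exact Pascal declarations are assumptions):

type
  PFann = Pointer;   // opaque handle to struct fann*

function fann_create_standard_array(num_layers: Cardinal;
  layers: PCardinal): PFann; cdecl; external 'fannfloat.dll';
procedure fann_train(ann: PFann;
  input, desired_output: PSingle); cdecl; external 'fannfloat.dll';
function fann_get_MSE(ann: PFann): Single; cdecl; external 'fannfloat.dll';
procedure fann_destroy(ann: PFann); cdecl; external 'fannfloat.dll';

A 2-3-1 network is then created with fann_create_standard_array(3, @layers[0]), where layers is an array of Cardinal holding (2, 3, 1), and released again with fann_destroy(ann).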

You can find the script at:

http://www.softwareschule.ch/examples/fanndemo.txt

As the earlier results show, a good choice of optimizer and learning rate is crucial for model training. Manually choosing these hyper-parameters is time-consuming and error-prone.

More about FANN: http://leenissen.dk/fann/wp/

Originally published at http://maxbox4.wordpress.com on February 16, 2021.

Max Kleiner's professional environment is in the areas of OOP, UML and coding - among other things as a trainer, developer and consultant.
