# Bayes Theorem & Confusion Matrix

Bayes’ theorem is crucial for interpreting the results from binary classification algorithms.

We will show that Bayes’ theorem is simply the relationship between precision and recall (in our case, the posterior it computes is the **precision**). Turning that relationship into an equation gives Bayes’ theorem, which lets you take raw test results and correct for the “skew” introduced by false positives. Let me show that with the real numbers of a confusion matrix.

```python
from sklearn.metrics import confusion_matrix, classification_report

# y_test and y_pred come from a previously fitted classifier
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

The confusion matrix with its row and column totals (rows = actual classes, columns = predicted classes):

```
               Predict
                 0     1
Actual 0   [[  39    32]    71
Actual 1    [  22   222]]  244
               --   ---    ---
               61   254    315
```

1. precision: 222/254 = 0.874 = P(A|B)

2. recall: 222/244 = 0.909 = P(B|A), the Bayes posterior

Here A is the event “actually class 1” and B is the event “predicted class 1”. So the recall (sensitivity) is exactly what Bayes’ theorem yields: P(B|A).
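The two ratios can be checked directly from the matrix counts (a minimal Python sketch; the variable names are mine, the counts come from the matrix above):

```python
# Counts from the confusion matrix above
tn, fp = 39, 32    # actual 0: true negatives, false positives
fn, tp = 22, 222   # actual 1: false negatives, true positives

precision = tp / (tp + fp)   # P(A|B): actually 1, given predicted 1
recall    = tp / (tp + fn)   # P(B|A): predicted 1, given actually 1

print(round(precision, 3))   # 0.874
print(round(recall, 3))      # 0.91
```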

The probability P(B|A) = 222/244 = 0.91 is called the recall. It is simply the percentage of the 244 actual positives that our classification algorithm correctly labeled as positive. We can see (maybe not at first glance) that Bayes’ theorem is simply a relationship between recall and precision:

Written out, the relationship is:

```
predict * precision / actual = recall        (Bayes)

P(B|A) = P(B) * P(A|B) / P(A)
       = (254/315 * 222/254) / (244/315)
       = 0.9098360655737705
```
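The identity is easy to verify numerically; a small sketch with the rates taken from the matrix above:

```python
total = 315
predict   = 254 / total   # P(B): share of positive predictions
actual    = 244 / total   # P(A): share of actual positives
precision = 222 / 254     # P(A|B)

# Bayes: P(B|A) = P(B) * P(A|B) / P(A)
recall = predict * precision / actual
print(round(recall, 4))   # 0.9098
```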

And **what was the question** again? Oh yes: what is the chance that we really have the disease if we get a positive test result? This is a bit like a two-headed dog (recall on one head, precision on the other). The answer is **87 %** (the precision, also known as the positive predictive value):

`Bayes Theorem`

```
r1  Actual_probability   = 77% = 0.77
r2  Prob_true_positive   = 90% = 0.90
r3  Prob_false_positive  = 45% = 0.45

Chance a positive test means a positive result:
(r1 * r2) / ((r1 * r2) + r3 * (1 - r1)) * 100 = 87.01 %
```

---

https://instacalc.com/52323

r1 is the actual positive rate (244/315 ≈ 77%), r2 is the recall (≈ 90%), and r3 = 32/71 ≈ 45% is the false positive rate (Prob_false_positive).
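The same calculation as a small Python helper (my own sketch of the instacalc sheet; the function name is mine):

```python
def bayes_posterior(prior, p_true_pos, p_false_pos):
    """P(disease | positive test) via Bayes' theorem."""
    return (prior * p_true_pos) / (prior * p_true_pos + p_false_pos * (1 - prior))

r1, r2, r3 = 0.77, 0.90, 0.45   # actual, recall, false positive rate
print(round(bayes_posterior(r1, r2, r3) * 100, 2))   # 87.01
```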

I can transform the formula above to solve for predict:

predict = actual * recall / precision = 77% * 90% / 87% ≈ 80%

or for precision:

precision = actual * recall / predict = **87 %**

Let’s compare with the Bayes formula:

**prior * likelihood / evidence = posterior**

actual * recall / predict = precision

prior -> actual; likelihood -> recall; evidence -> predict

```
              precision    recall  f1-score   support

           0       0.64      0.55      0.59        71
           1       0.87      0.91      0.89       244

    accuracy                           0.83       315
   macro avg       0.76      0.73      0.74       315
weighted avg       0.82      0.83      0.82       315
```

Now the **proof of concept** with a decision tree. You may see immediately that this question can be answered with a simple ratio: the number of diseased people with symptoms divided by the total number of people with symptoms (which includes the false positives). What is the probability that a patient with symptoms actually has the disease?

Number of people with disease and symptoms (222) / total number with symptoms (222 + 32)

which gives us: 222 / 254 = 0.874, i.e. 87.4%.

Now let’s construct the same answer with a probability tree:

This is **exactly the same answer you would get by working the formula: 87%.** I’ve never come across a Bayes-related problem that can’t be answered with a probability (decision) tree!

A final note on the computation: confusion_matrix needs both the labels and the predictions as single-digit class indices, not as one-hot encoded vectors. In this example that is already the case, since the predictions were produced with model.predict_classes().
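If your vectors are still one-hot encoded, collapsing them with argmax before calling confusion_matrix does the trick (a minimal sketch; the toy arrays are mine):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy one-hot labels and predictions for two classes
y_true_onehot = np.array([[1, 0], [0, 1], [0, 1]])
y_pred_onehot = np.array([[1, 0], [0, 1], [1, 0]])

# Collapse one-hot vectors to single-digit class indices
y_true = y_true_onehot.argmax(axis=1)   # [0, 1, 1]
y_pred = y_pred_onehot.argmax(axis=1)   # [0, 1, 0]

print(confusion_matrix(y_true, y_pred))
# [[1 0]
#  [1 1]]
```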

<iframe src="https://instacalc.com/54440/embed" width="450" height="350" frameborder="0"></iframe>

You can explore this further with a forked Jupyter notebook in Python: https://github.com/maxkleiner/Bayes_theorem/blob/master/Bayes_Theorem.ipynb