So in my recent readings of various ML media, from blog posts to published papers, I've started to notice a trend. I'm growing more certain that the ML community is ignoring uncertainty, which is certainly not a good practice, and it renders many of their results quite uncertain.
In this post, I want to go over a quick and easy way to use inverse probability to estimate the uncertainty in your model's test accuracy. Generally, after you test your model, you have an accuracy percentage, as well as the number of correct predictions, the number of wrong predictions, and the total number of predictions. For simplicity, we are going to ignore nuances such as true/false positives and negatives.
From this data, we want to find $P(a \mid C, W)$, i.e. the probability of your accuracy $a$ given the number of correct predictions $C$ and the number of wrong predictions $W$. This is a classic Bayesian inference problem: given a result, what is the distribution most likely to have produced that result? Using Bayes' theorem we find:

$$P(a \mid C, W) = \frac{P(C, W \mid a)\, P(a)}{P(C, W)}$$
Since $P(C, W)$ is simply a normalizing constant, we are going to pull it, and all future constants, out as $Z$, s.t.

$$P(a \mid C, W) = \frac{1}{Z}\, P(C, W \mid a)\, P(a)$$
However, $P(C, W \mid a)$ is simply a binomial distribution, s.t.

$$P(C, W \mid a) = \binom{C + W}{C}\, a^{C} (1 - a)^{W}$$
Pulling the $\binom{C + W}{C}$ term out into $Z$, we now have:

$$P(a \mid C, W) = \frac{1}{Z}\, a^{C} (1 - a)^{W}\, P(a)$$
Now, we have this prior term $P(a)$ that allows us to express our initial belief as to what the accuracy is. If we want to assume very little about the accuracy in our prior term, we can set the prior to a Beta distribution, s.t.

$$P(a) = \mathrm{Beta}(a;\, \alpha, \beta) = \frac{a^{\alpha - 1} (1 - a)^{\beta - 1}}{B(\alpha, \beta)}$$
The reason for choosing the beta distribution in this instance is that the beta distribution is a conjugate prior to the binomial distribution, which essentially means that the beta prior can be updated with the binomial likelihood and remain a beta distribution, allowing for extremely easy computations. If we set $\alpha = 1$ and $\beta = 1$ (a uniform prior over $[0, 1]$), we get

$$P(a \mid C, W) = \frac{1}{Z}\, a^{C} (1 - a)^{W} = \mathrm{Beta}(a;\, C + 1,\, W + 1)$$
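More generally, multiplying the beta prior by the binomial likelihood (and sweeping constants into $Z$ as before) just shifts the beta's parameters:

$$a^{\alpha - 1}(1 - a)^{\beta - 1} \cdot a^{C} (1 - a)^{W} = a^{(\alpha + C) - 1}(1 - a)^{(\beta + W) - 1} \;\propto\; \mathrm{Beta}(a;\, \alpha + C,\, \beta + W)$$

so the $\alpha = \beta = 1$ choice above just gives the special case $\mathrm{Beta}(a;\, C + 1,\, W + 1)$.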
We now have a probability distribution that represents the probability of various accuracy values given the number of correct and incorrect predictions. All you have to do is plug the number of correct predictions and the number of wrong predictions into a beta distribution, and you will have a sense of the range of accuracies your model could reasonably have.
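Here is a quick sketch of what that looks like in practice, using scipy.stats (the 900 correct and 100 wrong predictions below are purely made-up example counts, not results from any real model):

```python
# Posterior over model accuracy: Beta(C + 1, W + 1), per the derivation above.
from scipy.stats import beta

correct = 900  # C: number of correct test predictions (hypothetical)
wrong = 100    # W: number of wrong test predictions (hypothetical)

posterior = beta(correct + 1, wrong + 1)

# Point estimate and a 95% credible interval for the true accuracy.
print(f"Posterior mean accuracy: {posterior.mean():.4f}")
low, high = posterior.interval(0.95)
print(f"95% credible interval: [{low:.4f}, {high:.4f}]")
```

With these counts the point estimate sits around 0.9, but the credible interval makes explicit how far the true accuracy could plausibly be from that single number, which is exactly the uncertainty a bare test-accuracy percentage hides.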
This was a shorter post, but I hope it gave you a good idea of how to put your model’s test set performance into perspective relative to its true accuracy. Until next time!