import math
import numpy as np
import torch
How to separate the epistemic and aleatoric uncertainty of a Dirichlet distribution
Also studying the implications of this and proposing applications of the solutions.
- Formula from [1] (written out below).
- Theory from [2].
ToDo: complete the section with more info.
[1] Denis Huseljic, Bernhard Sick, Marek Herde, Daniel Kottke. Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks.
[2] Jishnu Mukhoti et al. Deep Deterministic Uncertainty: A Simple Baseline.
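For reference, here is what the two functions below compute, written out explicitly (a reconstruction from the code, not a verbatim quote of [1]): for a Dirichlet with concentration parameters $\alpha = (\alpha_1, \dots, \alpha_K)$ and $\alpha_0 = \sum_k \alpha_k$,

$$u_{\text{epistemic}} = \frac{K \cdot \text{prior}}{\alpha_0}, \qquad u_{\text{aleatoric}} = \frac{-\sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k}{\log K}, \qquad \bar{p}_k = \frac{\alpha_k}{\alpha_0}.$$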
prior = 1
n_classes = 5

def predict_epistemic(alpha):
    """Predicts the epistemic uncertainty of a sample. (K * prior / alpha_0)"""
    return n_classes * prior / alpha.sum(-1, keepdim=True)
def predict_aleatoric(alpha):
    """Predicts the aleatoric uncertainty of a sample (normalized entropy of the expected probabilities alpha / alpha_0)."""
    proba_in = (alpha / alpha.sum(-1, keepdim=True)).clamp_(1e-8, 1 - 1e-8)
    entropy = -torch.sum(proba_in * proba_in.log(), dim=-1)
    normalized_entropy = entropy / np.log(n_classes)
    return normalized_entropy
ones = torch.ones(n_classes)
print(predict_epistemic(ones), predict_aleatoric(ones))
tensor([1.]) tensor(1.)
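Why both are exactly 1 here: with $\alpha = (1, \dots, 1)$ we get $\alpha_0 = K = 5$, so $u_{\text{epistemic}} = K \cdot \text{prior} / \alpha_0 = 1$; and $\bar{p}$ is uniform, so its entropy is $\log K$ and the normalized entropy is 1.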
When the alpha of only a single class keeps increasing
- Observation: Both uncertainties reduce
- Impact: When the model puts all its confidence (alpha) on a single class, it is confident about that class, and both uncertainties reduce.
The maximum aleatoric and epistemic uncertainty are both 1.0.
In this experiment, epistemic uncertainty is never higher than aleatoric uncertainty (the comparison printed below is always False).
for i in [1, 10, 50, 1000]:
    x = torch.ones(n_classes)
    x[0] = i
    print(x)
    print("Epistemic UE : {}, Aleatoric UE : {}".format(predict_epistemic(x), predict_aleatoric(x)))
    print("------------", predict_epistemic(x) > predict_aleatoric(x))
tensor([1., 1., 1., 1., 1.])
Epistemic UE : tensor([1.]), Aleatoric UE : 1.0
------------ tensor([False])
tensor([10., 1., 1., 1., 1.])
Epistemic UE : tensor([0.3571]), Aleatoric UE : 0.6178266406059265
------------ tensor([False])
tensor([50., 1., 1., 1., 1.])
Epistemic UE : tensor([0.0926]), Aleatoric UE : 0.2278686910867691
------------ tensor([False])
tensor([1000., 1., 1., 1., 1.])
Epistemic UE : tensor([0.0050]), Aleatoric UE : 0.019580082967877388
------------ tensor([False])
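A sanity check on the last row against the formula above: $\alpha_0 = 1000 + 4 = 1004$, so $u_{\text{epistemic}} = 5 \cdot 1 / 1004 \approx 0.0050$, which matches the printed value; aleatoric uncertainty also falls because $\bar{p}$ concentrates almost all mass on class 0.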
When the alphas of multiple classes keep increasing
- Observation: Epistemic uncertainty reduces while aleatoric stays high
- Impact: When the model spreads its confidence (alpha) across multiple classes, it is not confident about any single class, so aleatoric uncertainty is high. But since the alphas have increased, the input looks like observed (not new) data, and therefore epistemic uncertainty is low.
The maximum aleatoric and epistemic uncertainty are both 1.
for i in [1, 10, 50, 10000]:
    x = torch.ones(n_classes) * i
    print(x)
    print("Epistemic UE : {}, Aleatoric UE : {}".format(predict_epistemic(x), predict_aleatoric(x)))
    print("------------")
tensor([1., 1., 1., 1., 1.])
Epistemic UE : tensor([1.]), Aleatoric UE : 1.0
------------
tensor([10., 10., 10., 10., 10.])
Epistemic UE : tensor([0.1000]), Aleatoric UE : 1.0
------------
tensor([50., 50., 50., 50., 50.])
Epistemic UE : tensor([0.0200]), Aleatoric UE : 1.0
------------
tensor([10000., 10000., 10000., 10000., 10000.])
Epistemic UE : tensor([1.0000e-04]), Aleatoric UE : 1.0
------------
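A sanity check on the second row: $\alpha_0 = 5 \cdot 10 = 50$, so $u_{\text{epistemic}} = 5 / 50 = 0.1$; meanwhile $\bar{p}$ stays uniform for every $i$, so the normalized entropy (aleatoric) stays pinned at 1.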
Impact of prior
prior = 50
The maximum epistemic uncertainty increases from 1 to the prior value.
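A minimal check of this claim, reusing the functions and the prior = 50 set above (the expected values in the comments follow from the formula, not from a re-run):

ones = torch.ones(n_classes)
# K * prior / alpha_0 = 5 * 50 / 5 = 50: the flat Dirichlet now maps to the prior value
print(predict_epistemic(ones))  # expected: tensor([50.])
# the prior does not enter the aleatoric term, so a uniform p-bar still gives 1
print(predict_aleatoric(ones))  # expected: 1.0 (up to float precision)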
Conclusions
- A Dirichlet distribution can be disentangled into aleatoric and epistemic uncertainty.
- When all alphas are 1, both uncertainties are also 1, implying that the network doesn't know anything about the input.
- If only one output class's alpha is high, both uncertainties are low.
- The higher that alpha, the lower both uncertainties.
- If multiple alphas are high, only aleatoric uncertainty is high while epistemic stays low, implying that the network has seen the input (some alphas were increased) but is not sure which of the outputs is correct.
Use Cases
1. For identifying OOD data
- For the training dataset, measure the epistemic uncertainty of the correct predictions. It should be less than 1 and near zero.
- During prediction, if the epistemic uncertainty is higher than the training maximum, that data should be considered OOD and handled appropriately (see the sketch after this list).
2. For handling in-domain uncertain data
- If the epistemic uncertainty is in range but the aleatoric uncertainty is high, we can use this in an embodied setting to collect additional data (e.g., an image from a different view), fuse it, and make a decision. For example, for a blurry image: defer the prediction but don't flag it as OOD; the information may be clear in the next image.
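A minimal sketch combining both use cases into one decision rule, reusing the two functions above; train_max_epistemic, aleatoric_threshold, and the returned labels are hypothetical placeholders, not part of the original code:

def decide(alpha, train_max_epistemic, aleatoric_threshold=0.9):
    """Route a sample using the two disentangled uncertainties.

    train_max_epistemic: max epistemic UE seen on correct training
    predictions (use case 1); aleatoric_threshold: hypothetical cutoff
    for deferring on ambiguous in-domain inputs (use case 2).
    """
    epistemic = predict_epistemic(alpha)
    aleatoric = predict_aleatoric(alpha)
    if epistemic.item() > train_max_epistemic:
        return "ood"      # unseen input: flag it, don't trust the prediction
    if aleatoric.item() > aleatoric_threshold:
        return "defer"    # in-domain but ambiguous: e.g. collect another view and fuse
    return "predict"      # confident: use the argmax of alpha

For example, with prior = 1 as in the first cells, decide(torch.tensor([50., 1., 1., 1., 1.]), train_max_epistemic=0.5) returns "predict", while the flat torch.ones(5) comes back as "ood" because its epistemic uncertainty is 1.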