Disentanglement of Epistemic and Aleatoric Uncertainty for the Dirichlet Distribution

Analysis
statistics
uncertainty
Author

Deebul Nair

Published

December 19, 2022


How to separate the epistemic and aleatoric uncertainty of a Dirichlet distribution.

Also studying its implications and proposing applications of the solutions.

  • Formula from [1].
  • Theory in [2].

ToDo : complete the section with info

[1] Separation of Aleatoric and Epistemic Uncertainty in Deterministic Deep Neural Networks. Denis Huseljic, Bernhard Sick, Marek Herde, Daniel Kottke.

[2] Deep Deterministic Uncertainty: A Simple Baseline. Jishnu Mukhoti et al.
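
For reference, a sketch of the two quantities as implemented in the code below (the prior scaling follows the code; the notation here is mine, not necessarily that of [1]): for a Dirichlet distribution with concentration parameters $\alpha = (\alpha_1, \dots, \alpha_K)$ and $\alpha_0 = \sum_{k=1}^{K} \alpha_k$,

$$
u_{\text{epistemic}} = \frac{K \cdot \text{prior}}{\alpha_0},
\qquad
u_{\text{aleatoric}} = -\frac{1}{\log K} \sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k,
\qquad \bar{p}_k = \frac{\alpha_k}{\alpha_0}.
$$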

import numpy as np
import torch

prior = 1
n_classes = 5

def predict_epistemic(alpha):
    """Epistemic uncertainty of a sample: K * prior / alpha_0."""
    return n_classes * prior / alpha.sum(-1, keepdim=True)

def predict_aleatoric(alpha):
    """Aleatoric uncertainty of a sample: entropy of the expected
    class probabilities, normalized to [0, 1] by log(K)."""
    # Expected probabilities under the Dirichlet: alpha_k / alpha_0
    proba_in = (alpha / alpha.sum(-1, keepdim=True)).clamp_(1e-8, 1 - 1e-8)
    entropy = -torch.sum(proba_in * proba_in.log(), dim=-1)
    return entropy / np.log(n_classes)
ones = torch.ones(n_classes)
print (predict_epistemic(ones), predict_aleatoric(ones))
tensor([1.]) tensor(1.)

When the alpha of only a single class keeps increasing

  • Observation: Both uncertainties decrease.

  • Impact: When the model puts all its confidence (alpha) on a single class, it is confident about that class, and both uncertainties decrease.

  • The maximum of both the aleatoric and the epistemic uncertainty is 1.0.

  • Epistemic is never higher than aleatoric (see the comparison printed below).

for i in [1, 10, 50, 1000 ]:
  x = torch.ones(n_classes)
  x[0] = i
  print (x)
  print ("Epistemic UE : {}, Aleatoric UE : {}".format(predict_epistemic(x),  predict_aleatoric(x)))
  print ("------------",predict_epistemic(x) > predict_aleatoric(x))
tensor([1., 1., 1., 1., 1.])
Epistemic UE : tensor([1.]), Aleatoric UE : 1.0
------------ tensor([False])
tensor([10.,  1.,  1.,  1.,  1.])
Epistemic UE : tensor([0.3571]), Aleatoric UE : 0.6178266406059265
------------ tensor([False])
tensor([50.,  1.,  1.,  1.,  1.])
Epistemic UE : tensor([0.0926]), Aleatoric UE : 0.2278686910867691
------------ tensor([False])
tensor([1000.,    1.,    1.,    1.,    1.])
Epistemic UE : tensor([0.0050]), Aleatoric UE : 0.019580082967877388
------------ tensor([False])

When the alphas of multiple classes keep increasing

  • Observation: Epistemic decreases while aleatoric stays high.
  • Impact: When the model spreads its confidence (alpha) across multiple classes, it is essentially not confident about any single class, so aleatoric uncertainty stays high. At the same time, since some alphas have increased, the input looks like observed (not new) data, and therefore epistemic uncertainty is low.

The maximum of both the aleatoric and the epistemic uncertainty is 1.

for i in [1, 10, 50, 10000 ]:
  x = torch.ones(n_classes)*i
  print (x)
  print ("Epistemic UE : {}, Aleatoric UE : {}".format(predict_epistemic(x),  predict_aleatoric(x)))
  print ("------------",)
tensor([1., 1., 1., 1., 1.])
Epistemic UE : tensor([1.]), Aleatoric UE : 1.0
------------
tensor([10., 10., 10., 10., 10.])
Epistemic UE : tensor([0.1000]), Aleatoric UE : 1.0
------------
tensor([50., 50., 50., 50., 50.])
Epistemic UE : tensor([0.0200]), Aleatoric UE : 1.0
------------
tensor([10000., 10000., 10000., 10000., 10000.])
Epistemic UE : tensor([1.0000e-04]), Aleatoric UE : 1.0
------------

Impact of prior

prior = 50

The highest epistemic uncertainty increases from 1 to the prior value.
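
A quick check, reusing predict_epistemic from above after setting prior = 50; the printed value follows directly from K * prior / alpha_0 = 5 * 50 / 5:

uniform = torch.ones(n_classes)
print (predict_epistemic(uniform))
tensor([50.])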

Conclusions

  • The Dirichlet distribution can be disentangled into aleatoric and epistemic uncertainty.
  • When all alphas are 1, both uncertainties are also 1, implying that the network doesn't know anything.
  • If the alpha of only one output class is high, then both uncertainties are low.
  • The higher that alpha, the lower both uncertainties.
  • If multiple alphas are high, then only aleatoric is high and epistemic stays low, implying that since some alphas were increased the network has seen the input, but it is not sure which of the outputs is correct.

Use Case

1. For identifying OOD data

  1. For the training dataset, measure the epistemic uncertainty of the correct predictions. It should be less than 1 and near zero.
  2. During prediction, if the epistemic uncertainty is higher than the training maximum, that data should be considered OOD and handled appropriately (see the sketch below).
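
A minimal sketch of this thresholding idea. Here train_alphas is hypothetical placeholder data standing in for the (N, K) Dirichlet parameters a trained model would predict for its correctly classified training samples:

# Hypothetical stand-in for per-sample Dirichlet parameters on the training set
train_alphas = torch.rand(100, n_classes) * 100 + 1
threshold = predict_epistemic(train_alphas).max()

def is_ood(alpha):
    """Flag an input as OOD when its epistemic uncertainty
    exceeds the maximum seen on the training set."""
    return predict_epistemic(alpha) > threshold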

2. For handling in-domain uncertain data

  1. If the epistemic uncertainty is in range but the aleatoric uncertainty is high, we can use this in embodied settings to collect additional data (e.g. an image from a different view), fuse it, and then make a decision. Example: for a blurred image, defer the prediction but don't flag it as OOD; the next image may carry clearer information (a combined decision rule is sketched below).
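
A hypothetical decision rule combining both signals; the threshold values are illustrative only, not taken from [1] or [2]:

def decide(alpha, epistemic_max=0.2, aleatoric_max=0.5):
    """Route a prediction: reject unfamiliar inputs, defer ambiguous ones."""
    if predict_epistemic(alpha) > epistemic_max:
        return "ood"      # unfamiliar input: handle separately
    if predict_aleatoric(alpha) > aleatoric_max:
        return "defer"    # in-domain but ambiguous: collect another view
    return "predict"      # confident: act on the prediction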