# uncertainty

## uncertainty_sampler(X, model, n, measure='least_confident')

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `X` | | pool of IV conditions to evaluate uncertainty | *required* |
| `model` | | scikit-learn model; must have a `predict_proba` method | *required* |
| `n` | | number of samples to select | *required* |
| `measure` | | method to evaluate uncertainty (see options below) | `'least_confident'` |

Options for `measure`:

- `'least_confident'`: $$x^* = \operatorname{argmax} \left( 1 - P(\hat{y}|x) \right)$$, where $$\hat{y} = \operatorname{argmax} P(y_i|x)$$
- `'margin'`: $$x^* = \operatorname{argmin} \left( P(\hat{y}_1|x) - P(\hat{y}_2|x) \right)$$, where $$\hat{y}_1$$ and $$\hat{y}_2$$ are the first and second most probable class labels under the model, respectively. Conditions with the *smallest* margin between the top two classes are the most ambiguous, so they are selected.
- `'entropy'`: $$x^* = \operatorname{argmax} \left( - \sum_i P(y_i|x) \log P(y_i|x) \right)$$
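The three measures can be computed directly from a matrix of predicted class probabilities. A small illustrative sketch (the probability values below are made up for demonstration; `scipy.stats.entropy` computes entropy column-wise, hence the transpose, mirroring the source code below):

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical predicted class probabilities for three candidate conditions
a_prob = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.50, 0.48, 0.02],  # top two classes nearly tied: smallest margin
    [0.40, 0.31, 0.29],  # near-uniform: lowest confidence, highest entropy
])

# least_confident: 1 - P(y_hat | x)
least_confident = 1 - a_prob.max(axis=1)

# margin: P(y_hat_1 | x) - P(y_hat_2 | x), via a partial sort of each row
a_part = np.partition(-a_prob, 1, axis=1)
margin = -a_part[:, 0] + a_part[:, 1]

# entropy: -sum_i P(y_i | x) log P(y_i | x), one value per row of a_prob
ent = entropy(a_prob.T)

print(least_confident.argmax())  # 2
print(margin.argmin())           # 1
print(ent.argmax())              # 2
```

Note that the measures can disagree: the margin criterion picks the row where the top two classes are nearly tied, while the other two pick the near-uniform row.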
Source code in `autora/experimentalist/sampler/uncertainty.py`

```python
def uncertainty_sampler(X, model, n, measure="least_confident"):
    """
    Args:
        X: pool of IV conditions to evaluate uncertainty
        model: Scikit-learn model, must have predict_proba method.
        n: number of samples to select
        measure: method to evaluate uncertainty. Options:

            - 'least_confident': $x^* = \\operatorname{argmax} \\left( 1-P(\\hat{y}|x) \\right)$,
              where $\\hat{y} = \\operatorname{argmax} P(y_i|x)$
            - 'margin': $x^* = \\operatorname{argmin} \\left( P(\\hat{y}_1|x) - P(\\hat{y}_2|x) \\right)$,
              where $\\hat{y}_1$ and $\\hat{y}_2$ are the first and second most probable
              class labels under the model, respectively.
            - 'entropy': $x^* = \\operatorname{argmax} \\left( - \\sum P(y_i|x) \\operatorname{log} P(y_i|x) \\right)$

    Returns:
        Sampled pool
    """
    if isinstance(X, Iterable):
        X = np.array(list(X))

    a_prob = model.predict_proba(X)

    if measure == "least_confident":
        # Calculate uncertainty of max probability class
        a_uncertainty = 1 - a_prob.max(axis=1)
        # Get index of largest uncertainties
        idx = np.flip(a_uncertainty.argsort()[-n:])

    elif measure == "margin":
        # Partially sort each row so the two largest probabilities come first
        a_part = np.partition(-a_prob, 1, axis=1)
        # Calculate difference between 2 largest probabilities
        a_margin = -a_part[:, 0] + a_part[:, 1]
        # Determine index of smallest margins
        idx = a_margin.argsort()[:n]

    elif measure == "entropy":
        # Calculate entropy
        a_entropy = entropy(a_prob.T)
        # Get index of largest entropies
        idx = np.flip(a_entropy.argsort()[-n:])

    else:
        raise ValueError(
            f"Unsupported uncertainty measure: '{measure}'\n"
            f"Only 'least_confident', 'margin', or 'entropy' is supported."
        )

    return X[idx]
```
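A minimal end-to-end sketch of how the sampler selects conditions. To keep the snippet self-contained, a toy stand-in class replaces a fitted scikit-learn classifier (only a `predict_proba` method is required), and the `'least_confident'` selection is replicated inline; in practice you would import and call `uncertainty_sampler` from `autora.experimentalist.sampler.uncertainty` as shown in the comment:

```python
import numpy as np


class ToyModel:
    """Stand-in for a fitted scikit-learn classifier; uncertainty_sampler
    only requires a predict_proba method."""

    def predict_proba(self, X):
        # Two-class probabilities from a logistic curve over the single IV
        p1 = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.column_stack([1.0 - p1, p1])


X_pool = np.linspace(-3, 3, 7).reshape(-1, 1)  # candidate IV conditions
model = ToyModel()

# Equivalent to:
#   from autora.experimentalist.sampler.uncertainty import uncertainty_sampler
#   new_conditions = uncertainty_sampler(X_pool, model, n=2, measure="least_confident")
a_prob = model.predict_proba(X_pool)
a_uncertainty = 1 - a_prob.max(axis=1)
idx = np.flip(a_uncertainty.argsort()[-2:])
new_conditions = X_pool[idx]

print(new_conditions.ravel())  # the most ambiguous condition (x = 0) comes first
```

The conditions closest to the model's decision boundary, where `predict_proba` is nearest 0.5, are returned first, which is exactly the behavior the `'least_confident'` measure formalizes.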