L4: Counterfactual Explanations

However, under certain circumstances practitioners are interested in how to transform a predicted “negative” to a “positive” instance.

Gain actionable insights on how to change an undesired outcome.

Counterfactual reasoning

Why actionability?

Given a model for predicting whether or not a patient is likely to develop a disease: understand what actions should be taken to turn an undesired (negative) predicted outcome into a desired (positive) one.

For example, from the undesired outcome “patient is likely to develop disease X” to the desired outcome “patient is unlikely to develop X”.

Also: what changes should I make in order to avoid the undesired outcome?

“Right to explanation”

“… to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision.”

GDPR, EU.

“The statement of reasons for adverse action must be specific and indicate the principal reason(s) for the adverse action.”

Equal Credit Opportunity Act (US)

Definition

“A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.”

Christoph Molnar (Interpretable Machine Learning).

Problem formulation

Given an $n$-dimensional feature space $X \subseteq \R^n$, where each instance $x \in X$ carries a label $y \in Y = \{+, -\}$.

Given an unknown target function $f:X\rarr Y$ and an already trained approximation $\hat f$, where $\hat f\approx f$ (i.e., a good predictor).

The task is to transform the original instance $x$ into a new instance $x'$ (i.e., $x\rarr x'$), such that $\hat f(x) = -$ and $\hat f(x')=+$.

The goal of counterfactual generation is to (efficiently) identify a transformation that converts a negatively predicted instance into a positively predicted one.
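To make the task concrete, here is a minimal sketch of a brute-force counterfactual search. Everything in it is an illustrative assumption rather than a method from these notes: `f_hat` is a hypothetical linear stand-in for the trained model $\hat f$, Euclidean distance is used as one possible notion of “smallest change”, and the grid parameters `step` and `max_steps` are arbitrary.

```python
import itertools
import math

# Hypothetical stand-in for the trained model f_hat: a fixed linear rule.
# In practice this would be any already trained classifier.
WEIGHTS = [1.0, 2.0]
BIAS = -4.0


def f_hat(x):
    """Return '+' or '-' for a feature vector x."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return "+" if score > 0 else "-"


def l2_cost(x, x_prime):
    """Euclidean distance between x and x' (an assumed cost choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prime)))


def find_counterfactual(x, model, step=0.25, max_steps=8, cost=l2_cost):
    """Search a grid around x for the cheapest x' with model(x') == '+'.

    Returns (x_prime, cost), or (None, inf) if no grid point is positive.
    """
    offsets = [k * step for k in range(-max_steps, max_steps + 1)]
    best, best_cost = None, math.inf
    for delta in itertools.product(offsets, repeat=len(x)):
        candidate = [xi + di for xi, di in zip(x, delta)]
        if model(candidate) != "+":
            continue  # not a valid counterfactual
        c = cost(x, candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


x = [1.0, 1.0]                      # negatively predicted instance
x_prime, c = find_counterfactual(x, f_hat)
print(f_hat(x), "->", f_hat(x_prime), x_prime, round(c, 3))
```

The exhaustive grid grows exponentially with the number of features, so this only illustrates the definition of the task; it says nothing about how to solve it efficiently.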

Typically the objective is to choose the transformed instance such that a cost function $\delta$ is minimized: