Machine learning algorithms are about finding correlations in data. In contrast, interpretable machine learning is about understanding causality.
Applying domain knowledge requires an understanding of the model and of how to interpret its output.
Often, we need to understand the individual predictions a model makes. For instance, a model might produce a surprising prediction for a particular input, and we want to know which features drove it.
We also need to understand differences at the dataset level:
Model debugging. We might want to understand why a model that previously worked is not working when applied to new data.
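One way to debug such a failure is to check whether the new data is distributed like the training data. The sketch below is an illustration under assumed conditions (tabular NumPy arrays), not a method taken from the text: it runs a per-feature two-sample Kolmogorov–Smirnov test from scipy to flag features whose distribution has shifted.

```python
import numpy as np
from scipy.stats import ks_2samp

def find_shifted_features(X_train, X_new, alpha=0.01):
    """Flag features whose distribution differs between old and new data.

    A small p-value in a two-sample Kolmogorov-Smirnov test suggests the
    feature has drifted -- one common reason a previously working model
    fails on new data.
    """
    shifted = []
    for j in range(X_train.shape[1]):
        stat, p_value = ks_2samp(X_train[:, j], X_new[:, j])
        if p_value < alpha:
            shifted.append((j, stat, p_value))
    return shifted

# Synthetic example: feature 1 shifts in the new data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_new = rng.normal(size=(500, 3))
X_new[:, 1] += 2.0  # simulated distribution shift
for j, stat, p in find_shifted_features(X_train, X_new):
    print(f"feature {j}: KS statistic {stat:.2f}, p-value {p:.1e}")
```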
Prediction-level interpretability
Explain why an input $x$ is predicted as $y$ by the model, i.e., why $f(x) = y$.
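As a minimal sketch of one way to explain a single prediction (an occlusion-style perturbation method, chosen here as an illustration rather than taken from the text), each feature of $x$ is replaced in turn by its training mean, and the drop in the predicted probability of $y$ is read as that feature's contribution:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

def explain_prediction(model, X_train, x):
    """Occlusion-style local explanation for a single input x.

    Replaces each feature with its training mean and measures how much
    the predicted probability of the original class drops.
    """
    base_class = model.predict(x.reshape(1, -1))[0]
    base_prob = model.predict_proba(x.reshape(1, -1))[0, base_class]
    means = X_train.mean(axis=0)
    contributions = []
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = means[j]  # "occlude" feature j
        prob = model.predict_proba(x_pert.reshape(1, -1))[0, base_class]
        contributions.append(base_prob - prob)  # drop in confidence
    return base_class, contributions

cls, contribs = explain_prediction(model, X, X[0])
print(f"predicted class {cls}; per-feature contributions: {np.round(contribs, 3)}")
```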
Model-level interpretability
What do the patterns the model has learned look like?
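A simple model-level view, sketched below under the assumption of a scikit-learn random forest, is to inspect the model's global feature importances, which summarize which features the learned patterns rely on across all predictions rather than for any single one:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Impurity-based importances: a global view of which features the
# model's learned patterns depend on.
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```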
Data-level interpretability
Which dimensions of our data are the most important?
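One standard, model-free way to answer this (a sketch, assuming numeric tabular data) is principal component analysis: the explained variance ratio shows how much of the data's variation each component carries, and the component loadings show which original dimensions dominate it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA().fit(X)

# Fraction of total variance captured by each principal component.
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
# Loadings of the first component: which original dimensions dominate it.
print("first component loadings:", np.round(pca.components_[0], 3))
```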