Feature attribution fidelity measures how accurately an explanation method reflects the true influence of input features on a model’s output. Evaluating fidelity requires both theoretical and empirical checks so explanations are faithful to the model rather than merely plausible to humans. Research on attribution highlights two complementary paths: axiomatic guarantees and empirical perturbation tests.
Axioms and theoretical criteria
Axiomatic approaches provide formal desiderata that attributions should satisfy. Integrated Gradients was proposed by Mukund Sundararajan, Ankur Taly, and Qiqi Yan (Google Research) to satisfy properties such as sensitivity and completeness, offering a theoretical baseline for faithfulness. SHAP, developed by Scott Lundberg and Su-In Lee (University of Washington), connects attributions to Shapley values, supplying guarantees such as consistency that tie the attribution to cooperative game-theoretic notions of influence. Axioms do not prove practical fidelity on complex data, but they provide necessary conditions that help rule out pathological explanations.

Perturbation, ground truth and robustness tests
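A cheap first test is to verify an axiom numerically. The sketch below is a minimal, illustrative implementation: it approximates Integrated Gradients with a midpoint Riemann sum on a toy quadratic model (the model, its analytic gradient, and the step count are assumptions for the example) and checks the completeness property, i.e. that attributions sum to f(x) − f(baseline).

```python
import numpy as np

def f(x):
    # Toy differentiable model (illustrative assumption), f(x) = x0^2 + 2*x0*x1 + 3*x1.
    return float(x[0] ** 2 + 2.0 * x[0] * x[1] + 3.0 * x[1])

def grad_f(x):
    # Analytic gradient of the toy model above.
    return np.array([2.0 * x[0] + 2.0 * x[1], 2.0 * x[0] + 3.0])

def integrated_gradients(x, baseline, steps=1000):
    # Approximate the path integral of the gradient along the straight line
    # from the baseline to the input, using midpoint sampling.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline)

# Completeness: attributions must sum to f(x) - f(baseline).
assert abs(attr.sum() - (f(x) - f(baseline))) < 1e-6
```

For this quadratic model the midpoint rule is essentially exact, so the completeness check passes to floating-point precision; on real networks the same check bounds the discretization error of the implementation.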
Empirical tests measure whether changing or removing purportedly important features changes model outputs as expected. Perturbation tests such as deletion/insertion quantify output degradation when top-ranked features are removed. Synthetic or labeled ground-truth datasets allow direct comparison between algorithmic attributions and known causal features, giving an objective fidelity metric. Model-randomization or parameter-shuffling tests check that explanations change when the model changes; if they do not, the explanation is likely unfaithful.

Beyond these, practical evaluation examines stability across similar inputs, the computational cost of repeated perturbations, and human-grounded studies in which domain experts judge whether high-fidelity explanations align with domain knowledge. Empirical metrics should be reported alongside axiomatic properties to satisfy both theoretical and applied audiences.
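The deletion test above can be sketched in a few lines. This is an illustrative setup, not a benchmark implementation: the "model" is a toy linear scorer (an assumption chosen because w * x is exactly faithful for it, with a zero baseline), so a faithful ranking should remove the most influential feature first and drive the output to the baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                  # weights of a toy linear model (assumption)
model = lambda x: float(w @ x)          # stand-in for any scoring function
x = rng.normal(size=8)

# Attribution under test: for a linear model with a zero baseline,
# w * x is the exactly faithful attribution.
attr = w * x

def deletion_curve(model, x, attr, baseline=0.0):
    # Zero out features in decreasing order of |attribution| and record
    # the model output after each deletion.
    order = np.argsort(-np.abs(attr))
    xs = x.copy()
    outputs = [model(xs)]
    for i in order:
        xs[i] = baseline
        outputs.append(model(xs))
    return np.array(outputs)

curve = deletion_curve(model, x, attr)
steps = np.abs(np.diff(curve))

# A faithful ranking deletes the most influential feature first, so the
# per-step output change is non-increasing, and deleting everything
# lands exactly on the baseline output.
assert np.all(np.diff(steps) <= 1e-9)
assert abs(curve[-1]) < 1e-9
```

In practice the curve is summarized as an area-under-curve score and compared against random or competing attribution rankings rather than checked for exact monotonicity.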
Consequences and context matter: high-fidelity attributions improve debugging, regulatory compliance, and user trust, but they can also expose sensitive model behavior or increase compute and energy costs during evaluation. In safety-critical domains and culturally diverse settings, fidelity must be balanced with clarity so explanations are both accurate and meaningful to stakeholders across regions. Combining axiomatic guarantees with rigorous perturbation and ground-truth experiments yields the most reliable assessment of attribution fidelity.
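The model-randomization test described earlier also admits a compact sketch. The names and the toy gradient-times-input explainer below are illustrative assumptions: re-randomizing the parameters should change a model-dependent attribution, while a degenerate "explanation" that only looks at the input stays identical and is thereby exposed as unfaithful.

```python
import numpy as np

rng = np.random.default_rng(1)

def gradient_attribution(w, x):
    # Gradient-times-input attribution for a linear scorer f(x) = w @ x,
    # used here as the explanation method under audit (assumption).
    return w * x

def input_only_attribution(w, x):
    # A deliberately unfaithful "explanation" that ignores the model entirely.
    return np.abs(x)

x = rng.normal(size=16)
w_trained = rng.normal(size=16)        # stands in for trained parameters
w_random = rng.normal(size=16)         # same architecture, re-randomized weights

a_trained = gradient_attribution(w_trained, x)
a_random = gradient_attribution(w_random, x)

# A faithful attribution changes when the model's parameters change...
assert not np.allclose(a_trained, a_random)
# ...while the model-ignoring one does not, failing the randomization test.
assert np.allclose(input_only_attribution(w_trained, x),
                   input_only_attribution(w_random, x))
```

Full sanity-check protocols randomize layers progressively and report rank correlations between the original and randomized attribution maps, but the pass/fail logic is the same as in this sketch.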