Produce Explainability Artifacts as First-Class Training Outputs
Description
Choosing an interpretable model (see Employ Interpretable Models When Possible) addresses explainability at the level of model choice. When an interpretable model is not feasible, or when performance requirements justify a black-box model, explanation artifacts can still be generated systematically and versioned alongside model weights and metrics.
The goal is explainability-by-design: treating explanation outputs as a required, versioned artifact of the training pipeline rather than an optional inspection step.
Generate Standard Explanation Artifacts at Training Time
After each training run, produce and store the following as versioned outputs alongside the model:
- Global feature importance plots: summary charts (e.g. SHAP summary plots, permutation importance) showing which features drive predictions across the dataset,
- Dependence plots: visualizations of how model output changes as a function of individual features, revealing non-linearities and interactions,
- Per-segment analysis: feature importance breakdowns for key population subgroups, to surface differential behavior and support fairness assessment,
- Example-based explanations: representative correctly and incorrectly predicted instances with their local explanations (e.g. LIME or SHAP waterfall plots).
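The numbers behind a global importance chart and a per-segment breakdown can be computed with permutation importance, one of the techniques named above: shuffle one feature column, re-score the model, and record how much the metric drops. The following is a minimal stdlib-only sketch, assuming a model is any callable that maps a dict row to a class label and that accuracy is the metric; `permutation_importance`, `per_segment_importance`, and the toy data are hypothetical illustrations, not a library API. In practice a library implementation such as scikit-learn's `sklearn.inspection.permutation_importance`, or SHAP values, would normally be used, and plotting is left out.

```python
import random
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model: Model, rows: Sequence[Row], labels: Sequence[int],
                           features: List[str], n_repeats: int = 5,
                           seed: int = 0) -> Dict[str, float]:
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for feat in features:
        drops = []
        for _ in range(n_repeats):
            column = [r[feat] for r in rows]
            rng.shuffle(column)
            # Break the link between this feature and the label; other columns stay intact.
            permuted = [{**r, feat: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[feat] = sum(drops) / n_repeats
    return importances


def per_segment_importance(model: Model, rows: Sequence[Row], labels: Sequence[int],
                           features: List[str], segment_key: str) -> Dict[Any, Dict[str, float]]:
    """Permutation importance computed separately within each subgroup."""
    grouped: Dict[Any, tuple] = {}
    for r, y in zip(rows, labels):
        seg_rows, seg_labels = grouped.setdefault(r[segment_key], ([], []))
        seg_rows.append(r)
        seg_labels.append(y)
    return {seg: permutation_importance(model, seg_rows, seg_labels, features)
            for seg, (seg_rows, seg_labels) in grouped.items()}


# Hypothetical toy data and model: the model only looks at "income".
rows = [{"income": float(i * 5), "age": float(20 + i),
         "region": "north" if i % 2 else "south"} for i in range(20)]
labels = [int(r["income"] > 50) for r in rows]


def model(r: Row) -> int:
    return int(r["income"] > 50)


print(permutation_importance(model, rows, labels, ["income", "age"]))
print(per_segment_importance(model, rows, labels, ["income", "age"], "region"))
```

Because the toy model ignores `age`, shuffling that column never changes a prediction and its importance is exactly 0.0, while `income` receives a positive score both globally and within each region.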
Store these alongside the model artifact so that every deployed version has an associated explanation snapshot.
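One way to keep the snapshot attached to the model is to write the artifacts into a subdirectory of the run's output directory, together with a manifest that records the model version and a content hash per file. The sketch below assumes a hypothetical layout (`<run_dir>/explanations/manifest.json`) and a hypothetical helper `write_explanation_snapshot`; artifacts are passed in as already-serialized bytes (for example JSON tables or rendered PNG plots).

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_explanation_snapshot(run_dir: Path, model_version: str,
                               artifacts: Dict[str, bytes]) -> Path:
    """Write explanation artifacts next to the model and list them in a manifest."""
    out_dir = run_dir / "explanations"
    out_dir.mkdir(parents=True, exist_ok=True)
    hashes = {}
    for name, payload in artifacts.items():
        (out_dir / name).write_bytes(payload)
        # A content hash lets later readers detect a modified or swapped artifact.
        hashes[name] = hashlib.sha256(payload).hexdigest()
    manifest = {"model_version": model_version, "artifacts": hashes}
    manifest_path = out_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)  # stands in for the directory holding the model weights
    path = write_explanation_snapshot(
        run_dir, "v1", {"global_importance.json": b'{"income": 0.4, "age": 0.0}'})
    print(json.loads(path.read_text())["model_version"])  # prints v1
```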
Integrate Explanation Generation into the Training Pipeline
Explanation generation should be an automated step in the training pipeline, not a manual inspection task. Treat a training run that produces no explanation artifacts as incomplete, just as you would a run that produces no evaluation metrics.
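The completeness rule can be enforced by a final pipeline step that fails the run when required outputs are missing, exactly as it would for missing metrics. This is a sketch under assumed conventions: the file names in `REQUIRED_EXPLANATIONS`, the `metrics.json` file, and the `IncompleteRunError` exception are hypothetical and would be replaced by whatever the pipeline actually produces.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Hypothetical artifact names, one per artifact type listed above.
REQUIRED_EXPLANATIONS = ("global_importance.json", "dependence.json",
                         "per_segment.json", "examples.json")


class IncompleteRunError(RuntimeError):
    """Raised when a training run is missing a required output."""


def missing_outputs(directory: Path, required: Iterable[str]) -> List[str]:
    return [str(directory / name) for name in required
            if not (directory / name).is_file()]


def check_run_complete(run_dir: Path) -> None:
    """Fail the pipeline step when metrics or explanation artifacts are absent."""
    missing = (missing_outputs(run_dir, ["metrics.json"])
               + missing_outputs(run_dir / "explanations", REQUIRED_EXPLANATIONS))
    if missing:
        raise IncompleteRunError(f"training run incomplete, missing: {missing}")


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "metrics.json").write_text("{}")
    try:
        check_run_complete(run_dir)  # no explanations yet, so this raises
    except IncompleteRunError as err:
        print(err)
```

Raising an exception, rather than logging a warning, makes the orchestrator mark the run as failed, so an unexplained model cannot be promoted by accident.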
Use Explanations for Debugging and Review
Explanation artifacts are not only for external stakeholders; they are also a primary debugging tool. When model quality drops, compare explanation artifacts across versions to identify the features whose influence changed. Peer review of models (see peer review) should cover explanation artifacts, not just performance metrics.
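A version-to-version comparison can start from the stored global importance scores. The sketch below is a hypothetical helper, `importance_shift`, that flags features whose importance moved by more than an assumed threshold and orders them by size of change; a feature absent from one version is treated as having importance 0 there, which also surfaces newly influential features.

```python
from typing import Dict, List, Tuple


def importance_shift(old: Dict[str, float], new: Dict[str, float],
                     threshold: float = 0.05) -> List[Tuple[str, float]]:
    """Features whose importance changed by more than `threshold`, largest change first."""
    features = set(old) | set(new)
    # A feature missing from one version counts as importance 0 in that version.
    shifts = [(f, new.get(f, 0.0) - old.get(f, 0.0)) for f in features]
    flagged = [(f, d) for f, d in shifts if abs(d) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical importance snapshots for two model versions.
v1 = {"income": 0.40, "age": 0.10, "region": 0.02}
v2 = {"income": 0.15, "age": 0.12, "zip_code": 0.30}
for feature, delta in importance_shift(v1, v2):
    print(f"{feature}: {delta:+.2f}")  # prints zip_code: +0.30, then income: -0.25
```

In this example the sudden weight on `zip_code` is the kind of change a reviewer would want to investigate, for instance as a possible proxy for a protected attribute.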
Version and Link Explanations to Model Releases
Explanation artifacts should be versioned together with the model and linked in the model registry (see Use Versioning for Data, Model, Configurations and Training Scripts). This ensures that when a model is rolled back or audited, the corresponding explanation snapshot is available without having to regenerate it.
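The link between a release and its explanation snapshot can be as simple as a registry entry that stores both paths and their content hashes. In the sketch below a local JSON file stands in for the model registry, and `register_release` and `explanation_snapshot_for` are hypothetical helpers; real registries such as MLflow's have their own APIs, which are not shown here.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Dict


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def register_release(registry_path: Path, version: str, model_path: Path,
                     manifest_path: Path) -> None:
    """Record a model release together with its explanation snapshot."""
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    if version in registry:
        raise ValueError(f"version {version} already registered")
    registry[version] = {
        "model": {"path": str(model_path), "sha256": file_sha256(model_path)},
        "explanations": {"path": str(manifest_path), "sha256": file_sha256(manifest_path)},
    }
    registry_path.write_text(json.dumps(registry, indent=2, sort_keys=True))


def explanation_snapshot_for(registry_path: Path, version: str) -> Dict:
    """Load the explanation manifest for a release, checking it was not altered."""
    entry = json.loads(registry_path.read_text())[version]["explanations"]
    path = Path(entry["path"])
    if file_sha256(path) != entry["sha256"]:
        raise ValueError(f"explanation snapshot for {version} was modified")
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "model.bin").write_bytes(b"weights")
    (d / "manifest.json").write_text(json.dumps({"model_version": "v1"}))
    register_release(d / "registry.json", "v1", d / "model.bin", d / "manifest.json")
    # On rollback or audit, the snapshot is looked up rather than recomputed.
    print(explanation_snapshot_for(d / "registry.json", "v1"))
```

Storing the hash alongside the path means an audit can detect a snapshot that was edited after release, instead of silently trusting whatever file is at that location.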
Related
- Employ Interpretable Models When Possible
- Explain Results and Decisions to Users
- Use Versioning for Data, Model, Configurations and Training Scripts
- Continuously Measure Model Quality and Performance
- Enforce Fairness and Privacy
Read more
- Ethics Guidelines for Trustworthy AI
- A Methodology and Software Architecture to Support Explainability-by-Design