Most technical work on interpretability focuses on shrinking the black box: saliency maps, SHAP values, counterfactuals, rule extraction, and local surrogate models. These are useful tools (I've applied several of them across projects listed on my Publications page), but they don't always deliver the understanding clinicians actually need.

That gap is not just a usability problem; it's conceptual. Two central questions matter: What counts as an explanation? and Who is the explanation for? Answering them changes how we design, evaluate, and deploy XAI.

Technical Explanations vs. Human Understanding

Engineering-oriented explanations typically quantify feature contributions or visualize focus areas. Those outputs are great for debugging and auditing models. Yet a clinician confronted with "Feature A contributed 0.23" needs a different kind of reasoning: a justification that supports an action for a specific patient.
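To make the gap concrete, here is a toy sketch of the kind of additive attribution behind outputs like "Feature A contributed 0.23". The model, feature names, and numbers are all hypothetical, not drawn from any real clinical system:

```python
# Toy additive attribution for a linear model:
# contribution_i = w_i * (x_i - baseline_i).
# Weights, baselines, and patient values below are illustrative only.
weights  = {"feature_a": 0.8,  "feature_b": -0.5}
baseline = {"feature_a": 0.10, "feature_b": 0.40}   # e.g. population means
patient  = {"feature_a": 0.3875, "feature_b": 0.50}

contributions = {f: weights[f] * (patient[f] - baseline[f]) for f in weights}

for name, c in contributions.items():
    print(f"{name} contributed {c:+.2f}")
# → feature_a contributed +0.23
```

The number is perfectly faithful to the model, yet it still leaves the clinician asking what, if anything, to do for this patient.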

In short: technical explanations reveal correlations; philosophical explanations provide justifications that support decisions.

Philosophy gives useful distinctions

Some philosophical ideas I find practically helpful:

Design implications for XAI

Philosophical clarity leads to different engineering priorities:

  1. Audience-specific explanations: produce actionable, high-level rationales for clinicians; produce technical fidelity reports for model auditors; produce concise, comprehensible narratives for patients.
  2. Actionability & calibration: evaluate whether explanations help users make better decisions or appropriately calibrate trust. This requires human-in-the-loop studies, not just proxy metrics.
  3. Privacy-aware interpretability: in privacy-preserving systems (like MedHE), standard XAI probes on raw gradients are unavailable. Design explanations that operate at encrypted or aggregated abstraction levels: cohort-level summaries, uncertainty bands, and example-based explanations that do not leak sensitive data.
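Point 3 can be sketched minimally. Assume (hypothetically) that each site contributes one already-aggregated attribution value per feature, e.g. via secure aggregation, so no raw per-patient gradient ever leaves a site; the feature names and numbers are invented:

```python
import statistics

# Hypothetical per-site aggregate attributions; in a privacy-preserving
# pipeline these arrive pre-aggregated, never as raw per-patient gradients.
cohort_attributions = {
    "age":            [0.21, 0.18, 0.25, 0.22],  # one aggregate per site
    "blood_pressure": [0.05, 0.09, 0.04, 0.07],
}

def cohort_summary(values, z=1.96):
    """Mean attribution with a simple normal-approximation band."""
    mean = statistics.fmean(values)
    half = z * statistics.stdev(values) / len(values) ** 0.5
    return mean, (mean - half, mean + half)

for feature, vals in cohort_attributions.items():
    mean, (lo, hi) = cohort_summary(vals)
    print(f"{feature}: {mean:.3f} (95% band {lo:.3f} to {hi:.3f})")
```

The band communicates uncertainty at the cohort level, which is the abstraction a privacy-preserving system can actually expose.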

Concrete links to my work

Privacy-preserving federated learning projects such as MedHE demonstrate that we can retain high performance while preserving privacy, but they also highlight interpretability gaps that arise when gradients are encrypted or sparsified. Complementary projects in my publications explore fairness-aware representation learning and interpretable hybrid frameworks.

"A usable explanation for a clinician need not reveal training data; it must reveal the model's limits, typical failure modes, and the sources of uncertainty relevant to the current patient."
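One way to operationalize the quote above is an explanation record that carries exactly those three things: scope, known failure modes, and patient-relevant uncertainty. This is a hypothetical structure, not an implemented interface; all field names and values are illustrative:

```python
from dataclasses import dataclass, field

# Hypothetical clinician-facing explanation record: it exposes the model's
# limits, typical failure modes, and per-patient uncertainty rather than
# raw attributions or training data.
@dataclass
class ClinicalExplanation:
    prediction: str
    uncertainty: tuple                  # e.g. a calibrated risk interval
    in_scope: bool                      # inside the validated population?
    failure_modes: list = field(default_factory=list)

    def narrative(self) -> str:
        scope = "within" if self.in_scope else "OUTSIDE"
        lines = [
            f"Prediction: {self.prediction}",
            f"Risk interval: {self.uncertainty[0]:.0%} to {self.uncertainty[1]:.0%}",
            f"Patient is {scope} the model's validated population.",
        ]
        lines += [f"Known failure mode: {m}" for m in self.failure_modes]
        return "\n".join(lines)

expl = ClinicalExplanation(
    prediction="elevated 30-day readmission risk",
    uncertainty=(0.18, 0.31),
    in_scope=True,
    failure_modes=["underestimates risk for patients on dialysis"],
)
print(expl.narrative())
```

Nothing here reveals training data, yet the record answers the questions a clinician actually brings to the bedside.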

From metrics to meaning: evaluation suggestions

Beyond fidelity and stability, we should evaluate explanations by:

Where I'm heading

My next experiments will focus on cohort-level explanation modules compatible with privacy-preserving pipelines, clinician-in-the-loop calibration studies, and integrating normative frameworks from epistemology to inform explanation content for sensitive domains. If you're interested in collaborating on human evaluation studies or privacy-aware interpretability, I'd love to connect.