Privacy is non-negotiable in healthcare research and deployment. Techniques like federated learning, homomorphic encryption, and differential privacy allow institutions to collaborate without centralizing patient records. My own work — including the MedHE framework — shows that we can reach high accuracy while dramatically reducing the risk of data leakage.

Still, privacy is not free. There are subtle epistemic and social costs that deserve attention: the very mechanisms that protect individuals can also change how knowledge is produced, shared, and acted upon in clinical settings.

Privacy improves safety — but can hamper collective learning

Federated and encrypted training isolate local data. That isolation protects individuals, but it can also obstruct the formation of shared, granular insights that come from pooled analyses (e.g., rare adverse event patterns identifiable only when datasets are aggregated). How should we weigh individual privacy against collective medical knowledge?

Framing: privacy reduces leakage, but it may also diminish the transparency and traceability that clinicians rely on when evaluating model behavior.
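
To make the isolation concrete, here is a minimal FedAvg-style sketch in Python. It is illustrative only, not MedHE's actual protocol; the logistic-regression update and the `local_update` / `federated_round` names are assumptions for the example. The point is that only averaged weight updates cross institutional boundaries, which is exactly why record-level, cross-site patterns such as rare adverse events are hard to see.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    # One gradient step of logistic regression on a site's private data.
    # Raw records (X, y) never leave the institution; only updated weights do.
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def federated_round(weights, sites):
    # The coordinating server averages per-site updates (FedAvg-style).
    # It sees model deltas, never patient-level rows, so pooled analyses
    # (e.g. of rare adverse events) cannot be run at this point.
    updates = [local_update(weights, X, y) for X, y in sites]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 4)), rng.integers(0, 2, 50).astype(float))
         for _ in range(3)]
w = np.zeros(4)
for _ in range(20):
    w = federated_round(w, sites)
```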

Three trade-offs I keep in mind

  1. Granularity vs. privacy: cohort-level summaries preserve privacy but can hide sub-group signals (rare conditions, intersectional effects). For public health surveillance, those signals matter (a toy illustration follows this list).
  2. Explanations vs. leakage: richer, example-based explanations (showing similar training cases) help clinicians, but can risk membership inference unless carefully sanitized.
  3. Collaboration vs. control: privacy-preserving tech often shifts control toward system designers (who define aggregation and explanation policies). This centralization can affect whose knowledge counts and how it's used.
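
Here is a toy illustration of the first trade-off: releasing cohort counts through the standard Laplace mechanism of differential privacy. The counts and the epsilon value are made up for the sketch. Noise that is negligible against a large cohort can swamp exactly the rare signal that surveillance cares about.

```python
import numpy as np

def dp_count(true_count, epsilon, rng):
    # Laplace mechanism for a counting query: sensitivity 1, noise scale 1/epsilon.
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(42)
epsilon = 0.5
cohort_counts = {"common condition": 4200, "rare adverse event": 7}

for name, count in cohort_counts.items():
    noisy = dp_count(count, epsilon, rng)
    # Noise with scale 2 barely perturbs 4200 but can bury a count of 7.
    print(f"{name}: true={count}, released={noisy:.1f}")
```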

Design heuristics for balancing privacy and epistemic value

From my experience with MedHE and related projects, practical strategies include keeping explanation channels privacy-aware (cohort-level summaries, global failure-mode reports, counterfactuals computed over aggregates rather than individual records) and making the resulting limits on inspection explicit to clinical users. A sketch of one such sanitized explanation channel follows.
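
The sketch below contrasts a raw nearest-case explanation with a cohort-prototype alternative. The function names are hypothetical, and averaging the k nearest records is a heuristic mitigation of membership-inference risk, not a formal privacy guarantee.

```python
import numpy as np

def nearest_case_explanation(query, train_X):
    # Raw example-based explanation: returns an actual training record,
    # which exposes that record to membership-inference risk.
    idx = np.argmin(np.linalg.norm(train_X - query, axis=1))
    return train_X[idx]

def cohort_prototype_explanation(query, train_X, k=25):
    # Sanitized alternative: average the k nearest records so no single
    # patient is reproduced verbatim. k controls the privacy/fidelity trade-off;
    # this is a heuristic, not differential privacy.
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = train_X[np.argsort(dists)[:k]]
    return nearest.mean(axis=0)

rng = np.random.default_rng(7)
train_X = rng.normal(size=(500, 6))
query = rng.normal(size=6)
prototype = cohort_prototype_explanation(query, train_X)
```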

Applied example: MedHE and communication-efficient privacy

MedHE integrates adaptive gradient sparsification with CKKS homomorphic encryption to achieve significant communication savings and privacy guarantees. Technically, it performs well — but operationally, it highlights the interface challenge: clinicians cannot inspect local gradients or raw training instances, so explanation channels must adapt (e.g., cohort summaries, global failure-mode reports, and counterfactuals that operate at an aggregated level).
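
To give a feel for why sparsify-then-encrypt saves bandwidth, here is a toy top-k sparsification sketch in NumPy. This is not MedHE's adaptive scheme; the ratio and sizes are invented, and the CKKS step, which would use a homomorphic-encryption library, is only indicated in a comment.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    # Keep only the largest-magnitude fraction of gradient entries.
    # Fewer values to encrypt means fewer CKKS ciphertexts and far less
    # communication per federated round.
    k = max(1, int(ratio * grad.size))
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

grad = np.random.default_rng(1).normal(size=100_000)
idx, values = topk_sparsify(grad, ratio=0.01)
# In a MedHE-style pipeline the 1,000 retained values (rather than all
# 100,000) would now be packed into a CKKS ciphertext, e.g. with a library
# such as TenSEAL, before being sent for encrypted aggregation.
```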

Policy and governance considerations

Technical design alone is not sufficient. Institutions should pair these systems with governance measures (clear policies on what is aggregated, who defines explanation formats, and how federated findings are audited and shared) so that epistemic value is retained while privacy is protected.

Final thought

Privacy is essential. But implementing privacy-preserving AI requires conscious choices about what kinds of knowledge we preserve and what we let go. Those are value-laden design decisions. The challenge for research and practice is to build systems that protect patients while still enabling the collective epistemic processes that medicine relies on.

Related Work: For my publications on federated learning, privacy-preserving AI, and healthcare applications, see the Publications section of my website.