PhD defence of Amar Kumar – Explainability and Fairness in Medical Imaging via Counterfactual Generation
Abstract
Ensuring the trustworthiness of deep learning in medical imaging requires going beyond accuracy to address explainability and fairness. In safety-critical domains such as healthcare, clinical adoption depends not only on predictive performance but also on a model’s ability to provide transparent reasoning. This thesis introduces generative modeling approaches, including Vision-Language Foundation Models (VLFMs), to synthesize counterfactual (CF) images: hypothetical alternatives that simulate the effect of modifying specific clinical attributes, thereby enabling both interpretability and bias mitigation.
We first address a gap in current practice: predictive models are not designed to discover personalized imaging markers that are predictive of future outcomes. To this end, we develop the first deep conditional 3D generative model, based on GANs, for subject-specific counterfactual generation in brain MRI of patients with relapsing-remitting multiple sclerosis. By analyzing differences between factual and CF images, local modifications predictive of future disease activity can be identified, enabling interpretable biomarker discovery. Validation on a large, multi-center dataset shows alignment with established markers of progression. While this model provided the first subject-specific counterfactuals for volumetric imaging, it was still prone to latching onto features of the majority class and was therefore biased. This led to the second contribution, which focuses on fairness.
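To make the counterfactual-analysis step concrete, the minimal PyTorch sketch below flips the conditioning label, generates a CF volume, and thresholds the voxel-wise difference map to localize candidate markers. The generator interface, tensor shapes, and threshold are illustrative assumptions, not the thesis implementation.

```python
import torch

@torch.no_grad()
def counterfactual_difference_map(G, x, y_cf, threshold=0.1):
    """Localize voxels that change when the outcome condition is flipped.

    G    : conditional generator (volume, condition) -> volume; hypothetical API.
    x    : factual 3D MRI volume, shape (1, 1, D, H, W), intensities in [0, 1].
    y_cf : counterfactual condition (e.g., flipped future-disease-activity label).
    """
    x_cf = G(x, y_cf)                      # CF image under the flipped condition
    diff = (x_cf - x).abs()                # voxel-wise change attributable to the flip
    mask = diff > threshold * diff.max()   # keep only the strongest local modifications
    return x_cf, diff, mask
```

Thresholding relative to the maximum change is one simple way to turn a dense difference map into discrete candidate marker regions for inspection.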
Next, we address the issue that medical imaging classifiers often latch onto spurious correlations rather than disease markers, yielding biased models that are not ‘right for the right reasons.’ To overcome this, we propose the first end-to-end framework that integrates debiasing with counterfactual explanations using a Cycle-GAN. By combining Empirical Risk Minimization (ERM) and Group Distributionally Robust Optimization (Group-DRO) with counterfactual generation, the framework disentangles disease features from shortcut signals and introduces the Spurious Correlation Latching Score (SCLS) to quantify a model’s reliance on such biases. Experiments on the CheXpert and RSNA datasets with both synthetic and real artifacts demonstrate improved generalization across underrepresented subgroups. Nonetheless, mitigation remained constrained by the limited generative capacity of the Cycle-GAN, motivating the development of DeCoDEx, a diffusion-based counterfactual generator guided by an artifact detector that enables manipulation of disease markers while preserving clinically relevant features. However, this method still lacked the precision and resolution required for fine-grained clinical editing.
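As a rough illustration of the Group-DRO component, the sketch below implements the standard exponentiated-gradient worst-group objective (following Sagawa et al.’s formulation, not the thesis code); group ids are assumed to encode subgroups defined by the spurious attribute, e.g., scans with and without an artifact.

```python
import torch

class GroupDROLoss:
    """Worst-group risk minimization via exponentiated-gradient group weights.

    A generic Group-DRO sketch, not the thesis implementation: per-group
    losses are tracked, and high-loss (typically underrepresented) groups
    are progressively upweighted.
    """

    def __init__(self, n_groups: int, step_size: float = 0.01):
        self.q = torch.ones(n_groups) / n_groups  # adversarial group weights
        self.step_size = step_size

    def __call__(self, per_sample_loss: torch.Tensor, group_ids: torch.Tensor) -> torch.Tensor:
        group_losses = []
        for g in range(self.q.numel()):
            in_g = group_ids == g
            # Groups absent from the batch contribute zero loss this step.
            group_losses.append(
                per_sample_loss[in_g].mean() if in_g.any() else per_sample_loss.sum() * 0.0
            )
        group_loss = torch.stack(group_losses)
        # Multiplicative update shifts weight toward the worst-performing groups.
        self.q = self.q * torch.exp(self.step_size * group_loss.detach())
        self.q = self.q / self.q.sum()
        return (self.q * group_loss).sum()
```

In training, per_sample_loss would come from a criterion with reduction='none', e.g. torch.nn.functional.binary_cross_entropy_with_logits(logits, y, reduction='none'), so that each group’s mean loss can be weighted separately.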
To overcome this limitation, we introduce PRISM, the first framework to adapt VLFMs to medical imaging for high-resolution, language-guided counterfactual synthesis. PRISM enables fine-grained editing of 2D medical images, such as the selective removal or insertion of devices (e.g., pacemakers, wires), while faithfully preserving anatomical structures. By leveraging natural language prompts, PRISM generates clinically meaningful counterfactuals that improve the fairness and robustness of downstream classifiers and provide data augmentation for underrepresented groups. Beyond targeted edits, PRISM also lays the groundwork for future research into uncovering hidden attribute–image relationships: we demonstrate that VLFMs can be queried with textual prompts to reveal associations between visual features and clinical attributes. Finally, we extend this work by showing how reinforcement learning can be used to better align text-to-image generation with clinical relevance, and we demonstrate that VLFMs can disentangle key factors of variation, such as disease severity or the presence of artifacts, enabling more controllable and interpretable medical image synthesis.
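PRISM’s own pipeline is not reproduced here, but the flavor of language-guided counterfactual editing can be sketched with a publicly available instruction-following diffusion model from the diffusers library; this is a stand-in for the thesis model, and the input path, prompt, guidance scales, and step count are all illustrative.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Stand-in for PRISM: a generic instruction-guided editing pipeline
# (timbrooks/instruct-pix2pix), NOT the thesis model.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

xray = Image.open("chest_xray.png").convert("RGB")  # hypothetical input image

# A natural-language prompt drives the counterfactual edit; a higher
# image_guidance_scale biases the result toward preserving the source image.
edited = pipe(
    "remove the pacemaker",
    image=xray,
    num_inference_steps=50,
    image_guidance_scale=1.5,
    guidance_scale=7.5,
).images[0]
edited.save("chest_xray_no_pacemaker.png")
```

Raising image_guidance_scale relative to the text guidance mirrors the core requirement on clinical counterfactuals: alter only the targeted attribute while keeping the surrounding anatomy intact.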