Multiclass Liver Cirrhosis Segmentation and Severity Staging on T2-Weighted MRI Using 3UResNet Explainability and Per-Class Deep Expert Classifiers

1University of Barishal, 2Bangladesh Open University, 3Patuakhali Science and Technology University, 4AppsCode Inc., 5Jahangirnagar University, 6Shaheed Suhrawardy Medical College and Hospital, 7Military Institute of Science and Technology
Figure: Liver cirrhosis severity classes: Class 1 / Class A (Mild), Class 2 / Class B (Moderate), Class 3 / Class C (Severe).

Liver Cirrhosis Classification: Visualizing the Progression Across 3 Critical Severity Levels.

Methodology Diagram

Stepwise procedure of (A) Segmentation and (B) Classification of liver cirrhosis based on the CirrMRI600+ dataset.

Abstract

Liver cirrhosis demands accurate, interpretable MRI analysis to support timely diagnosis and staging, yet existing methods often decouple segmentation from classification and lack the explainability required for clinical use. This work introduces 3UResNet, an ensemble of U-Net, TransUNet++, and Attention U-Net with a ResNet50 encoder, designed for robust multiclass cirrhosis segmentation on 2D T2-weighted MRI from the CirrMRI600+ dataset. A total of 5,364 training, 674 validation, and 664 test samples across mild, moderate, and severe classes were preprocessed with augmentation, normalization, and mask binarization. Three Vision Transformer-based segmentation models, SwinUnet, CCTUnet, and EaNetUnet, were also designed as baselines. On the test set, 3UResNet achieved a Dice score of 0.9516 and a mean Intersection over Union of 0.9077 across all classes, outperforming the transformer baselines. After segmentation, the predicted masks were used to extract shape features, and convolutional neural networks (CNNs) extracted visual features for classification. DenseNet121 performed best for mild cirrhosis (95%), while CoAtNet achieved higher accuracy for moderate (80%) and severe (94%) cirrhosis. A WeightedRandomSampler and focal loss were employed to mitigate class imbalance during training, ensuring that results reflect real-world clinical distributions. Model interpretability is strengthened with LIME and Grad-CAM, which highlight clinically meaningful liver regions and align model attention with cirrhotic anatomy across severity levels. This study demonstrates a practical end-to-end framework uniting high-fidelity segmentation with severity staging and a transparent rationale, paving the way for clinical adoption.

Proposed Framework: 3UResNet Pipeline

🔴 The Problem

  • Existing methods often decouple segmentation from classification, leading to loss of diagnostic context.
  • Current AI tools lack the explainability required for clinical trust.
  • Class imbalance in medical datasets hinders accurate staging of severity.

🟡 Why We Solved It

  • To enable automated, timely diagnosis and precise staging of liver cirrhosis on T2-weighted MRI.
  • To outperform single-model baselines (e.g., SwinUnet) with a robust ensemble approach.
  • To bridge the gap between pixel-level segmentation and patient-level diagnosis.

🟢 How We Solved It (3UResNet)

  • Proposed Model: 3UResNet, an ensemble of U-Net, TransUNet++, and Attention U-Net with a ResNet50 encoder.
  • Pipeline: First, precise segmentation (Dice: 0.9516), followed by extraction of shape and visual features.
  • Classification: Utilized DenseNet121 (for Mild) and CoAtNet (for Moderate/Severe) with a WeightedRandomSampler to mitigate imbalance.
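The ensemble step above can be sketched minimally in numpy. This is an illustration only: the paper does not state whether the three decoders' probabilities, logits, or hard votes are fused, so the probability-averaging rule and the `ensemble_segment` helper name are assumptions.

```python
import numpy as np

def ensemble_segment(prob_maps, threshold=0.5):
    """Fuse per-pixel foreground probabilities from several segmentation
    models by averaging, then threshold to a binary mask.
    NOTE: averaging is an assumed fusion rule, not the paper's stated one."""
    mean_prob = np.stack(prob_maps, axis=0).mean(axis=0)
    return (mean_prob >= threshold).astype(np.uint8)

# Toy 2x2 probability maps standing in for the three decoder outputs
unet = np.array([[0.9, 0.2], [0.8, 0.4]])
transunet = np.array([[0.7, 0.1], [0.9, 0.6]])
att_unet = np.array([[0.8, 0.3], [0.7, 0.7]])
mask = ensemble_segment([unet, transunet, att_unet])  # [[1, 0], [1, 1]]
```

Soft averaging keeps each model's confidence in play, which is one common reason ensembles of this kind smooth out the boundary errors of any single decoder.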

Architecture of 3UResNet

Figure: The proposed 3UResNet Ensemble Architecture and Classification Pipeline.

Performance Evaluation

Comprehensive analysis of the proposed 3UResNet model demonstrating superior accuracy and clinical relevance.

Quantitative Results: Segmentation

Performance comparison of 3UResNet against Transformer-based and CNN-based baselines using Dice Score and Mean IoU.

Table 1: Segmentation Performance of Integrated Transformer & ViT Models

Encoder | Decoder | Mild (Dice / mIoU) | Moderate (Dice / mIoU) | Severe (Dice / mIoU)
ResNet50 | TransUNet++ (3UResNet) | 0.9532 / 0.9106 | 0.9578 / 0.9191 | 0.9310 / 0.8710
ResNet50 | Attention U-Net | 0.9605 / 0.9239 | 0.9572 / 0.9180 | 0.9326 / 0.8738
ResNet50 | U-Net | 0.9556 / 0.9150 | 0.9534 / 0.9109 | 0.9330 / 0.8745
ResNet50 | LinTransUnet | 0.9416 / 0.8896 | 0.9559 / 0.9156 | 0.9282 / 0.8661
VGG16 | U²-Net | 0.9315 / 0.8718 | 0.9411 / 0.8888 | 0.8563 / 0.7487
InceptionV3 | SynergyNet | 0.9575 / 0.9184 | 0.9546 / 0.9131 | 0.9339 / 0.8759
N/A | Swin-Unet | 0.9319 / 0.8725 | 0.9409 / 0.8884 | 0.9173 / 0.8472
N/A | CCT-Unet | 0.9321 / 0.8728 | 0.9347 / 0.8774 | 0.9281 / 0.8658
N/A | EaNet | 0.9287 / 0.8669 | 0.9390 / 0.8850 | 0.9242 / 0.8592
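The Dice and mIoU columns follow the standard overlap definitions; a minimal numpy sketch (the function name is illustrative), where mIoU is the per-class IoU averaged over the mild, moderate, and severe classes:

```python
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU (Jaccard index) for a pair of binary masks.
    eps guards against division by zero when both masks are empty."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

pred = np.array([[1, 1], [0, 1]])
target = np.array([[1, 0], [0, 1]])
dice, iou = dice_and_iou(pred, target)  # intersection = 2, union = 3
```

Here Dice = 2·2/(3+2) = 0.8 and IoU = 2/3, illustrating the general relation Dice = 2·IoU/(1+IoU) that also holds between the paired columns above.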

Classification Performance & Analysis

Evaluation of the proposed Hybrid Fusion Models and comparison with state-of-the-art studies.

Table 2: Classification Performance (SubsetRandomSampler)

Feature Extractor | Mild (Prec / Rec / Acc) | Moderate (Prec / Rec / Acc) | Severe (Prec / Rec / Acc)
DenseNet-121 | 0.88 / 0.65 / 0.65 | 0.23 / 0.36 / 0.36 | 0.56 / 0.60 / 0.60
ResNet-101 | 0.83 / 0.71 / 0.71 | 0.23 / 0.38 / 0.38 | 0.51 / 0.40 / 0.40
EfficientNet-b0 | 0.84 / 0.71 / 0.71 | 0.23 / 0.31 / 0.31 | 0.55 / 0.59 / 0.59
ResNet34 | 0.90 / 0.71 / 0.71 | 0.25 / 0.42 / 0.42 | 0.52 / 0.48 / 0.48
ResNet50+GAP | 0.75 / 0.77 / 0.77 | 0.27 / 0.08 / 0.08 | 0.48 / 0.71 / 0.71

Table 3: Classification Performance (WeightedRandomSampler)

Feature Extractor | Mild (Prec / Rec / Acc) | Moderate (Prec / Rec / Acc) | Severe (Prec / Rec / Acc)
Hybrid7Net6 (Proposed) | 0.97 / 0.77 / 0.77 | 0.56 / 0.77 / 0.77 | 0.67 / 0.75 / 0.75
DenseNet-121 | 0.83 / 0.95 / 0.95 | 0.13 / 0.05 / 0.05 | 0.64 / 0.77 / 0.77
ConvNeXt | 0.82 / 0.95 / 0.95 | 0.20 / 0.11 / 0.11 | 0.63 / 0.66 / 0.63
CoAtNet | 0.78 / 0.86 / 0.86 | 0.21 / 0.14 / 0.14 | 0.51 / 0.54 / 0.54
Hybrid10Net | 0.84 / 0.91 / 0.91 | 0.79 / 0.26 / 0.26 | 0.58 / 0.77 / 0.77

Table 4: Training Configurations

Configuration | DenseNet121 (Mild) | CoAtNet (Moderate) | CoAtNet (Severe)
Data Augmentation | Resize, HFlip 0.5, ShiftScaleRotate 0.05, BrightnessContrast, GaussNoise | Same base + stronger minority augmentation (HFlip 0.8, VFlip 0.5, more noise) | Same as Moderate, with focus on the Severe minority
Loss Function | FocalLoss (alpha=None, gamma=2.0/3.0) | FocalLoss with class weights (Moderate weighted 1.5) | FocalLoss with class weights (no multiplier)
Sampler | WeightedRandomSampler oversampling rare classes | WeightedRandomSampler with inverse frequency + class weights | Same as Moderate
Optimizer | AdamW + ReduceLROnPlateau | Same as DenseNet121 | Same as DenseNet121
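The loss and sampler settings in Table 4 can be sketched in plain numpy. This is an illustrative reconstruction, not the authors' code: `focal_loss` and `sample_weights` are assumed names, and in practice the paper uses PyTorch's FocalLoss-style objective and WeightedRandomSampler rather than these functions.

```python
import numpy as np

def focal_loss(probs, targets, alpha=None, gamma=2.0):
    """Multiclass focal loss: FL = -alpha_c * (1 - p_t)**gamma * log(p_t).
    probs: (N, C) softmax outputs; targets: (N,) integer class labels."""
    p_t = probs[np.arange(len(targets)), targets]  # prob of the true class
    mod = (1.0 - p_t) ** gamma                     # down-weights easy examples
    if alpha is not None:                          # optional per-class weights
        mod = mod * np.asarray(alpha)[targets]     # e.g., Moderate weighted 1.5
    return float(np.mean(-mod * np.log(p_t)))

def sample_weights(labels):
    """Inverse-frequency per-sample weights, as fed to a
    WeightedRandomSampler-style sampler to oversample rare classes."""
    counts = np.bincount(labels)
    return 1.0 / counts[labels]

labels = np.array([0, 0, 0, 1])           # imbalanced toy labels
weights = sample_weights(labels)          # rare class 1 gets weight 1.0
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = focal_loss(probs, np.array([0, 1]), gamma=2.0)
```

The gamma term suppresses well-classified samples, while the sampler weights make rare severity classes appear as often as common ones in each training epoch; the two mechanisms address imbalance at the loss level and the batch level, respectively.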

Table 5: Comparison of Studies on Liver Cirrhosis

Study | Dataset | Segmentation Method | Classification Method | Performance | Explainability
Jha et al. [2025] | T2W 2D | MedSegDiff | N/A | Dice: 0.76 | Not discussed
Gupta et al. [2025] | T2W 2D | 2D nnU-Net | Masked Attn ResNet-18 | Dice: 0.96, Acc: 78% | Grad-CAM
This Study | T2W 2D | 3UResNet50 | Hybrid7Net6 | Dice: ~0.95; Acc: Mild 0.95, Mod 0.80, Sev 0.94 | LIME + Grad-CAM

Key Findings

Summary of the major breakthroughs achieved in this study.

🎯

Superior Segmentation

The proposed 3UResNet model achieved a Dice Score of ~0.95, significantly outperforming standard ViT and CNN baselines by preserving local anatomical details.

⚖️

Robust Classification

Our Hybrid7Net6 framework successfully handled class imbalance, achieving high accuracy (Mild: 95%, Severe: 94%) using WeightedRandomSampler.

🧠

Explainable AI (XAI)

Integration of LIME and Grad-CAM provided visual validation, confirming that the model focuses on relevant liver regions (e.g., fragmentation) for decision making.
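Grad-CAM here refers to the standard gradient-weighted class activation mapping recipe; a minimal numpy sketch of that computation, independent of any particular network (the `grad_cam` helper name is illustrative, and the arrays stand in for a conv layer's activations and gradients):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Standard Grad-CAM: weight each feature map by the global-average-pooled
    gradient of the class score, sum over channels, ReLU, normalize to [0, 1].
    feature_maps, gradients: arrays of shape (K, H, W)."""
    weights = gradients.mean(axis=(1, 2))              # one weight per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted sum over K
    cam = np.maximum(cam, 0.0)                         # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: two 2x2 feature maps and matching class-score gradients
fmaps = np.array([[[1.0, 0.0], [0.0, 0.0]],
                  [[0.0, 1.0], [0.0, 0.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[0.5, 0.5], [0.5, 0.5]]])
heatmap = grad_cam(fmaps, grads)
```

The resulting heatmap is upsampled to the input resolution and overlaid on the MRI slice, which is how the liver-region attention maps in this study are visualized.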

🏥

Clinical Applicability

The end-to-end pipeline demonstrates that automated systems can reliably grade cirrhosis severity from 2D MRI slices without invasive biopsy.

Future Directions

Prospective avenues to further enhance the system's capability and deployment.

  • 3D Volume Analysis: Extending the current 2D slice-based approach to 3D volumetric analysis to capture the complete spatial context of liver fibrosis.
  • Multi-Modal Integration: Incorporating clinical data (blood biomarkers, patient history) alongside MRI images to improve diagnostic precision.
  • Federated Learning: Implementing privacy-preserving federated learning to train models across multiple hospitals without sharing sensitive patient data.
  • Real-Time Deployment: Developing a lightweight version of the model for mobile applications or real-time clinical support systems.

Honors & Presentations

Our research has been recognized and presented at prestigious academic forums.

🏆

GPC Research & Creative Activities Forum

University of Missouri (Fall 2025)

🥈 2nd Place
Research Poster Competition
🏆 People’s Choice Award
Most Recognized Research Poster
🏛️

Stanford AI+Health Conference

Stanford University (2025)

📢 Featured Presentation
Selected for poster presentation among top research works in AI & Healthcare.

*Presented as a research poster to the global AI health community.

BibTeX

@article{setu2025explainable,
  author    = {Setu, Deblina Mazumder and Dey, Samrat Kumar and Islam, Tania and Howlader, Arpita and Biswas, Saurov Chandra and Mazumder, Rashed and Siddiqi, Umme Raihan and Rahman, Md. Mahbubur},
  title     = {Multiclass Liver Cirrhosis Segmentation and Severity Staging on T2-Weighted MRI Using 3UResNet Explainability and Per-Class Deep Expert Classifiers},
  journal   = {},
  year      = {},
}