BraTS 2023 GLI · nnU-Net v2 · 956 training patients (fold 0) · 4 MRI modalities
Independent research project — no academic supervision
MRI intensity distributions in brain tumor imaging are heavily right-skewed (per-patient mean skewness of +2.5 to +5.8 across modalities). The standard nnU-Net preprocessing (per-patient z-score) normalizes mean and variance but leaves the distribution shape unchanged. The working hypothesis is that this skewness degrades gradient stability, particularly for small tumor subregions (NCR, ET), where the intensity signal is dominated by the long tail of healthy tissue.
| Modality | Raw skew | After z-score | Physical basis |
|---|---|---|---|
| T1 | +2.48 | +2.48 | T1 relaxation — gray/white contrast |
| T1ce | +5.76 | +5.76 | Gadolinium enhancement — tumor vasculature |
| T2 | +3.67 | +3.67 | T2 relaxation — fluid/edema bright |
| FLAIR | +4.01 | +4.01 | Fluid suppression — perilesional edema |
Measured on the 956 training patients (fold 0) as per-patient mean skewness. The z-score leaves skewness unchanged because it is an affine transform with a positive scale factor.
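The invariance stated above can be checked numerically. The following is a minimal sketch on synthetic log-normal intensities (an assumption, not BraTS data); the `skewness` helper is a plain third-standardized-moment estimator written for this example.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment (biased estimator)."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean()
    return float((centered ** 3).mean() / (centered ** 2).mean() ** 1.5)

# Synthetic right-skewed intensities (assumption: stands in for one patient's voxels).
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# Per-patient z-score: subtract the mean, divide by the standard deviation.
z = (intensities - intensities.mean()) / intensities.std()

raw_skew = skewness(intensities)
z_skew = skewness(z)
print(f"raw skew = {raw_skew:.3f}, after z-score = {z_skew:.3f}")
```

The two printed values agree up to floating-point error, which matches the identical "Raw skew" and "After z-score" columns in the table.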
To our knowledge, no study systematically compares intensity distribution transforms for brain tumor segmentation.
Reinhold et al. (2019): Compares 7 normalization methods (z-score, WhiteStripe, Nyul, FCM, KDE, etc.) for MR synthesis. Does not include distribution transforms (Box-Cox, log, quantile).
Durso-Finley et al. (2024): “Negligible effect of brain MRI preprocessing for tumor segmentation.” Tests skull-stripping, bias field correction, histogram matching, and denoising, and concludes that InstanceNorm compensates. Critical distinction: those transforms are linear or quasi-linear (scale, shift, resampling), so InstanceNorm's mean/std normalization can largely absorb them. The study does not test nonlinear distribution transforms (Box-Cox, log, quantile), which modify skewness (the 3rd moment) that InstanceNorm does not directly correct.
Isensee et al. (2021): nnU-Net uses per-case z-score for MRI. No skewness correction.
BraTS 2023/2024 winners: All use z-score via nnU-Net. No preprocessing innovation.
| ID | Method | Description | Parametric? | Addresses skew? |
|---|---|---|---|---|
| C1 | z-score | Per-patient (x - mu) / sigma | No | No |
| C2 | Box-Cox MLE + z-score | Power transform lambda_MLE, then z-score | Yes (lambda) | Partially (-55%) |
| C3 | log(1+x) + z-score | Log compression, then z-score | No | Partially (-35%) |
| C4 | Quantile-to-Gaussian | Φ^-1(F(x)), with F the empirical CDF and Φ^-1 the standard normal inverse CDF; output is approximately N(0,1) | No | Yes (100%) |
| C5 | Clip 99th + z-score | Remove outliers, then z-score | No | Partially (-25%) |
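As a rough illustration of the five candidates in the table, the sketch below implements each one on a 1-D array of positive voxel intensities. Several things here are assumptions rather than the project's code: the function names, the synthetic log-normal data, and the fixed Box-Cox `lam` (the project fits lambda by MLE, which `scipy.stats.boxcox` can do; a fixed value keeps this example dependency-light).

```python
from statistics import NormalDist
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

def skewness(x):
    c = x - x.mean()
    return float((c ** 3).mean() / (c ** 2).mean() ** 1.5)

def c1_zscore(x):
    return zscore(x)

def c2_boxcox(x, lam):
    """Box-Cox with a caller-supplied lambda (x must be > 0), then z-score."""
    y = np.log(x) if lam == 0 else (x ** lam - 1.0) / lam
    return zscore(y)

def c3_log1p(x):
    return zscore(np.log1p(x))

def c4_quantile_gaussian(x):
    """Map empirical quantiles onto N(0,1). Ties are ranked arbitrarily here."""
    ranks = np.argsort(np.argsort(x))      # 0 .. n-1
    probs = (ranks + 0.5) / x.size         # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])

def c5_clip(x, percentile=99.0):
    return zscore(np.minimum(x, np.percentile(x, percentile)))

# Synthetic right-skewed voxels (assumption, not BraTS data).
rng = np.random.default_rng(0)
voxels = rng.lognormal(0.0, 1.0, size=20_000)

results = {
    "C1": skewness(c1_zscore(voxels)),
    "C2": skewness(c2_boxcox(voxels, lam=0.25)),  # hypothetical lambda
    "C3": skewness(c3_log1p(voxels)),
    "C4": skewness(c4_quantile_gaussian(voxels)),
    "C5": skewness(c5_clip(voxels)),
}
for name, value in results.items():
    print(f"{name}: skewness {value:+.3f}")
```

On this synthetic data C2, C3, and C5 reduce skewness relative to C1, and C4 removes it; the exact reductions depend on the data and will not match the percentages in the table.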
Hypothesis: An ensemble of zero-cost metrics measured on labeled data can predict segmentation performance, enabling preprocessing selection without training. Adapted from NAS zero-cost proxies (architecture evaluation) to data evaluation — a novel application.
| Metric | What it measures | Cost | Reference |
|---|---|---|---|
| FDR | Static inter-class separability: (mu1-mu2)²/(var1+var2) | Minutes | Fisher 1936 |
| Gradient SNR | Gradient signal strength per class (1 backward pass) | ~15 min / method | Similar to GraSP (ICLR 2020) |
| NASWOT | Diversity of ReLU activation patterns (log-determinant of the Gram matrix K) | ~15 min / method | Mellor et al. (ICML 2021) |
| Linear Probe | F1 of logistic regression on random encoder features | ~15 min / method | Standard ML |
| Rank Aggregation | Average rank across all metrics | 0 sec | AZ-NAS (CVPR 2024) |
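The FDR formula in the table is simple enough to write out directly. This is a minimal two-class sketch on synthetic Gaussian samples; how the project pools voxels across patients or handles more than two classes is not stated, so the function name and the two-class setup are assumptions.

```python
import numpy as np

def fisher_discriminant_ratio(class_a, class_b):
    """FDR = (mu1 - mu2)^2 / (var1 + var2) for two 1-D intensity samples."""
    a = np.asarray(class_a, dtype=np.float64)
    b = np.asarray(class_b, dtype=np.float64)
    return float((a.mean() - b.mean()) ** 2 / (a.var() + b.var()))

# Synthetic intensities (assumption): two unit-variance classes, means 2 apart.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=50_000)
tumor = rng.normal(2.0, 1.0, size=5_000)

fdr = fisher_discriminant_ratio(healthy, tumor)
print(f"FDR = {fdr:.3f}")  # close to 2^2 / (1 + 1) = 2 for these parameters
```

Because the ratio depends only on class means and variances, it can be evaluated on preprocessed voxels without any network, which is why it sits at the cheapest level of the cascade.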
| Rank | Method | FDR (R) | SNR (R) | Loss (R) | NASWOT (R) | Probe (R) | Ensemble |
|---|---|---|---|---|---|---|---|
| 1 | C3 log(1+x) | 2.452 (3) | 0.801 (2) | 2.424 (3) | -282 (4) | 0.956 (1) | 2.6 |
| 2 | C2 Box-Cox | 3.430 (1) | 0.801 (1) | 2.408 (2) | -284 (5) | 0.918 (5) | 2.8 |
| 2 | C5 Clip 99th | 2.959 (2) | 0.800 (3) | 2.425 (4) | -278 (3) | 0.956 (2) | 2.8 |
| 4 | C4 Quantile | 1.703 (5) | 0.799 (5) | 2.406 (1) | -245 (1) | 0.946 (3) | 3.0 |
| 5 | C1 z-score | 1.704 (4) | 0.800 (4) | 2.433 (5) | -277 (2) | 0.943 (4) | 3.8 |
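The Ensemble column is the mean of the five per-proxy ranks shown in parentheses. The sketch below recomputes it from the table; the dictionary layout is an illustrative choice.

```python
# Per-proxy ranks copied from the table above (1 = best).
ranks = {
    "C3 log(1+x)":  {"FDR": 3, "SNR": 2, "Loss": 3, "NASWOT": 4, "Probe": 1},
    "C2 Box-Cox":   {"FDR": 1, "SNR": 1, "Loss": 2, "NASWOT": 5, "Probe": 5},
    "C5 Clip 99th": {"FDR": 2, "SNR": 3, "Loss": 4, "NASWOT": 3, "Probe": 2},
    "C4 Quantile":  {"FDR": 5, "SNR": 5, "Loss": 1, "NASWOT": 1, "Probe": 3},
    "C1 z-score":   {"FDR": 4, "SNR": 4, "Loss": 5, "NASWOT": 2, "Probe": 4},
}

# Rank aggregation: average rank across proxies, lower is better.
ensemble = {method: sum(r.values()) / len(r) for method, r in ranks.items()}
ordering = sorted(ensemble, key=ensemble.get)

for method in ordering:
    print(f"{method:13s} {ensemble[method]:.1f}")
```

This reproduces 2.6, 2.8, 2.8, 3.0, and 3.8. Equal weighting means a method that is best on two proxies and worst on two others (C2, C4) lands mid-table.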
| Proxy | Winner | Loser | Interpretation |
|---|---|---|---|
| FDR | C2 Box-Cox | C4 Quantile | Box-Cox best separates classes in intensity space |
| Gradient SNR | C2 Box-Cox | C4 Quantile | Consistent with FDR — gradient follows static separability |
| NASWOT | C4 Quantile | C2 Box-Cox | Quantile creates most diverse activation patterns |
| Loss | C4 Quantile | C1 z-score | Quantile gives best initial optimization landscape |
| Linear Probe | C3 log | C2 Box-Cox | Log features are most linearly separable in random features |
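FDR/SNR and NASWOT give nearly opposite rankings: C2 is first on FDR/SNR and last on NASWOT, and C4 is the reverse. Loss agrees with NASWOT in placing C4 first but still ranks C2 second. This suggests that static separability (intensity space) and network-level separability (activation space) are different properties. The ensemble rank integrates both perspectives.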
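One way to quantify how much two proxies agree is Spearman's rank correlation on the per-proxy ranks from the results table. The sketch below uses the tie-free shortcut formula; the helper name and list ordering (C1 to C5) are choices made for this example.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman correlation for two tie-free rank lists of equal length."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Ranks per proxy in the order C1, C2, C3, C4, C5 (from the results table).
proxy_ranks = {
    "FDR":    [4, 1, 3, 5, 2],
    "SNR":    [4, 1, 2, 5, 3],
    "Loss":   [5, 2, 3, 1, 4],
    "NASWOT": [2, 5, 4, 1, 3],
    "Probe":  [4, 5, 1, 3, 2],
}

rho = {
    (a, b): spearman_rho(proxy_ranks[a], proxy_ranks[b])
    for a in proxy_ranks for b in proxy_ranks if a < b
}
for pair, value in sorted(rho.items()):
    print(pair, f"{value:+.2f}")
```

With these ranks, FDR vs SNR gives +0.9, FDR vs NASWOT gives -0.9, and FDR vs Loss gives -0.1. With only five methods these coefficients are descriptive and carry no statistical weight.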
Five levels of evaluation, from fastest to most reliable. Each level is meant to validate the previous one. If rankings are consistent across levels, proxy metrics could replace training for preprocessing selection.
| Level | Method | Time | What it measures | Status |
|---|---|---|---|---|
| L1 | Proxy metrics (FDR, CV, skew) | 5 min / 5 methods | Static class separability | Done |
| L2a | Gradient SNR (956 training patients) | ~20 min / 5 methods | Gradient signal per class | Done |
| L2b | NASWOT (log-determinant of the Gram matrix K) | ~20 min / 5 methods | Activation diversity | Done |
| L2c | Linear Probe (random features) | ~20 min / 5 methods | Feature separability | Done |
| L2d | Rank Aggregation (AZ-NAS style) | 0 sec | Combined proxy ranking | Done |
| L3 | Mini-training (5 epochs) | ~42 min / method | Early convergence speed | Done — proxy INVERTED |
| L4 | Optuna/BoTorch per-modality tuning | ~10-15h | Optimal preprocessing params | Planned |
| L5 | Full training (40+ epochs) | ~5.5h / method | Actual Dice performance | C1 done |
| Level | Ranking | Consistent? |
|---|---|---|
| L1 (FDR) | C2 Box-Cox > C5 Clip > C3 log > C4 ≈ C1 | Baseline |
| L2a (SNR) | C2 ≈ C3 > C5 > C1 > C4 | Partial match L1 |
| L2b (NASWOT) | C4 > C1 > C5 > C3 > C2 | ✗ contradicts L1 |
| L2c (Probe) | C3 ≈ C5 > C4 > C1 > C2 | ✗ contradicts L1 |
| L2d (Ensemble) | C3 log > C2 ≈ C5 > C4 > C1 | Novel ranking |
| L3 (5-ep) | C1 0.767 > C5 0.660 > C3 0.624 > C2 0.552 | INVERTED vs L2d |
| L5 (40-ep) | C1 z-score: EMA Dice 0.867 (Val Dice 0.824). Others: pending after HP tuning. | n/a |
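If the rankings at L1, L2, L3, and L5 were consistent, this would validate FDR as a zero-cost proxy for preprocessing selection, a methodological contribution applicable beyond BraTS. So far the 5-epoch results (L3) invert the proxy ranking, so this remains unconfirmed until the L5 runs complete.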
| Dice Rank | Run | Preprocessing | Val Dice 5ep | Proxy Rank | Status |
|---|---|---|---|---|---|
| 1 | C1 | z-score (baseline) | 0.767 | 5th (3.8) | Baseline winner |
| 2 | C5 | Clip 99th + z-score | 0.660 | 2nd (2.8) | Complete |
| 3 | C3 | log(1+x) + z-score | 0.624 | 1st (2.6) | Complete |
| 4 | C2 | Box-Cox MLE + z-score | 0.552 | 2nd (2.8) | Complete |
C1 z-score 40 epochs: EMA Dice 0.867, Val Dice 0.824. Others at 5 epochs only.
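The disagreement between proxy rank and 5-epoch Dice can be counted pair by pair. The sketch below takes the four completed runs from the table and classifies each pair of methods as concordant (proxy and Dice agree on which is better), discordant, or tied; the variable names are illustrative.

```python
from itertools import combinations

# Values copied from the 5-epoch table (lower ensemble rank = better proxy score).
runs = {
    "C1": {"ensemble_rank": 3.8, "val_dice_5ep": 0.767},
    "C5": {"ensemble_rank": 2.8, "val_dice_5ep": 0.660},
    "C3": {"ensemble_rank": 2.6, "val_dice_5ep": 0.624},
    "C2": {"ensemble_rank": 2.8, "val_dice_5ep": 0.552},
}

concordant = discordant = tied = 0
for a, b in combinations(runs, 2):
    # Positive when `a` is better than `b` according to each criterion.
    proxy_pref = runs[b]["ensemble_rank"] - runs[a]["ensemble_rank"]
    dice_pref = runs[a]["val_dice_5ep"] - runs[b]["val_dice_5ep"]
    agreement = proxy_pref * dice_pref
    if agreement > 0:
        concordant += 1
    elif agreement < 0:
        discordant += 1
    else:
        tied += 1

print(f"concordant={concordant}, discordant={discordant}, tied={tied}")
```

Of the six pairs, four are discordant, one is concordant (C3 vs C2), and one is tied on the proxy (C5 vs C2). Five epochs measure early convergence only, so this says little about the final ranking.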
Each MRI modality measures a different physical property, so the intensity distributions have different underlying causes and the optimal preprocessing may differ per modality.
| Modality | Best method | FDR | Why |
|---|---|---|---|
| T1 | Box-Cox | 2.683 | Power transform handles moderate right skew from WM/GM contrast |
| T1ce | Box-Cox | 3.209 | Compresses gadolinium enhancement peak, preserves tumor signal |
| T2 | Box-Cox | 4.113 | Strong skew from fluid — Box-Cox normalizes effectively |
| FLAIR | Clip 99th | 6.596 | Clipping preserves hyperintense lesions, whereas Box-Cox compresses them |
Proposed hybrid: Box-Cox for T1/T1ce/T2 + Clip for FLAIR. To be validated by training after uniform method validation.
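The proposed hybrid amounts to a per-modality dispatch table. The sketch below shows one possible shape for it; the function names, the input format (a dict of positive foreground voxels per modality), and the Box-Cox lambda of 0.25 are all hypothetical placeholders, since the project fits lambda by MLE.

```python
import numpy as np

def zscore(x):
    return (x - x.mean()) / x.std()

def boxcox_then_zscore(x, lam):
    # Box-Cox requires strictly positive input.
    y = np.log(x) if lam == 0 else (x ** lam - 1.0) / lam
    return zscore(y)

def clip_then_zscore(x, percentile=99.0):
    return zscore(np.minimum(x, np.percentile(x, percentile)))

# Hybrid from the text: Box-Cox for T1/T1ce/T2, clipping for FLAIR.
# lam=0.25 is a placeholder value, not a fitted parameter.
HYBRID_PIPELINE = {
    "T1":    lambda x: boxcox_then_zscore(x, lam=0.25),
    "T1ce":  lambda x: boxcox_then_zscore(x, lam=0.25),
    "T2":    lambda x: boxcox_then_zscore(x, lam=0.25),
    "FLAIR": clip_then_zscore,
}

def preprocess_case(foreground_voxels):
    """Apply the modality-specific transform to each modality's voxels."""
    return {m: HYBRID_PIPELINE[m](v) for m, v in foreground_voxels.items()}

# Synthetic case (assumption): log-normal intensities for every modality.
rng = np.random.default_rng(0)
case = {m: rng.lognormal(0.0, 1.0, size=10_000) for m in HYBRID_PIPELINE}
processed = preprocess_case(case)
```

Keeping the mapping in one dictionary makes it easy to swap a single modality's transform when comparing the hybrid against the uniform methods.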
Instead of choosing between fixed methods, optimize preprocessing hyperparameters per modality using zero-cost proxies as the objective function. Optuna/BoTorch explores the parameter space efficiently.
| Parameter | Method | Current value | Search range | Per modality? |
|---|---|---|---|---|
| lambda | Box-Cox | MLE (optimizes Gaussianity) | [-2, 2] | Yes |
| percentile | Clip | 99th | [90, 99.9] | Yes |
| shift | log(a+x) | a=1 | [0.01, 10] | Yes |
| clip + lambda | Clip + Box-Cox | 99th + MLE | Combined search | Yes |
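To make the idea of a proxy-driven search concrete, the sketch below tunes the Box-Cox lambda over the [-2, 2] range from the table with FDR as the objective. An exhaustive grid stands in for Optuna/BoTorch here so the example needs only NumPy; the synthetic two-class data and all names are assumptions.

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transform for strictly positive x."""
    return np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def fdr(a, b):
    return float((a.mean() - b.mean()) ** 2 / (a.var() + b.var()))

# Synthetic single-modality intensities (assumption, not BraTS data).
rng = np.random.default_rng(0)
healthy = rng.lognormal(0.0, 0.5, size=20_000)
tumor = rng.lognormal(1.0, 0.5, size=2_000)

# Search range for lambda taken from the table; the 0.1 step is arbitrary.
candidates = np.linspace(-2.0, 2.0, 41)
scores = [fdr(boxcox(healthy, lam), boxcox(tumor, lam)) for lam in candidates]

best_index = int(np.argmax(scores))
best_lambda = float(candidates[best_index])
baseline = fdr(healthy, tumor)  # no power transform

print(f"best lambda = {best_lambda:+.1f}, FDR = {scores[best_index]:.3f}")
print(f"untransformed FDR = {baseline:.3f}")
```

With a Bayesian optimizer the grid would be replaced by sampled trials (for example Optuna's `trial.suggest_float`), and the same loop would run once per modality. Note that the objective is only as trustworthy as the proxy itself, which the 5-epoch results currently call into question.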
Durso-Finley et al. (2024) tested skull-stripping, bias field correction, histogram matching, and denoising on BraTS and found a negligible effect on Dice. Their conclusion: InstanceNorm layers compensate for preprocessing differences by re-normalizing features internally at every network stage.
nnU-Net uses InstanceNorm3d at all 7 encoder/decoder stages. Each layer re-centers and re-scales the feature maps, potentially nullifying input distribution differences.
| Property | Durso-Finley transforms | Our transforms |
|---|---|---|
| Type | Linear / quasi-linear | Nonlinear |
| Examples | Skull-strip, bias field, histogram match | Box-Cox, log(1+x), quantile-to-Gaussian |
| What changes | Scale, shift, which voxels present | Distribution shape (skewness, kurtosis) |
| Moments affected | 1st (mean), 2nd (variance) | 3rd (skew), 4th (kurtosis) |
| InstanceNorm corrects? | Yes — normalizes mean+std | Not directly — higher moments persist |
Open question: does input skewness survive 7 layers of InstanceNorm + LeakyReLU + Conv3D, or do the nonlinearities reshape it regardless? This is what our experiment tests.
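A toy calculation illustrates the two halves of the question. It is a sketch only: there are no convolutions, the "feature map" is synthetic, and the `eps` and `slope` values are assumed defaults, so it says nothing about what a trained nnU-Net actually does.

```python
import numpy as np

def skewness(x):
    c = x - x.mean()
    return float((c ** 3).mean() / (c ** 2).mean() ** 1.5)

def instance_norm(x, eps=1e-5):
    """Mean/variance normalization over one instance (no learned affine)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def leaky_relu(x, slope=0.01):
    return np.where(x >= 0, x, slope * x)

# Synthetic right-skewed feature map (assumption).
rng = np.random.default_rng(0)
feature_map = rng.lognormal(0.0, 1.0, size=100_000)

after_norm = instance_norm(feature_map)
after_act = leaky_relu(after_norm)

skews = {
    "input": skewness(feature_map),
    "after_norm": skewness(after_norm),
    "after_act": skewness(after_act),
}
for stage, value in skews.items():
    print(f"{stage:10s} skewness {value:+.3f}")
```

The normalization step leaves skewness untouched, while the LeakyReLU changes it. Whether the combined effect over 7 stages with learned convolutions preserves or erases the input shape is exactly what the training runs are meant to show.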
| Component | Choice | Justification |
|---|---|---|
| Architecture | nnU-Net v2 (PlainConvUNet, 7 stages) | SOTA reproducible, Nature Methods 2021 |
| Dataset | BraTS 2023 GLI (1196 patients, 4 modalities) | Multi-institutional, recent |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96 GB VRAM) | Full-brain patches 160×256×256 |
| Precision | BF16 (no GradScaler) | Native Blackwell sm_120 |
| Optimizer | SGD Nesterov, lr=0.01, PolyLR | nnU-Net default |
| Evaluation | Dice (WT/TC/ET), HD95, Wilcoxon | BraTS challenge standard |
| Viewer | Next.js + FastAPI + Three.js | Clinical interface + research dashboards |
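For the Dice evaluation listed in the table, the three BraTS regions are unions of the label classes. The sketch below assumes the BraTS 2023 label convention (1 = NCR, 2 = ED, 3 = ET) and returns 1.0 when both masks are empty, which is a convention chosen for this example; HD95 and the Wilcoxon test are not covered.

```python
import numpy as np

# Assumed BraTS 2023 labels: 1 = NCR, 2 = ED, 3 = ET.
REGIONS = {
    "WT": (1, 2, 3),  # whole tumor
    "TC": (1, 3),     # tumor core
    "ET": (3,),       # enhancing tumor
}

def dice(pred_mask, ref_mask):
    """Dice coefficient for two boolean masks of the same shape."""
    intersection = np.logical_and(pred_mask, ref_mask).sum()
    denominator = pred_mask.sum() + ref_mask.sum()
    if denominator == 0:
        return 1.0  # both empty: treated as perfect agreement (assumption)
    return float(2.0 * intersection / denominator)

def region_dice(pred_labels, ref_labels):
    return {
        name: dice(np.isin(pred_labels, labels), np.isin(ref_labels, labels))
        for name, labels in REGIONS.items()
    }

# Tiny hand-checkable example (flattened label maps).
reference = np.array([0, 1, 2, 3, 3, 0])
prediction = np.array([0, 1, 2, 3, 0, 2])
scores = region_dice(prediction, reference)
print(scores)  # WT 0.75, TC 0.8, ET about 0.667
```

The same functions work unchanged on 3-D label volumes, since `np.isin` and the reductions are shape-agnostic.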
Project: BraTS Brain Tumor Segmentation — independent research
Data: BraTS 2023 GLI Challenge (1196 patients, multi-institutional)
Framework: nnU-Net v2 (Isensee et al., Nature Methods 2021)
Last updated: 2026-03-26