Multimodal fusion of 11 satellite and agricultural data streams predicts US county-level life expectancy — no census, no surveys — purely from features observable from orbit across 3,108 counties over 20 years.
ee.Image.reduceRegions() at 250 m scale. FAO livestock densities linearly interpolated between snapshot years (2005/2010/2015). Alaska, Hawaii, and US territories excluded: 860 rows removed from the 62,540-row raw dataset, leaving 61,680 CONUS county-year observations.

RF hyperparameters: n_estimators=2000; max_depth=40; max_features=0.3; max_samples=0.9; min_samples_leaf=2. Multimodal fusion reduces unexplained variance by 32% over the best single-modality model (LST: R²=0.442). Moran's I=0.08 (p=0.23) indicates no statistically significant residual spatial autocorrelation.

joblib: over 97 CPU-hours on a 64-core Juno node. LOWESS (bandwidth 0.20–0.40 adaptive) plus Savitzky-Golay smoothing applied to SHAP dependence curves; numerical differentiation on a 300-point grid identifies inflection points. Nighttime LST: Σ|SHAP|=0.860 yr vs daytime 0.140 yr, a 6.16× ratio. SHAP rankings corroborate RF impurity importance (Spearman ρ=0.89).

Engineered features: computed during combined-model preprocessing only, not used in single-modality ablation models. These capture synergistic interactions between sensors that no individual stream can encode alone. The full set of 25 achieves R²=0.581 alone, surpassing the 84-feature Sentinel-2 optical modality and the 135-feature USDA CDL (R²=0.433), despite using 5.5% of the feature count.
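The linear interpolation of FAO livestock densities between the 2005/2010/2015 snapshot years can be sketched as follows. This is a minimal illustration, not the project's code: the function name `interpolate_density`, the example density values, and the choice to clamp years outside 2005–2015 to the nearest snapshot are all assumptions, since the text does not say how 2000–2004 and 2016–2019 are handled.

```python
from bisect import bisect_right


def interpolate_density(year, snapshots):
    """Linearly interpolate a livestock density for `year`.

    `snapshots` maps snapshot year -> density. Years outside the snapshot
    range are clamped to the nearest snapshot (an assumption, see lead-in).
    """
    years = sorted(snapshots)
    if year <= years[0]:
        return snapshots[years[0]]
    if year >= years[-1]:
        return snapshots[years[-1]]
    hi = bisect_right(years, year)          # index of first snapshot after `year`
    y0, y1 = years[hi - 1], years[hi]       # bracketing snapshot years
    w = (year - y0) / (y1 - y0)             # fractional position between them
    return (1 - w) * snapshots[y0] + w * snapshots[y1]


# Placeholder densities for one hypothetical county (not real FAO values).
example = {2005: 10.0, 2010: 20.0, 2015: 15.0}
series = {yr: interpolate_density(yr, example) for yr in range(2000, 2020)}
```

Each county and livestock variable would get its own `snapshots` mapping; the resulting yearly series then lines up with the annual satellite features.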
| # | Feature Name | Construction | Mechanistic Rationale | Domain |
|---|---|---|---|---|
425 base features + 25 derived = 450 total.
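For orientation, the Random Forest hyperparameters listed earlier map directly onto scikit-learn's `RandomForestRegressor`. The sketch below assumes the model was built with scikit-learn (suggested by the parameter names and the use of joblib, but not confirmed here); `n_jobs` and `random_state` are illustrative choices, not values from the source.

```python
from sklearn.ensemble import RandomForestRegressor

# Hyperparameters as quoted in the text; everything marked "assumption" is not.
rf = RandomForestRegressor(
    n_estimators=2000,     # number of trees
    max_depth=40,          # maximum depth of each tree
    max_features=0.3,      # fraction of the 450 features considered per split
    max_samples=0.9,       # fraction of rows drawn for each tree's bootstrap
    min_samples_leaf=2,    # minimum observations per leaf
    bootstrap=True,        # needed for max_samples to take effect
    n_jobs=-1,             # assumption: use all available cores
    random_state=0,        # assumption: fixed seed for repeatable runs
)

# Typical usage, given a feature matrix X (n_rows x 450) and target y:
# rf.fit(X, y); y_pred = rf.predict(X_new)
```

With `max_features=0.3`, each split considers roughly 30% of the 450 features, which decorrelates the trees at the cost of weaker individual splits.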
The most unexpected finding: chronically limited overnight cooling, rather than extreme daytime heat, is the dominant predictor of county-level longevity. NCE (Nighttime Cooling Efficiency) is the single most important feature in the entire 450-feature dataset.
| Modality | Features | R² (mean ± SD) | MAE (yr) |
|---|---|---|---|
| Combined (193 pruned) | 193 | 0.631 ± 0.013 | 1.08 |
| Combined (full 450) | 450 | 0.623 ± 0.013 | 1.09 |
| Engineered ✦ | 25 | 0.581 ± 0.015 | 1.16 |
| MODIS LST | 14 | 0.442 ± 0.021 | 1.36 |
| USDA CDL | 135 | 0.433 ± 0.011 | 1.38 |
| FAO Livestock | 42 | 0.396 ± 0.012 | 1.44 |
| MODIS NDVI/EVI | 14 | 0.350 ± 0.029 | 1.49 |
| Sentinel-2 | 84 | 0.079 ± 0.004 | 1.85 |
| JRC Water | 6 | 0.066 ± 0.009 | 1.88 |
✦ The 25 engineered features outperform every individual raw modality, including the 84-feature Sentinel-2 set, at 5.5% of the full 450-feature count.
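The reduction in unexplained variance relative to the best single modality follows from the table as 1 − (1 − R²_model) / (1 − R²_baseline). The short calculation below uses only the R² means from the table; the helper name is illustrative. Under this definition the full 450-feature model gives about 32%, and the pruned 193-feature model about 34%.

```python
# Mean R² values copied from the ablation table.
R2 = {
    "Combined (193 pruned)": 0.631,
    "Combined (full 450)": 0.623,
    "MODIS LST": 0.442,
    "USDA CDL": 0.433,
    "FAO Livestock": 0.396,
    "MODIS NDVI/EVI": 0.350,
    "Sentinel-2": 0.079,
    "JRC Water": 0.066,
}

# Raw single-sensor modalities (excludes combined and engineered sets).
RAW_MODALITIES = [
    "MODIS LST", "USDA CDL", "FAO Livestock",
    "MODIS NDVI/EVI", "Sentinel-2", "JRC Water",
]


def unexplained_variance_reduction(r2_baseline, r2_model):
    """Fractional drop in (1 - R²) when moving from baseline to model."""
    return 1.0 - (1.0 - r2_model) / (1.0 - r2_baseline)


best_single = max(RAW_MODALITIES, key=R2.get)
reduction_full = unexplained_variance_reduction(
    R2[best_single], R2["Combined (full 450)"])
reduction_pruned = unexplained_variance_reduction(
    R2[best_single], R2["Combined (193 pruned)"])

print(f"best single modality: {best_single}")
print(f"full 450:   {reduction_full:.1%}")
print(f"pruned 193: {reduction_pruned:.1%}")
```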
SHAP derivative analysis (LOWESS + Savitzky-Golay + numerical differentiation on a 300-point grid) identifies sharp inflection points where the direction or rate of health impact changes qualitatively. Every threshold is directly measurable from existing freely available satellite archives.
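A minimal sketch of this pipeline is shown below, assuming `statsmodels` and `scipy` are available. It is not the study's implementation: the function name, the fixed LOWESS fraction of 0.30 (the text describes an adaptive 0.20–0.40 bandwidth without giving the rule), the Savitzky-Golay window and polynomial order, and the synthetic tanh-shaped data are all assumptions for illustration. Inflection points are taken here as sign changes of the second derivative.

```python
import numpy as np
from scipy.signal import savgol_filter
from statsmodels.nonparametric.smoothers_lowess import lowess


def find_inflection_points(x, shap_values, frac=0.30, grid_size=300,
                           window=31, polyorder=3):
    """Smooth a SHAP dependence curve and locate second-derivative sign changes."""
    # 1) LOWESS smoothing of the raw (feature value, SHAP value) scatter.
    smoothed = lowess(shap_values, x, frac=frac, return_sorted=True)
    xs, ys = smoothed[:, 0], smoothed[:, 1]

    # 2) Resample onto a regular grid (np.interp needs strictly increasing x).
    xs_u, idx = np.unique(xs, return_index=True)
    grid = np.linspace(xs_u[0], xs_u[-1], grid_size)
    curve = np.interp(grid, xs_u, ys[idx])

    # 3) Savitzky-Golay pass to suppress residual wiggles before differentiating.
    curve = savgol_filter(curve, window_length=window, polyorder=polyorder)

    # 4) Numerical first and second derivatives on the grid.
    d1 = np.gradient(curve, grid)
    d2 = np.gradient(d1, grid)

    # 5) Inflection candidates: midpoints where d2 changes sign.
    flips = np.where(np.sign(d2[:-1]) * np.sign(d2[1:]) < 0)[0]
    inflections = 0.5 * (grid[flips] + grid[flips + 1])
    return grid, curve, d1, inflections


# Synthetic example: a tanh-shaped dependence curve with a true inflection at 0.
rng = np.random.default_rng(0)
x_demo = rng.uniform(-3.0, 3.0, 500)
shap_demo = np.tanh(x_demo) + rng.normal(0.0, 0.02, 500)
grid, curve, d1, inflections = find_inflection_points(x_demo, shap_demo)
```

On real SHAP output, second derivatives are noisy, so candidates near the grid edges or with small derivative magnitude would normally be filtered before being reported as thresholds.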
This study demonstrates that satellite Earth observation alone — with zero census inputs — can predict life expectancy for every contiguous US county with R²=0.631 and MAE=1.08 years. Operating on 61,680 county-year observations from 2000–2019, the model achieves ≈78% of the performance of IHME's gold-standard sociodemographic model, which relies on income, education, and healthcare data that most of the world cannot provide. Two independent HPC runs confirmed ΔR²<0.001 — full bit-identical reproducibility.
The most important scientific finding is the Nighttime Thermal Paradox: chronic minimum overnight temperature — not extreme daytime heat — is the dominant predictor. Nighttime LST features carry 6.16× the cumulative SHAP attribution of all daytime channels (0.860 yr vs 0.140 yr). NCE alone has mean |SHAP|=0.309 yr — roughly 14× larger than the equivalent daytime feature (0.022 yr). This reorients the heat-health literature from an almost exclusive focus on peak daytime temperatures toward chronic nocturnal burden.
A striking secondary result: 25 engineered cross-modal features achieve R²=0.581 alone, surpassing every individual raw modality including the 135-feature USDA CDL (R²=0.433). Physics-grounded feature construction concentrates predictive signal more efficiently than raw spectral accumulation.
Three further policy-actionable thresholds: cattle density inflects at ≈318 head/km²; deciduous forest cover >20% attenuates the heat-driven LE penalty by ≈38% in the hottest quartile; and a soil moisture Goldilocks zone at SM=6–8 encodes the agricultural productivity–flood risk trade-off.
The model's systematic over-prediction in 9 Indigenous reservation counties (MAE≈7.1 yr) is not model failure — it is a diagnostic flag for where structural determinants invisible to any satellite sensor (chronic IHS underfunding, historical trauma) vastly outweigh environmental ones. A hybrid-surveillance design augmenting these counties with locally collected vital statistics is recommended for operational deployment.
Code: github.com/albertfaiz/Multimodal_geo_fusion_FM · Data: Zenodo doi:10.5281/zenodo.19229752 · Journal: MDPI Remote Sensing