Major depressive disorder (MDD) is one of the most prevalent and debilitating psychiatric conditions. This study applied a novel multimodal fusion method, tIVA, to integrate resting-state functional MRI (amplitude of low-frequency fluctuations, ALFF) and structural MRI (gray matter density) data from 461 participants. The fusion revealed features unique to MDD. One was a joint disruption linking cerebellar hypoactivity with structural deficits in the salience network. Others were default mode network hyperactivity and executive control network hypoactivity. A Random Forest classifier based on these multimodal components achieved nearly 70% classification accuracy, supporting their predictive value. By uncovering structural–functional interactions within key brain networks, this work advances the search for neurobiological biomarkers of depression and informs precision psychiatry approaches.