Mingyan Fang
Research Area: Genomics, Computational Biology
Professor and Principal Investigator, BGI-Research, Shenzhen
Professor and Principal Investigator working at the intersection of genomics and artificial intelligence, with a Ph.D. in Clinical Immunology from Karolinska Institutet, Sweden. Over 15 years of experience in computational biology and translational medicine, dedicated to deciphering the mechanisms of immune diseases through innovative AI/ML approaches and large-scale multi-omics integration.
Research Highlights
AI/Algorithm Innovation
Developed VIPPID (the first variant pathogenicity predictor specialized for Primary Immunodeficiency), GeneRAIN (a Transformer-based GRN model), and VIPER (an LLM for identifying genetic disease genes).
Disease Mechanism Research
Systematically identified 50+ causative genes for human diseases (inborn errors of immunity, autoimmune and neuroimmune disorders, etc.), elucidating their pathogenic mechanisms.
Large-Scale Data Platforms
Developed the ZBOLT genomic analysis platform with a capacity of 100 Tbp/day, enabling ultra-large-scale population studies.
Selected Publications
Representative works from recent years. See the full list on the Publications page.
Genome sequencing of 7,140 newborns reveals hidden genetic disease burden
M. Fang#*, Y. Huang*, Y. Mei*, X. Jia*, Y. Gao*, X. Wang, Y. Sun, Y. Zeng, W. Huang, L. Zhu, Z. Duan, Y. Xie, X. Jiang, H. Zeng, J. Tang, X. Qian, Z. Li, Y. Yi, G. Zhang, Y. Huang, C. Liu, G. Huang, W. Zeng, B. Wang, Y. Miao, Y. Bai, H. Huang, Y. Xiao, J. Liu, X. Xu, S. Pan*, L. Hammarström*, F. Chen*, X. Jin*. Advanced Science, under first-round review (IF: 14.1, CAS Q1)
A large-scale newborn genome sequencing study that reveals a hidden genetic disease burden and establishes a foundation for population-level genomic screening.
GeneRAIN: multifaceted representation of genes via deep learning of gene expression networks
Z. Su*, M. Fang*, A. Smolnikov, ME. Dinger, EC. Oates, F. Vafaee. Genome Biology 26, 288 (2025). (IF: 9.4, CAS Q1)
A deep learning framework for gene representation learning, trained on 777K bulk transcriptomes, advancing AI-driven genomics research.
Age-Related Dynamics and Spectral Characteristics of the TCRβ Repertoire in Healthy Children: Implications for Immune Aging
M. Fang#*, Y. Miao#, L. Zhu, Y. Mei, H. Zeng, L. Luo, Y. Ding, L. Zhou, X. Quan, Q. Zhao, X. Zhao, Y. An#. Aging Cell, 2025.
Characterizes age-related dynamics in the TCRβ repertoire of healthy children, informing immune aging mechanisms.
An efficient large‐scale whole‐genome sequencing analyses practice with an average daily analysis of 100Tbp
Z. Li*, Y. Xie*, W. Zeng*, Y. Huang, S. Gu, Y. Gao, W. Huang, L. Lu, X. Wang, J. Wu, X. Yin, R. Zhu, G. Huang, L. Lu, J. Tang, Y. Zheng, Q. Liu, X. Zhou, R. Shan#, B. Wang#, M. Fang#, X. Jin#. Clinical and Translational Discovery, 2023. (IF: 1.9)
Development of the ZBOLT platform, enabling ultra-large-scale genomic data processing at a capacity of 100 Tbp/day.
VIPPID: a gene specific single nucleotide variant pathogenicity prediction tool for Primary Immunodeficiency Diseases
M. Fang*, Z. Su*, H. Abolhassani, Y. Itan, X. Jin, L. Hammarström. Briefings in Bioinformatics, bbac176 (2022). (IF: 13.994, CAS Q1)
The first specialized variant pathogenicity predictor for Primary Immunodeficiency Diseases, designed to support clinical genetic diagnosis.