Every cell in your body, whether it’s a brain cell, skin cell, or liver cell, carries the same genetic information. Yet, despite having the identical genetic sequence, each cell expresses only a subset of those genes, leading to the variety of specialized cell types that make up your body. How do cells differentiate? The answer lies partly in the three-dimensional (3D) organization of the genome — the spatial arrangement of the DNA within the cell nucleus. This intricate structure controls which genes are accessible for activation, thus determining a cell’s specific function. However, the challenge has been to analyze and understand this highly dynamic and complex structure.
A breakthrough from MIT chemists offers a new way to uncover these 3D genome structures using generative artificial intelligence (AI). The researchers developed a model that can predict the 3D structure of the genome in a fraction of the time it takes using traditional experimental techniques. With this new tool, researchers can gain insight into how the 3D organization of the genome influences gene expression and cellular function, potentially advancing our understanding of cellular biology and disease mechanisms.
The research was published in the journal Science Advances.
Understanding the Genome’s 3D Organization
To understand the significance of this study, it’s essential to appreciate the complexity of the genome’s organization inside the cell. DNA, which carries our genetic instructions, is packaged into chromatin inside the cell’s nucleus. Chromatin is made up of long strands of DNA wrapped around proteins known as histones, forming a structure often referred to as “beads on a string.” This compact packaging is crucial because it allows the DNA, which is around 2 meters long, to fit inside the tiny nucleus of a cell, which is only about 10 micrometers (one-hundredth of a millimeter) in diameter.
The organization of chromatin isn’t static. It’s dynamic and highly influenced by epigenetic modifications—chemical tags that are attached to the DNA at specific locations. These tags vary by cell type and play a crucial role in regulating how tightly the chromatin is wound, determining which genes are accessible for activation. Essentially, the spatial folding of chromatin helps decide whether a gene is turned on or off, shaping the unique identity of a cell. For instance, a brain cell and a skin cell both have the same genetic code but express different sets of genes based on the chromatin’s 3D conformation.
Over the past two decades, scientists have developed techniques like Hi-C to study chromatin’s structure. In Hi-C, neighboring DNA strands in a cell’s nucleus are chemically linked, the DNA is cut into small pieces and sequenced, and scientists can then work out which segments were located near each other. This method helps create a map of the chromatin’s 3D structure, revealing how different sections of DNA fold in relation to one another. Although effective, Hi-C and other experimental methods are labor-intensive and can take a week or more to produce a single dataset.
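The idea of a proximity map can be illustrated with a toy example that runs in the opposite direction from the experiment: starting from known 3D coordinates and marking which segments sit close together. This is only a sketch. The coordinates, the `contact_map` helper, and the distance cutoff are hypothetical, and nothing here reflects the actual Hi-C processing pipeline.

```python
import math

def contact_map(coords, cutoff):
    """Return a symmetric 0/1 matrix: 1 where two segments lie within `cutoff`."""
    n = len(coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = 1
    return contacts

# Hypothetical hairpin-shaped chain of six segments: it runs out along x,
# turns, and runs back, so segment 5 ends up next to segment 0.
toy_coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Cutoff chosen arbitrarily so that only nearest neighbors count as contacts.
toy_contacts = contact_map(toy_coords, cutoff=1.2)
for row in toy_contacts:
    print(row)
```

In the printed matrix, entries next to the diagonal reflect neighbors along the chain, while entries far from the diagonal (such as segments 0 and 5) reveal folding that brings distant parts of the sequence together.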
A Revolutionary AI-Based Approach
The challenge of mapping chromatin’s 3D structure, especially for individual cells, is monumental because of the complexity involved in predicting how a sequence of DNA will fold. Fortunately, recent advances in artificial intelligence have opened the door for new, faster solutions.
The team of MIT chemists, led by Bin Zhang, an associate professor of chemistry, used generative AI to build a tool that predicts chromatin structures much more quickly than experimental methods can measure them. Zhang’s team developed a model called ChromoGen that uses deep learning to analyze a DNA sequence and predict the 3D chromatin structures that the sequence can adopt. ChromoGen stands apart because it doesn’t just predict a single structure; it generates many possible conformations, recognizing that DNA, due to its inherent flexibility, can fold in multiple ways.
Zhang and his students, Greg Schuette and Zhuohan Lao, explain that the key to the model’s success is its ability to process vast amounts of data, identify patterns, and make accurate predictions. The model is trained on 11 million chromatin conformations, which were obtained using a variant of Hi-C called Dip-C. The training data was sourced from human B lymphocytes, a type of white blood cell, offering a diverse range of chromatin configurations to learn from.
The AI model has two main components:
- A deep learning model that “reads” the genome by analyzing DNA sequences and chromatin accessibility data (which indicates how open or closed the chromatin is at specific sites, and varies between different cell types).
- A generative AI model that uses this data to predict physical chromatin conformations. The generative model can simulate the various ways the chromatin might fold given the DNA sequence and the accessibility data from the first component.
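The data flow between the two components can be sketched in a few lines. This is a toy stand-in, not ChromoGen's architecture: the function names `encode_region` and `sample_conformations` are hypothetical, the "embedding" is just two hand-picked summary numbers, and a random walk takes the place of the trained generative model. The only point is the shape of the pipeline: read the region once, then draw many candidate structures from it.

```python
import random

def encode_region(sequence, accessibility):
    """Stage 1 stand-in: summarize a region as a small feature vector.

    A real model would learn this representation; here it is just GC content
    and mean accessibility, chosen only for illustration.
    """
    gc = sum(base in "GC" for base in sequence) / len(sequence)
    mean_access = sum(accessibility) / len(accessibility)
    return (gc, mean_access)

def sample_conformations(features, n_segments, n_samples, seed=0):
    """Stage 2 stand-in: draw several candidate 3D chains for one region.

    A random walk replaces the generative model. The step length grows with
    accessibility, a made-up rule that loosely mimics more open chromatin.
    """
    _, mean_access = features
    step = 0.5 + mean_access  # hypothetical link between openness and spacing
    rng = random.Random(seed)
    samples = []
    for _sample in range(n_samples):
        chain = [(0.0, 0.0, 0.0)]
        for _segment in range(n_segments - 1):
            x, y, z = chain[-1]
            dx, dy, dz = (rng.gauss(0, 1) for _ in range(3))
            norm = (dx * dx + dy * dy + dz * dz) ** 0.5 or 1.0
            chain.append((x + step * dx / norm,
                          y + step * dy / norm,
                          z + step * dz / norm))
        samples.append(chain)
    return samples

# Hypothetical inputs: a ten-base sequence and five accessibility scores.
features = encode_region("ACGTGGCCAT", [0.2, 0.8, 0.5, 0.9, 0.1])
conformations = sample_conformations(features, n_segments=8, n_samples=3)
print(features, len(conformations), len(conformations[0]))
```

Each call returns several different chains for the same input, which mirrors the idea that one sequence corresponds to a distribution of folds rather than a single answer.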
Speed and Accuracy
The most significant advantage of ChromoGen over traditional methods is its speed. Once the model is trained, it can predict thousands of chromatin structures in a matter of minutes. Schuette highlights the efficiency of the model, noting that whereas experimental techniques like Hi-C might take six months to generate a few dozen structures for a given cell type, ChromoGen can generate a thousand structures in just 20 minutes on a single graphics processing unit (GPU).
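A rough back-of-the-envelope comparison makes the gap concrete. The model figures and the six-month span come from the quote above; two values are assumptions made only for this calculation: "a few dozen" is taken as 36 structures, and a month is taken as 30 days. The comparison also ignores the time needed to train the model and to collect its training data.

```python
# Figures quoted in the article.
model_structures = 1000
model_minutes = 20
experiment_months = 6

# Assumptions for illustration only: "a few dozen" taken as 36,
# and a month taken as 30 days.
experiment_structures = 36
days_per_month = 30

minutes_per_day = 24 * 60
model_per_day = model_structures / model_minutes * minutes_per_day
experiment_per_day = experiment_structures / (experiment_months * days_per_month)
speedup = model_per_day / experiment_per_day

print(f"model: {model_per_day:,.0f} structures/day")
print(f"experiment: {experiment_per_day:.1f} structures/day")
print(f"ratio: about {speedup:,.0f}x")
```

Under these assumptions the ratio comes out in the hundreds of thousands, though the exact number depends heavily on how "a few dozen" is read.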
This speed dramatically reduces the time and cost of studying chromatin structures. Researchers can now rapidly analyze the 3D conformation of DNA sequences in individual cells, producing far more data than would be practical to gather with traditional experimental methods.
To validate their model, the researchers applied ChromoGen to more than 2,000 DNA sequences, comparing the model’s predictions to experimentally derived structures. The results were striking: the model predicted chromatin conformations that were nearly identical to those seen in experimental data. This confirmed that the AI model could not only predict structures for sequences it had been trained on but could also generalize well to new data, including data from different cell types.
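The article does not say which metrics the team used, but one common way to compare a predicted structure with a measured one is to compare their pairwise distance maps, which do not change when a structure is rotated or shifted. The sketch below does this with a plain Pearson correlation. The coordinates are hypothetical, and the helpers `distance_map` and `pearson` are written here for illustration, not taken from the study.

```python
import math

def distance_map(coords):
    """Flattened list of pairwise distances between segments (i < j)."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical coordinates: a "measured" chain, a slightly perturbed
# "prediction" of it, and an unrelated straight chain for contrast.
measured = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0)]
predicted = [(0, 0, 0.1), (1.1, 0, 0), (2, 0.1, 0), (1.9, 1, 0), (1, 1.1, 0)]
unrelated = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]

similarity = pearson(distance_map(measured), distance_map(predicted))
baseline = pearson(distance_map(measured), distance_map(unrelated))
print(round(similarity, 3), round(baseline, 3))
```

A prediction that folds like the measured chain scores close to 1, while the unrelated straight chain scores noticeably lower. Because the model produces many conformations per sequence, a real evaluation would compare distributions of structures rather than a single pair.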
The model’s accuracy at predicting chromatin structures in cells it hadn’t been trained on suggests that it can be used to explore how chromatin organization differs between cell types. Understanding these differences could help explain how cells with the same genetic code can perform entirely different functions. The model could also offer insights into the chromatin states within a single cell, potentially revealing how gene expression is regulated at a more granular level.
Potential Applications
The researchers believe that ChromoGen could be used to investigate several important biological and medical questions. One exciting possibility is using the model to explore how genetic mutations affect chromatin structure, which could provide valuable insights into disease mechanisms. For instance, if a mutation alters the conformation of the chromatin in a particular gene, it could affect how that gene is expressed, potentially leading to disease. ChromoGen could help researchers identify these structural changes more quickly than ever before.
Another important application is in understanding how different cell types regulate gene expression through chromatin structure. By studying how chromatin folds in different types of cells—such as muscle cells, nerve cells, or immune cells—scientists could gain insights into the molecular basis of cellular specialization. The model could also be applied to investigate how chromatin structure changes during development or in response to environmental signals, providing a new perspective on gene regulation.
Moreover, ChromoGen has the potential to significantly impact the field of personalized medicine. By predicting how mutations affect the chromatin structure in an individual’s DNA, the model could help identify potential biomarkers for disease and assist in the development of targeted therapies.
Making the Model Available
In an effort to accelerate scientific progress, the MIT team has made their ChromoGen model and data publicly available. Researchers around the world are now able to use the model to study chromatin structures in their own cell types of interest, enabling a wide range of new discoveries.
Conclusion
The development of ChromoGen is a major step forward in both the study of genomics and the application of AI to biological research. By combining deep learning and generative AI, the MIT team has created a tool that allows researchers to quickly and accurately predict the 3D structures of chromatin, offering unprecedented insights into how cells control gene expression. With its potential to transform our understanding of cellular biology, disease mechanisms, and personalized medicine, ChromoGen represents a groundbreaking innovation in genomics and artificial intelligence.
Reference: Greg Schuette et al., ChromoGen: Diffusion model predicts single-cell chromatin conformations, Science Advances (2025). DOI: 10.1126/sciadv.adr8265