Using 3D Multiomics To Reveal the Hidden Architecture of the Genome
Discover how 3D multiomics enables genetic validation of disease-relevant targets.
The completion of the Human Genome Project in 2003 was celebrated as the start of a new era in medicine. Genome sequencing can now be completed in a single day at a fraction of the cost, enabling the generation of vast amounts of genetic data.
To unlock the potential of genomics in precision medicine, it's important to treat the genome not just as a linear code but as a dynamic and complex 3D structure. When combined with multiomics, the 3D structure of the genome can provide a more complete map of biology, which is essential for uncovering causal mechanisms in disease.
Enhanced Genomics is doing just this, utilizing its proprietary 3D multiomics platform and data-rich cell atlas to unlock the value of decades' worth of sequencing data, connecting thousands of genetic variants to disease-driving genes. Technology Networks caught up with Dr. Daniel Turner, chief scientific officer at Enhanced Genomics, to learn more about how understanding the 3D structure of the genome can accelerate target identification.
Can you walk us through how Enhanced Genomics’ 3D multiomics platform works and what makes it distinct from existing approaches in the field?
Enhanced Genomics is pioneering a new way to read the human genome. Rather than seeing it as a linear structure, we investigate its 3D structure. This reveals the hidden architecture of the genome and the biological circuits that drive disease, offering the potential to transform our understanding and treatment of illness.
For the past 20 years or so, geneticists have studied complex diseases using genome-wide association studies (GWAS). These approaches are great at identifying variants that a person might have that predispose them to disease, or that might alter disease severity or onset. However, they are not good at identifying which genes are involved in these disease processes and in which cell types.
The reason for this is that most GWAS variants are found in non-coding regions of the genome, making their function hard to decipher. When you have a variant in the coding region of a gene, you can expect it to exert any pathogenic effect by changing the sequence (and hence potentially the structure) of the protein that this gene codes for. But with non-coding variants, the disease-causing mechanisms are much less apparent. They are typically found in regulatory regions of the genome and exert their effect by changing the amount of the protein rather than its structure. So, the thing you’re looking for is the same between patients and healthy individuals; there’s just a different amount of it, and the quantitative differences are not always large.
To make matters worse, non-coding variants can exert their effect over long distances. That’s to say, you can have a disease-associated single-nucleotide polymorphism (SNP) a megabase or more away from the gene it influences, and there can be other, unaffected genes in between the SNP and the relevant gene promoter. You can’t just look at the linear sequence of the genome and figure out which SNP influences which gene. I mean, you can try, and this is the approach that pharma companies typically use to identify disease-relevant genes, but you’re wrong about half of the time. It means that not only do you miss key disease-relevant genes, but you can also misassign genes, which leads to time and money being spent chasing false leads.
At Enhanced, our approach is based on the fact that in a cell nucleus, the genome isn’t linear – it must be folded up so the 2 meters of DNA can fit into a 6-micrometer nucleus. But the way the genome folds differs from one cell type to another, even in the same individual. This brings parts of the genome into proximity that were far apart in the linear sequence. It’s this folding that brings regulatory regions into contact with the promoters of the genes they control, allowing for cell-type-specific patterns of gene expression. Variants in these regulatory regions will affect the expression of their target genes, and that’s why the GWAS variants have the effect they do.
The reason Enhanced’s approach is so powerful is that it identifies direct physical interactions between disease-associated variants and the genes they influence. This not only allows us to identify the relevant genes in the relevant cell types, but it also provides genetic validation of the genes identified.
Traditional approaches to identifying drug targets/disease genes involve taking GWAS variants and then looking along the linear genome sequence to find likely candidates. The closest gene is often taken to be the gene of interest, especially if omics data support that hypothesis. However, the problem here is that omics data is notoriously difficult to integrate accurately, and again, you’re dealing with associations, rather than getting evidence that validates the hypothesis.
You might find relevant target genes using traditional, proximity-based approaches, but you tend to miss a lot of potential target genes.
Not only this, but you also generate a lot of false positives this way. This results in the need for a lot of downstream validation to eliminate false positives.
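To make the contrast concrete, here is a minimal sketch in Python of how a nearest-gene rule and a contact-based rule can disagree for the same variant. Everything in it is a toy assumption for illustration: the gene names, coordinates, the 5 kb promoter window and the `contacts` format are hypothetical and do not describe Enhanced Genomics' actual pipeline or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int  # transcription start site (bp) on a single toy chromosome


def nearest_gene(snp_pos, genes):
    """Proximity-based rule: pick the gene whose TSS is closest on the linear sequence."""
    return min(genes, key=lambda g: abs(g.tss - snp_pos)).name


def contacted_genes(snp_pos, genes, contacts, window=5_000):
    """Contact-based rule: pick genes whose promoter is in 3D contact with the SNP.

    `contacts` is a hypothetical list of (anchor_start, anchor_end, promoter_pos)
    tuples; `window` is an assumed tolerance around the TSS.
    """
    hits = set()
    for anchor_start, anchor_end, promoter_pos in contacts:
        if anchor_start <= snp_pos <= anchor_end:
            for g in genes:
                if abs(g.tss - promoter_pos) <= window:
                    hits.add(g.name)
    return sorted(hits)


# Toy locus: the SNP sits next to GENE_A but loops to GENE_C, 1.2 Mb away,
# skipping over GENE_B in between.
genes = [Gene("GENE_A", 1_050_000), Gene("GENE_B", 1_600_000), Gene("GENE_C", 2_200_000)]
snp = 1_000_000
contacts = [(995_000, 1_005_000, 2_200_000)]

print(nearest_gene(snp, genes))               # GENE_A
print(contacted_genes(snp, genes, contacts))  # ['GENE_C']
```

In this toy locus the proximity rule produces a false positive (GENE_A) and misses the gene the variant actually contacts (GENE_C), which is the pair of failure modes described above.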
In a recent collaboration with the Alborada Drug Discovery Institute (ADDI), which is interested in Alzheimer’s disease, we reduced the amount of time taken to get from the GWAS SNP set to a shortlist of target genes from around two years to two months. Not only were we able to identify the same genes that this traditional approach had uncovered, but we added additional genetic support to many of them, meaning that ADDI could prioritize their shortlist more effectively. We also discovered a substantial number of additional target genes that were invisible to their traditional approach.
Because our technology provides genetic validation as an inherent part of the process, we can be very confident in taking some of these targets forward for further validation, even if there is not a huge amount of supporting literature.
Integrating omics data represents a major challenge, as the results can be inconsistent with one another. In addition to analyzing 3D genomic folding, Enhanced also generates the typical omics data types for each cell type that we profile, such as RNA-seq, chromatin accessibility and histone modification. We utilize the 3D folding information as a kind of scaffold, which helps us to integrate and make sense of the other omics data more effectively. We observe the same thing repeatedly: examining other omics data can lead you to conclude that a particular gene is involved in disease susceptibility, but when you layer in the 3D folding information, the picture becomes very clear. It either strengthens the hypothesis you formed from the other omics data, or it explains why that was incorrect and it shows you the right answer.
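The "scaffold" idea can be sketched as a simple reconciliation step: a gene suggested by other omics data is checked against the genes a variant physically contacts. This is only an illustrative sketch; the function name `reconcile`, the status labels and the toy inputs are assumptions, not the company's method.

```python
def reconcile(omics_candidate, contacted):
    """Compare an omics-derived candidate gene with the set of contacted genes.

    Returns a (status, genes) tuple:
      - "strengthened": the 3D contact data agree with the omics hypothesis
      - "revised":      the contact data point to different gene(s)
      - "unresolved":   no contact evidence is available for this variant
    """
    if not contacted:
        return ("unresolved", [])
    if omics_candidate in contacted:
        return ("strengthened", sorted(contacted))
    return ("revised", sorted(contacted))


# Hypothetical inputs: genes contacted by each variant in one cell type,
# and the gene that other omics data alone would have suggested.
contact_map = {"variant_1": {"GENE_A"}, "variant_2": {"GENE_C"}, "variant_3": set()}
omics_guess = {"variant_1": "GENE_A", "variant_2": "GENE_B", "variant_3": "GENE_D"}

calls = {v: reconcile(omics_guess[v], contact_map[v]) for v in omics_guess}
for variant, (status, target_genes) in calls.items():
    print(variant, status, target_genes)
```

A real integration would weigh several data types per cell type; the point here is only the two outcomes described above, where the contact data either support the earlier hypothesis or replace it.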
Data generation is often a significant challenge, most commonly when you are analyzing genomes from large numbers of healthy individuals vs patients, because of the number of samples and the volume of data involved. At Enhanced, we focus mainly on profiling cell types from healthy individuals. We want to see how the genome folds in the absence of disease. Then, when we get patient data in the form of GWAS results, we can see which of the expected healthy long-range interactions are likely to be perturbed in some way by the variants. We don’t really expect to find significant differences in 3D folding patterns between patients and healthy individuals; the signals we’re looking for are much more subtle, so there’s less reason to profile patients.
When we profile the same cell types from different individuals, the folding patterns that we see differ markedly from one cell type to another, but very little between the same cell types from different individuals. The folding within a given cell type is very reproducible from one individual to the next, so when profiling a cell type, we typically look at four individuals, and we find a high degree of concordance. We also generate other omics data alongside the 3D folding information.
In certain situations (e.g., if we wanted to elucidate the mechanism of action for several drugs simultaneously), we would benefit from running the pipeline in a higher-throughput mode. We have an advanced tech dev program that is addressing this. However, for target discovery, the throughput of our existing pipeline is already good enough. Once we’ve profiled a cell type, that is then “done”, and we add the data to our cell atlas database. It means that we don’t need to generate that dataset again – rather, we can query this and all the other data in our atlas with a new GWAS dataset very quickly since the omics data has already been generated.
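The "profile once, query many times" pattern can be illustrated with a small Python sketch. The atlas layout, cell type names, SNP identifiers and coordinates below are all hypothetical; the real atlas schema is not described in this interview.

```python
# Hypothetical atlas: cell type -> list of (anchor_start, anchor_end, gene)
# interactions on a single toy chromosome. Each cell type is profiled once.
ATLAS = {
    "monocyte": [(100, 200, "GENE_X"), (900, 1000, "GENE_Y")],
    "cd4_t_cell": [(100, 200, "GENE_Z")],
}


def query_atlas(atlas, snps):
    """Map each SNP to the genes it contacts, per cell type.

    `snps` is a dict of SNP id -> position. No new profiling is needed:
    the stored interactions are simply intersected with the new SNP set.
    """
    hits = {}
    for snp_id, pos in snps.items():
        per_cell_type = {}
        for cell_type, interactions in atlas.items():
            genes = sorted({gene for start, end, gene in interactions if start <= pos <= end})
            if genes:
                per_cell_type[cell_type] = genes
        hits[snp_id] = per_cell_type
    return hits


# A new (made-up) GWAS SNP set is queried against the existing atlas.
new_gwas = {"rs_hypothetical_1": 150, "rs_hypothetical_2": 950, "rs_hypothetical_3": 5000}
result = query_atlas(ATLAS, new_gwas)
for snp_id, per_cell_type in result.items():
    print(snp_id, per_cell_type)
```

Because the interactions are stored per cell type, the same variant can map to different genes in different cell types, and a variant with no stored interaction simply returns an empty result.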
The tech we’ve developed is essentially disease agnostic. If we have profiled the cell types relevant to the disease of interest, then we can query the atlas and identify target genes. However, we don’t have all human cell types in the atlas ... yet!
We do, however, have highly skilled and experienced lab and bioinformatics staff and an efficient pipeline, so we can theoretically profile any cell type and add that to the atlas. Currently, our indication choice is governed by what is in the cell atlas. These are primarily immune cell types, but we also have a few brain cell types. It makes sense for us to start with immune cells because all diseases have some kind of immune aspect to them, and some are entirely driven by immune cells.
Other than atlas content, for an indication to make sense for us, there needs to be strong evidence that genetically backed targets have delivered effective drugs in the past, high-quality GWAS data, easy-to-access in vitro and in vivo models for initial validation work, a short timeline from drug discovery to drug on shelves and, of course, a big market for the drug and high unmet need.
The additional Series A funding will allow us to complete several key pieces of work and put together a compelling Series B package. For this, we have focused on inflammatory bowel disease. We generated a longlist of candidate target genes and completed extensive triaging of that list to narrow it down to a much smaller set of very high-confidence genes. These are the ones that we’re taking forward into our screening and in vitro/in vivo validation work.
In the longer term, we are keen to explore some related applications of the technology – such as the ability to identify which subset of patients is more likely to respond to a given treatment.
We remain open to strategic investment and opportunities for partners, both investors and collaborators, to join us in advancing the transformative potential of our 3D multiomics platform and to help build the next generation of precision medicine.