Inside the UK Biobank’s Push Toward Population-Scale Proteomics

Abstract digital figure made of blocks and molecules representing proteomics and human biology.
Credit: iStock.

As large-scale proteomics moves from experimental promise to practical reality, few initiatives illustrate its potential and complexity better than the UK Biobank.


The resource’s recent release of proteomic biomarker data from more than 50,000 participants marks a milestone in its effort to build one of the world’s most comprehensive multi-omics datasets. With plans now underway to scale Olink proteomics, a high-throughput platform that measures hundreds to thousands of proteins simultaneously, to all 500,000 participants, the UK Biobank is aiming to understand disease biology at a population scale.


To understand what this expansion means for the research community, the challenges of integrating high-dimensional data and the emerging paths toward clinical translation, Technology Networks spoke with Dr. Adam Lewandowski, the deputy chief scientist for the UK Biobank and an associate professor in the Nuffield Department of Population Health at the University of Oxford, at HUPO 2025.


In this conversation, he discussed the scientific and logistical hurdles of releasing large-scale proteomic data and how the UK Biobank is preparing researchers to navigate the rapidly evolving multi-omics landscape.

Rhianna-lily Smith (RLS):

The UK Biobank recently added proteomic biomarker data from over 50,000 participants. What were the main scientific and logistical challenges in integrating such a large-scale proteomics dataset into the existing biobank infrastructure?


Adam Lewandowski, PhD (AL):

With the UK Biobank's initial proteomics project, there's a lot of new data coming to the research community, including data types that researchers may not have used in the past.


One of the challenges with that is creating the appropriate documentation, as well as creating the appropriate training materials to make sure that researchers understand how best to use those data types.


The big advantage of where we started is that we had this close collaboration with the Pharma Proteomics Project. This was a collaboration of 13 pharma partners that came together in November 2020 to look at proteomics data using the Olink platform. Through that process, they did a lot of the QC and data cleaning work, prepared the appropriate metadata and documentation in advance, and worked closely with the UK Biobank team once the data were available, so that the research community could make the best possible use of that data.

There are always ongoing challenges in terms of new researchers and researchers from different areas. But that's one of the nice things about UK Biobank; this is a truly global resource with researchers around the world who are willing to support and help each other.

We're continuously looking at ways of improving those training programs and the documentation to support the research community.



RLS:

Is there potential for repeated proteomic sampling within UK Biobank participants to track dynamic protein expression changes over time? What value could that add to longitudinal disease studies?


AL:

Earlier this year, the plan started to come together to be able to do Olink proteomics on the full cohort.


We're happy to announce that a couple of months ago, we received notification from the UK government that they were match funding the Pharma Proteomics Project. We now have 14 pharma partners, with potentially a few more joining as well, that have helped bring this together. With the government funding, we can do Olink proteomics with the 5,400 proteins on the full 500,000 participants, as well as on repeat samples that were collected over multiple years since the initial setup of UK Biobank, nearly 20 years ago.


What this means is there'll be 600,000 samples in total, including 100,000 from the repeats that were taken at different intervals over time.



RLS:

How close are we to seeing proteomic signatures move from research discovery into clinical risk assessment or precision medicine applications?


AL:

We started to see, in the first year with the UK Biobank, people developing new tools and predictive models, coming up with ways of predicting incident cases of dementia, looking at multiple myeloma and other illnesses, as well as coming up with new therapeutic targets for Parkinson's disease, cardiovascular diseases and breast cancer. We've also seen new biomarkers for things such as multiple sclerosis.

So, the opportunity is definitely there. We've now seen hundreds of publications coming out of this initial stage of the Olink proteomics work.

We've also seen different groups starting to look at predictive models and how they compare to clinical risk scores. In the last few years, with the UK Biobank, we've seen the development of polygenic risk scores and how genetics can be integrated into clinical practice to enhance those clinical risk prediction algorithms.

Research groups such as Prof. Claudia Langenberg's group in London have taken that a step further and now incorporated those clinical risk factors, polygenic risk scores, as well as different proteins from the Olink platform, to look at how that predicts later diseases. They've shown how the proteomics information allows for better prediction of ~67 different diseases.

What that means is that not only can we start to potentially predict diseases earlier, but also predict with much higher certainty.

As we scale up the proteomics programs, not just with Olink, but with other areas of proteomics, we're going to start to see more and more development of those opportunities and understanding from the UK Biobank, as well as translation to testing within clinical settings and testing what the value could be for enhancing downstream patient care.



RLS:

What are the next major steps for expanding UK Biobank’s proteomics program – are there plans to scale up beyond 54,000 participants or integrate with metabolomics and lipidomics datasets?


AL:

As I mentioned, we've got the Olink now being extended to the full cohort, as well as the repeat samples.


We've also got the complementary Soma platform, which we'll be doing on ~55,000 samples, or 55,000 participants, including baseline measures as well as some of the repeats; so it allows for those longitudinal changes, looking at dynamic protein changes over time.


The other aspect is that we're looking at what we're not capturing within those platforms. The platforms are fantastic in terms of standardization and scalability, but there are potentially things they miss, such as proteins that are harder to detect within the neurodegenerative space, including different tau proteins such as p-tau181 and p-tau217. We’re considering how to better capture those lower-abundance proteins in the population.


That's where we're looking towards other platforms, such as the Alamar CNS panel, to incorporate those measures within the resource. We’re also looking at what other things we can do in terms of protein isoforms and incorporating mass spectrometry.

All of these things require additional funding, but I think there's enough enthusiasm, definitely within the research community and within our different partners and expert advisors, to make this happen.

We are working very hard to make that data available.


We'll continue to expand the proteomic side, but we also have to remember that we already have the existing whole genome and whole exome sequencing data, and we're enhancing this with the addition of the nuclear magnetic resonance metabolomics data from Nightingale Health.


That data is now becoming available for half a million participants, as well as for repeat samples in the cohort. We're also incorporating Oxford Nanopore long-read sequencing in 50,000 participants and looking at how we can include other assays, such as short-read sequencing using the elliptical platform, immunophenotyping and RNA sequencing. That really provides us an opportunity to start to look at omics across multiple steps and more broadly as well.


When we get all of that data together, the big challenge is always how to get the research community trained and able to work with it. That will be a collaboration between the UK Biobank and our experts in the community.

I think it's an exciting time ahead for the UK Biobank, with lots of new data coming over the next couple of years to enhance the space.


RLS:

Do you think there are educational or awareness challenges around this new technology?


AL:

I think the reality is that proteomics is, in many ways, a newer field because of the challenges in the past of scalability. Genomics is ahead in many ways and there's more awareness as to what the potential value of that data is. But I think now we're starting to see biomarker and potential drug target discovery, and the clinical translation value of proteomics.


There's still work to be done to understand the different platforms and the different ways of analyzing the data, and what the value is of having these complementary data sets too.


This is one of the things that I’ve been speaking to the people within HUPO about. We've also been thinking about how we can leverage the expertise of HUPO to help build that information, so we can go to funders and answer: “Why do we need to do this? What is the potential value of this data for drug discovery, for biomarker discovery and, ultimately, for improving clinical practice in the future?”