

Data Compliance: Can AI Help Users Navigate New Technologies and Regulations?

Digital interface with icons for policies, regulations and law highlighting data compliance concepts.
Credit: iStock

Adhering to data compliance standards requires labs to keep their finger on a shifting pulse. Users must be aware of new regulations, changing data standards and the implications of altering experimental protocols and procedures.


New technologies, spearheaded by the rise of artificial intelligence (AI), promise researchers a better view of data sources and the automation of time-consuming compliance processes. At the same time, these technologies are increasing the throughput and depth of available data, which can create new burdens and requirements for labs. In this article, we examine how these advances are transforming compliance and how users can stay informed.

The everyday challenges of compliance

Colin Thurston, a business development manager at informatics consultancy Scimcon, told Technology Networks that many of his clients are wrestling with two key data compliance issues. One of these problems affects every lab procedure, even the simplest task of weighing a sample. The challenge is this: how do we ensure that recorded data is accurate?


In theory, this task is straightforward; a scientist places a sample on a scale and records the measurement. But designing protocols that minimize error and ensure accuracy isn’t so simple. “Do you get someone to write it down on a notepad and then type it in again?” said Thurston. “Or do you plug in some electronics and then automatically capture that?” When using technology to record a data value, it is essential to verify that the data is concordant and that you have an accurate genealogy of your data, Thurston explained. Data can be lost or altered as it is passed between multiple lab informatics systems, so it’s important to ensure your different setups play nicely with each other and communicate seamlessly to maintain compliance.
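As a rough sketch of the automated option, the snippet below captures an instrument reading together with provenance metadata so that downstream systems can trace where the value came from. The instrument interface, field names and identifiers are illustrative assumptions, not part of Thurston’s examples.

```python
# A minimal sketch of automated capture with a provenance record, assuming a
# hypothetical read_balance() interface; real instrument drivers and LIMS
# schemas will differ.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class Measurement:
    sample_id: str
    value_g: float
    instrument_id: str
    captured_by: str   # "auto" here; a user name for manual entry
    captured_at: str   # ISO 8601 timestamp
    source_system: str # where the value originated, for data lineage


def read_balance(sample_id: str) -> float:
    """Placeholder for an instrument driver call; returns a weight in grams."""
    return 1.2345


def capture_weight(sample_id: str) -> Measurement:
    # Capture the value directly from the instrument and stamp it with
    # provenance metadata rather than relying on manual transcription.
    return Measurement(
        sample_id=sample_id,
        value_g=read_balance(sample_id),
        instrument_id="BAL-001",
        captured_by="auto",
        captured_at=datetime.now(timezone.utc).isoformat(),
        source_system="balance-interface-v1",
    )


print(json.dumps(asdict(capture_weight("S-0042")), indent=2))
```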

The second compliance issue Thurston highlighted is the increasing move toward reusing data. Bioscience has built up decades of data, and advances in big science and omics disciplines have increased the rate of data generation by orders of magnitude. What’s changed in the last few years is the availability of tools that can process that data, which for too long has remained stuck in silos.


AI tools, which can rapidly and accurately analyze huge datasets, are now being brought to bear in lab settings, but the quality of legacy data can be inconsistent. Thurston said he encourages his clients to ensure their data meets the FAIR (findable, accessible, interoperable and reusable) principles, particularly interoperability and reusability, to maximize the data’s utility to AI and machine learning toolsets. “A lot of the issues come from data in the lab that is not always consistent,” said Thurston. Manufacturers’ variable data standards and data that isn’t in a machine-readable format are roadblocks to maximizing the value of data resources.
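As an illustration of where legacy data tends to fall short of those principles, a simple completeness check like the hypothetical sketch below can flag records that lack the metadata needed for reuse; the required fields are invented for the example, not a formal FAIR assessment.

```python
# A minimal sketch of a reusability check on legacy records; the required
# fields are illustrative of what interoperability and reuse demand.
REQUIRED_FIELDS = {"sample_id", "analyte", "value", "unit",
                   "method", "instrument", "date"}


def missing_for_reuse(record: dict) -> set[str]:
    """Return the metadata fields a record lacks before it can be reused."""
    return {field for field in REQUIRED_FIELDS if not record.get(field)}


legacy_record = {"sample_id": "S-17", "value": 4.2, "date": "2013-06-02"}
print(missing_for_reuse(legacy_record))
# e.g. {'analyte', 'unit', 'method', 'instrument'}
```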

Improving data compliance with lab informatics

The solution to these issues is both procedural and technological, said Thurston. Researchers should avoid overwhelming themselves by trying to reformat massive datasets all at once; instead, they should prioritize the most valuable data resources. Thurston also recommends adopting standardized data formats, such as the markup language XML, which adds tags to data that can be read easily by both humans and machines. “That is going to be far easier for you to then reuse down the line than if you're trying to store just native data formats,” he said.
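As a minimal sketch of what that might look like, the snippet below stores a single result as tagged XML using Python’s standard library; the element names are illustrative, not a formal standard such as AnIML.

```python
# A minimal sketch of storing a result as tagged XML rather than a raw export;
# the element names are illustrative, not a specific reporting standard.
import xml.etree.ElementTree as ET

result = ET.Element("result")
ET.SubElement(result, "sample_id").text = "S-0042"
ET.SubElement(result, "analyte").text = "caffeine"
measurement = ET.SubElement(result, "measurement", unit="mg/L")
measurement.text = "12.7"
ET.SubElement(result, "instrument").text = "HPLC-03"
ET.SubElement(result, "recorded_at").text = "2025-05-01T09:30:00Z"

# Both a person and a parser can read the tags back out later.
xml_text = ET.tostring(result, encoding="unicode")
print(xml_text)
print(ET.fromstring(xml_text).find("measurement").text)  # -> 12.7
```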




New technologies can help in these efforts. Modern scientific data management systems usually store data in its original format, but will also convert it into a machine-readable version. This ensures that labs can process datasets while also meeting regulatory standards for audit trails. AI tools that use natural language processing aim to make sources of information, such as heterogeneous medical records, reliably accessible to researchers.
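A hypothetical sketch of that pattern is shown below: keep the vendor file untouched, store a machine-readable conversion alongside it and append an audit-trail entry tying the two together. The paths, fields and CSV-to-JSON conversion are illustrative, not any specific data management product’s behavior.

```python
# A minimal sketch of preserving the original instrument file while adding a
# machine-readable copy and an audit-trail entry; all names are illustrative.
import csv
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_with_audit(raw_file: Path, archive_dir: Path) -> None:
    archive_dir.mkdir(parents=True, exist_ok=True)

    # 1. Preserve the original file exactly as produced by the instrument.
    original_copy = archive_dir / raw_file.name
    shutil.copy2(raw_file, original_copy)

    # 2. Store a machine-readable conversion alongside it (here: CSV -> JSON).
    with open(raw_file, newline="") as fh:
        rows = list(csv.DictReader(fh))
    converted = archive_dir / (raw_file.stem + ".json")
    converted.write_text(json.dumps(rows, indent=2))

    # 3. Append an audit-trail entry linking the original and the conversion.
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "original": original_copy.name,
        "original_sha256": hashlib.sha256(original_copy.read_bytes()).hexdigest(),
        "converted": converted.name,
        "action": "archived and converted to JSON",
    }
    with open(archive_dir / "audit_log.jsonl", "a") as log:
        log.write(json.dumps(entry) + "\n")
```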


When using AI-based analysis of datasets, users must keep in mind the relevance and coverage of their data. “Historically, research organizations have tended to focus very much on positive results and less so on negative ones,” said Thurston. The bias against negative or null findings, both in published research and in in-house analyses, is well documented. Scientists who feed algorithms with these unrepresentative datasets will only entrench those biases further. Instead, users should consider running experiments through to their conclusions and recording the results, even when a negative outcome is apparent. Feeding this data into models alongside positive data leads to better-trained models and more useful AI tools.
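As a simple illustration of that point, a dataset can be checked for how well negative results are represented before it is used for training; the records and threshold below are invented for the example.

```python
# A minimal sketch of checking how well negative results are represented
# before training a model; records and threshold are illustrative.
from collections import Counter

experiments = [
    {"assay": "A1", "outcome": "positive", "value": 0.82},
    {"assay": "A2", "outcome": "negative", "value": 0.04},
    {"assay": "A3", "outcome": "positive", "value": 0.67},
    {"assay": "A4", "outcome": "negative", "value": 0.11},
    {"assay": "A5", "outcome": "positive", "value": 0.91},
]

counts = Counter(record["outcome"] for record in experiments)
negative_share = counts["negative"] / len(experiments)
print(counts, f"negative share = {negative_share:.0%}")

# Flag the dataset if negatives are badly under-represented rather than
# silently training on a skewed sample.
if negative_share < 0.2:
    raise ValueError("Too few negative results recorded to train a balanced model.")
```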


AI is also a feature of many modern lab informatics tools, with significant potential benefits for users. Many providers have incorporated large language model (LLM) transformer technology into their software, said Thurston. LLMs that generate desired outputs in response to prompts can produce compliant reports without users having to manually fill out templates. “It should be able to pull data out for the scientists to reuse or to present in a far quicker [manner] than currently available using a traditional extraction method,” said Thurston.
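A hypothetical sketch of that workflow is shown below: structured results are passed to a model with instructions to draft a report section. The call_llm function is a placeholder for whichever model API a vendor embeds, and any generated text would still need human review before release.

```python
# A minimal sketch of prompting an LLM to draft a report section from
# structured results; call_llm() is a placeholder, not a real vendor API.
import json


def call_llm(prompt: str) -> str:
    """Placeholder for a provider-specific LLM call."""
    raise NotImplementedError


def draft_report(results: list[dict]) -> str:
    prompt = (
        "Draft a results section for a QC batch report.\n"
        "Use only the data provided, cite each sample ID, and flag any value "
        "outside its specification range.\n\n"
        f"Data: {json.dumps(results, indent=2)}"
    )
    # Output is a draft only; a scientist reviews it before it enters any
    # compliant record.
    return call_llm(prompt)
```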


Drug discovery and development labs are already making use of AI’s ability to rapidly generate candidate structures and molecules. Thurston said that, just as images generated using AI tools should be checked for garbled writing or the wrong number of fingers, assays that incorporate AI should also involve a human in the loop to validate outputs and ensure compliance. “You’re using the real-world experiment to validate what the AI model is suggesting for you,” said Thurston. “That's kind of the check and balance.”
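As a sketch of that human-in-the-loop check, the snippet below only accepts AI-suggested candidates once a recorded assay result confirms them; the compound names and result format are illustrative.

```python
# A minimal sketch of a human-in-the-loop gate: AI-suggested candidates are
# accepted only after a real-world assay result confirms them.
def confirmed_candidates(ai_suggestions: list[str],
                         assay_results: dict[str, bool]) -> list[str]:
    """Keep only suggestions that a recorded wet-lab assay has verified."""
    accepted, pending = [], []
    for compound in ai_suggestions:
        if assay_results.get(compound):   # True = assay confirmed activity
            accepted.append(compound)
        else:
            pending.append(compound)      # awaiting or failed validation
    print(f"{len(accepted)} confirmed, {len(pending)} still need lab validation")
    return accepted


confirmed_candidates(["CPD-101", "CPD-102", "CPD-103"],
                     {"CPD-101": True, "CPD-103": False})
```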

AI regulation and compliance

The opportunities afforded by AI to improve systems and enhance output have excited many scientists. They have also caught the attention of regulators. The FDA published draft guidance on Artificial Intelligence-Enabled Device Software Functions earlier this year, outlining recommendations for how AI-enabled software and the data it generates should be managed in medical device settings. The European Research Area Forum guidelines on the responsible use of generative AI in research also emphasize the importance of transparency and care when using AI systems in research.


Looking to the future, Thurston said that AI’s effect on data generation may be a double-edged sword. “One of the concerns is going to be that there will be an explosion in the volume of data,” he said. Having strict compliance practices in place for the recording and reuse of all this new data will be paramount. Traditional auditing processes, in which an auditor tracks every step of a process, may break down under the pressure of so much data. Designing lab procedures with compliance built in from the start will ultimately make life easier for users.