How AI Can Support the Trial Master File

Posted by Aaron Grant | Jun 4, 2024 11:00:40 AM

Full adoption of artificial intelligence might be some way off, but industry is starting to embrace the opportunities AI presents to streamline the Trial Master File (TMF) process. 

One of AI’s biggest attractions is the speed at which it can conduct a complete review of the TMF. While you still need expert oversight for quality control (QC), AI can carry out a thorough assessment and cut the time teams spend on manual processes. In companies with small TMF teams, which have limited capacity to continually assess documents, this can be a game changer.

No matter the size of your company, however, successful AI implementation depends on two things: the quality and robustness of your dataset, and TMF experts who direct how the model is built on that dataset so that it delivers useful results.

With AI being such an expansive term, it’s important to note that the examples below draw on more mature AI tools such as machine learning and natural language processing (NLP), not the newer large language model (LLM) approach to AI. While LLMs are promising and dominating the news due to their ability to mimic human language (as well as for their propensity to make things up or “hallucinate”), we believe they will require more robust vetting and controls before being used in clinical development. 


The impact of AI 

A high-quality TMF is essential to avoid critical inspection findings.[i] Because maintaining one is labor-intensive and few companies have the resources to maintain 100% QC manually, most take a risk-based approach to TMF documents in line with regulatory guidelines.[ii]

The CDISC TMF Reference Model established a common standard for classifying and organizing documents.[iii] But with more than 400 sub-artifacts on average within the TMF, it can be difficult for document owners to determine where a document belongs. AI can ease that burden by suggesting classifications.

Our operations team uses QC automation, enabling AI to check all documents without increasing a client’s reliance on human resources. We measure how accurate the AI’s past classifications have been, split by document type, to estimate how likely it is to be correct going forward. During QC, each document passes through machine learning algorithms that have been pre-trained on millions of TMF documents and refined by TMF experts, producing a set of classification suggestions with associated confidence scores that a human reviewer can examine.
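To make the idea of accuracy split by document type concrete, here is a minimal Python sketch. It is an illustration only, not a description of any production system: the function name, the shape of the review history, and the document type names are all assumptions made for the example.

```python
from collections import defaultdict


def accuracy_by_type(history):
    """Return {document type: share of AI suggestions the reviewer confirmed}.

    `history` is a hypothetical list of (ai_suggestion, reviewer_decision)
    pairs; results are grouped by the type the reviewer finally chose.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for suggested, confirmed in history:
        totals[confirmed] += 1
        if suggested == confirmed:
            correct[confirmed] += 1
    return {doc_type: correct[doc_type] / totals[doc_type] for doc_type in totals}


# Illustrative review history (made-up data).
history = [
    ("Curriculum Vitae", "Curriculum Vitae"),
    ("Curriculum Vitae", "Curriculum Vitae"),
    ("Protocol", "Protocol Amendment"),
    ("Protocol Amendment", "Protocol Amendment"),
]

for doc_type, accuracy in accuracy_by_type(history).items():
    print(f"{doc_type}: {accuracy:.0%}")
```

A breakdown like this shows which document types the AI handles reliably and which still need closer human attention.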

The human reviewer thus is saved the rote work of bringing up and examining each document to classify it for the TMF, and can instead focus on the subset of documents that have lower confidence scores. 

If the AI has historically been accurate in classifying a given type of document, it will consistently produce a high confidence score, and the reviewer only needs to accept the suggestion to have the document automatically entered into the Trial Master File in the right place and with the correct metadata. In some cases, companies may choose to have these selections skip the review stage and go directly into the TMF, with a later risk-based QC process in which a human reviewer examines a sample of these documents to help confirm that they were filed correctly.
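The routing described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `Suggestion` class, the 0.95 threshold, and the 10% sampling rate are hypothetical values chosen for the example, not recommendations or actual product settings.

```python
import random
from dataclasses import dataclass


@dataclass
class Suggestion:
    doc_id: str
    label: str          # suggested classification
    confidence: float   # 0.0 to 1.0


def route(suggestions, auto_threshold=0.95):
    """Split suggestions into auto-filed and needs-human-review lists."""
    auto_filed, needs_review = [], []
    for s in suggestions:
        if s.confidence >= auto_threshold:
            auto_filed.append(s)
        else:
            needs_review.append(s)
    return auto_filed, needs_review


def qc_sample(auto_filed, fraction=0.1, seed=0):
    """Pick a random sample of auto-filed documents for later risk-based QC."""
    if not auto_filed:
        return []
    sample_size = max(1, round(len(auto_filed) * fraction))
    return random.Random(seed).sample(auto_filed, sample_size)


suggestions = [
    Suggestion("doc-1", "Protocol", 0.99),
    Suggestion("doc-2", "Curriculum Vitae", 0.62),
    Suggestion("doc-3", "Protocol", 0.97),
]
auto_filed, needs_review = route(suggestions)
print([s.doc_id for s in needs_review])  # documents a human should look at first
```

In practice, the threshold would likely differ by document type, reflecting how accurate the AI has been for each one.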

AI can also be used to scan documents during quality checks to look for obvious errors, such as blank or missing pages, legibility issues or inconsistent rotation of the document pages. Another invaluable way AI can be implemented is in the quick and easy extraction of metadata, such as the name of an investigator on a clinical trial, from documents. 
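As a rough sketch of what such page-level checks might look like, the Python below flags pages that appear blank or that are rotated differently from the rest of the document. It assumes an upstream step has already extracted each page's text and rotation angle; the `pages` structure, the `min_chars` cutoff, and the issue wording are hypothetical, and real tools would work on the page images as well.

```python
from collections import Counter


def page_issues(pages, min_chars=20):
    """Return (page number, issue) pairs for a list of page dicts.

    Each page dict is assumed to have 'text' (extracted text) and
    'rotation' (degrees) keys produced by an earlier extraction step.
    """
    issues = []
    if not pages:
        return issues
    # Treat the most common rotation as the document's expected orientation.
    usual_rotation = Counter(p["rotation"] for p in pages).most_common(1)[0][0]
    for number, page in enumerate(pages, start=1):
        if len(page["text"].strip()) < min_chars:
            issues.append((number, "possibly blank"))
        if page["rotation"] != usual_rotation:
            issues.append((number, "rotation differs from rest of document"))
    return issues


pages = [
    {"text": "Investigator signature page for site 101", "rotation": 0},
    {"text": "   ", "rotation": 0},
    {"text": "Appendix A: laboratory normal ranges", "rotation": 90},
]
for number, issue in page_issues(pages):
    print(f"page {number}: {issue}")
```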

When used correctly, these AI tools support the overarching objective: a high-quality TMF without the additional resources that otherwise would be needed.


Expert guidance 

As mentioned earlier, companies need two factors to be in play before they can reap the benefits of AI. The dataset needs to be robust enough to build an AI model and TMF experts must be directing that build. 

For example, to answer questions like “does this document have a standard template across the world or is it going to be different from country to country?” you need to understand the variances in the data: where it looks the same and where it differs. AI developers need direction on how to weight the model so it can determine whether a given variation is an outlier or a trend.

It’s a very different way of developing software, which ordinarily would involve an input “A” giving you an output “B”. A neural network AI model needs expert inputs so it can pick up trends versus outliers and come up with an answer. You are constantly tweaking the model to get the right output. 

Let’s consider a situation where you have a small subset of data, say 100,000 documents, in the TMF that you want to use as training data. In this instance, you might be able to get some good results if all your trials are similar. But what if you want to branch out into a new geography and you’ve only conducted US trials? If you have only trained AI to look at those documents, there will be gaps in the model. 
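One simple way to spot this kind of gap before relying on a model is to compare the countries represented in the training data with the countries you plan to run trials in. The Python sketch below is an illustration only; the function name, the `min_docs` cutoff, and the sample data are assumptions made for the example.

```python
from collections import Counter


def coverage_gaps(training_countries, target_countries, min_docs=1000):
    """Return target countries with fewer than `min_docs` training documents.

    `training_countries` holds one country code per training document.
    """
    counts = Counter(training_countries)
    return sorted(c for c in set(target_countries) if counts[c] < min_docs)


# Made-up example: a training set drawn almost entirely from US trials.
training_countries = ["US"] * 5 + ["DE"] * 1
print(coverage_gaps(training_countries, ["US", "DE", "JP"], min_docs=3))
```

Any country the check returns is one where the model's suggestions deserve extra scrutiny until more local documents have been added to the training data.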

Our approach has been to pre-train machine learning algorithms on millions of documents against the CDISC TMF Reference Model, which “provides standardized taxonomy and metadata and outlines a reference definition of TMF content using standard nomenclature.”[iii] The algorithms can thus make highly “educated” suggestions for most document types as to which zone, artifact, and sub-artifact of the Reference Model the document belongs in, as well as the proper metadata associated with that document.


Future considerations 

There are no specific guidelines around the use of AI within the TMF. However, the US Food and Drug Administration (FDA) and the European Medicines Agency (EMA) are discussing the role of AI and machine learning (ML) in the drug development process.  

In a discussion paper, the FDA said: “Use of AI/ML could ... significantly enhance data integration efforts by using supervised and unsupervised learning to help integrate data submitted in various formats and perform data quality assessments. Additionally, AI/ML can be used for data curation via masking and de-identification of personal identifiable information, metadata creation, and search and retrieval of stored data. These applications can potentially increase data accuracy and improve the speed at which data are prepared for analyses.”[iv]

While there is naturally some reticence about the implementation of AI, there is clear evidence of its capacity to reduce the TMF’s resource-heavy processes and increase both the speed and accuracy of QC checks at the same time.  

With thousands of documents coming from different sources, it is impossible for a small TMF team to conduct QC on all essential documents at a high enough frequency. What we often see when conducting quality reviews for clients is gaps in documentation. Where three protocol amendments have been made, for example, the first might be missing. Finding such gaps manually is labor-intensive, which makes it ripe for an AI solution.

With the right inputs, AI will eventually be able to identify such gaps in real time and provide warnings, resulting in timelier and more accurate TMFs in the future. 
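The protocol amendment example shows the underlying check in its simplest form. Assuming amendment numbers have already been extracted from the filed documents as metadata (an assumption for this sketch, as is the function name), a few lines of Python can list the numbers missing from the sequence:

```python
def missing_amendments(filed_numbers):
    """Return amendment numbers absent from 1 up to the highest one filed."""
    if not filed_numbers:
        return []
    present = set(filed_numbers)
    return [n for n in range(1, max(present) + 1) if n not in present]


# Amendments 2 and 3 are in the TMF, so amendment 1 is flagged as missing.
print(missing_amendments([2, 3]))  # [1]
```

The hard part in practice is reliably extracting the amendment numbers from the documents in the first place, which is where the AI does its work.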


About the author: Aaron Grant is VP of Solutions Consulting at Cencora PharmaLex (formerly Phlexglobal), where he is focused on helping clients to solve challenges through a mix of people, process and technology.  


[i] Recent Findings in GCP Inspections, ECA Academy, May 2022.

[ii] Guideline on the content, management and archiving of the clinical trial master file, EMA, Dec 2018.

[iii] Trial Master File Reference Model, CDISC.

[iv] Using Artificial Intelligence and Machine Learning in the Drug Development of Drug & Biological Products, Discussion Paper and Request for Feedback, FDA.


This blog is intended to communicate PharmaLex's capabilities which are backed by the author’s expertise. However, PharmaLex GmbH and its parent, Cencora, Inc., strongly encourage readers to review the references provided with this blog and all available information related to the topics mentioned herein and to rely on their own experience and expertise in making decisions related thereto as the blog may contain certain marketing statements and does not constitute legal advice.

