AI Advances in Life Sciences Regulatory: 2020 Review and 2021 Expectations

Posted by Jim Nichols | Dec 15, 2020 2:32:42 PM

Artificial Intelligence (AI) continues to make strides in the life sciences regulatory arena, but many obstacles remain. This blog takes a high-level look at what we see in store for regulatory AI in our industry - be sure to stay tuned for regular updates on key developments as the year progresses.  


Since there are many misconceptions around artificial intelligence, we should first answer the question, “what is AI?”   


Artificial intelligence, along with terms such as data mining, machine learning, neural networks, and big data, has become ubiquitous. In actuality, AI is an umbrella term covering many different technologies, techniques, and approaches – including but not limited to: 


Machine Learning (ML)  

With machine learning, computers write their own programs so we don’t have to. In traditional software programming, each algorithm has an input and an output: data goes into the computer, the algorithm executes, and out comes the result. Machine learning turns this around: in go the data and the desired result, and out comes an algorithm that turns one into the other – and which can improve with more data and training. 
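This inversion can be sketched in a few lines of plain Python – a toy least-squares fit rather than any real ML library, with invented example data: we hand the computer inputs and desired outputs, and out comes the function.

```python
# Toy illustration of machine learning's inversion: instead of writing
# the algorithm ourselves, we supply example inputs and desired outputs
# and derive the rule (here, a least-squares line y = a*x + b).
def train(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return lambda x: a * x + b  # the "learned" algorithm

# Data in, desired results in -- algorithm out (examples follow y = 2x + 1).
model = train([1, 2, 3, 4], [3, 5, 7, 9])
print(round(model(10)))  # the learned rule generalizes: prints 21
```

With more (and more varied) training pairs, the derived rule improves – the property the paragraph above describes.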


Natural Language Processing (NLP) 

This is the ability for machines to do something we take for granted, yet it presents one of the most difficult challenges for a machine: understanding and interpreting human language. As computing capabilities have improved, so too has our ability to design NLP systems that come closer to matching human levels of interpretation.  


Note that while natural language processing is often compared with text analytics, the latter represents just one small aspect of NLP. Text analytics simply counts, groups, and categorizes words to extract structure and meaning from large volumes of textual content. 


Natural language processing, on the other hand, goes well beyond basic text categorization to gain a deeper understanding of the meaning (semantics) and structure of given text data – its context. NLP and text analytics can be used together to extract meaningful information from structured and unstructured content, and to interpret language much the way humans do.  
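As a concrete (and deliberately simplistic) contrast, the text-analytics side – counting and grouping words – fits in a few lines of standard Python; the semantic understanding that NLP adds is precisely what this sketch does not attempt. The sample sentence is invented.

```python
# Text analytics in its simplest form: count and group words to find
# structure in a volume of text. (Real NLP goes further, modeling
# grammar and semantics, which this sketch does not attempt.)
from collections import Counter

text = ("The agency approved the submission. "
        "The sponsor amended the submission label.")
words = [w.strip(".").lower() for w in text.split()]
counts = Counter(words)
print(counts.most_common(2))  # [('the', 4), ('submission', 2)]
```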


Big Data 

To train AI systems using the above (and other) methods requires large volumes of relevant data as input – the “fuel” for the machine learning engine to improve its accuracy. For example, training machine learning algorithms utilizing natural language processing to classify unstructured content requires lots and lots of data – thus, “Big Data.” 


Robotic Process Automation (RPA)  

The technology consulting firm Gartner defines RPA as “a productivity tool that allows a user to configure one or more scripts (which some vendors refer to as “bots”) to activate specific keystrokes in an automated fashion. The result is that the bots can be used to mimic or emulate selected tasks (transaction steps) within an overall business or IT process. These may include manipulating data, passing data to and from different applications, triggering responses, or executing transactions.” 


The simplest way to think of RPA is as the workers (the bots) programmed to complete the repetitive tasks identified by humans, using the directions provided by some set of input data – possibly generated from artificial intelligence tools.  
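A minimal sketch of that bot idea, with entirely hypothetical task and field names: a scripted routine applies the same transaction steps to each item in a queue of human-defined tasks.

```python
# RPA in miniature: a "bot" walks a queue of repetitive tasks defined
# by humans, applying the same scripted steps to each item.
# All names and data here are hypothetical.
def process_correspondence(item):
    # one scripted transaction step: normalize a field and route the record
    return {"agency": item["agency"].upper(), "routed_to": "regulatory-inbox"}

task_queue = [
    {"agency": "ema", "subject": "Query on Module 3"},
    {"agency": "fda", "subject": "Information request"},
]

results = [process_correspondence(item) for item in task_queue]
print(results[0]["agency"])  # prints EMA
```

The input queue could just as easily be generated by an upstream AI tool, as the paragraph above notes.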


Where Are We Today? 

It has become clear that artificial intelligence has tremendous potential to help solve difficult regulatory business problems in life sciences. A key to this has been to clearly define the problems that are the most time- and resource-intensive for regulatory affairs and regulatory operations. Following that, we need to conduct some intelligent matching: which of these problems are suitable for AI, and which are not, given the current state of the technology? 

In fact, not all problems are well-suited to AI technology. Many can be resolved with improved processes and governance to better organize the data and documents, or by other means such as business workflows and analytics.  

An AI working group that is part of the DIA RIM Reference Model working group (see below for more info) has developed a list of the most pressing challenges that could be addressed by AI technologies: 

  • Searching for documents and data – for example, finding the right document for a specific node in a submission outline, which can be very time-consuming  
  • Classifying and organizing documents in an electronic document management system (EDMS), since properly classifying documents takes a lot of time and effort and is prone to error 
  • Compiling and mining regulatory intelligence and organizational wisdom more quickly and with less effort, for example around submissions 
  • Tracking health authority correspondence and commitments, verifying that those commitments are completed effectively and within the required timelines, and ensuring that responses to agency queries are consistent across your products and markets 
  • Streamlining the impact assessment of product changes across all markets 
  • Checking label compliance with the Company Core Data Sheet (CCDS) 
  • Extracting data from documents to comply with data standards and regulations such as IDMP and UDI  
  • Ensuring content that is reused across submissions stays in sync as it gets modified 


We’ve also made strides in collaborating to share our knowledge of how our industry and other industries are investigating and applying these kinds of technologies. What are leading sponsors, vendors, and service providers doing? How are the health authorities approaching AI tools and issues such as validation? 


If you are looking to engage further with your colleagues in this area, below are some of the main industry groups working to foster collaboration and knowledge-sharing around AI and advanced technologies in life science regulatory:  

  • DIA Regulatory Information Management (RIM) Reference Model AI Topic Team – this group will also be presenting its findings and progress at the DIA Regulatory Submissions, Information, and Document Management (RSIDM) Forum on February 10, 2021 
  • Pharma RPA Community 
  • International Society for Pharmaceutical Engineering (ISPE) – the organization’s Good Automated Manufacturing Practice (GAMP) provides a standardized methodology to address evolving FDA and other regulatory agency expectations for computerized system compliance and validation 


In addition, don’t hesitate to contact me if you have any questions about these groups. 


Several best practices for implementing AI in life sciences regulatory have also emerged, which will be familiar to anyone who has attempted to implement a transformative technology in our industry: 

  • Start with a small, provable business case
    • Select use cases where the benefits will outweigh the costs of implementing the new technology 
    • Avoid automating bad processes – regulatory affairs and operations should be given an understanding of AI capabilities to explore “what if” scenarios for making fundamental changes to existing business processes 
  • Define success criteria at the beginning 
  • Get buy-in up front from stakeholders regarding time and resource commitments  
  • Run agile proofs of concept to prove that the technology will work and quickly show value 
  • Engage with stakeholders throughout the process  
  • Match the right technology or set of technologies to the business problem 


Emerging Use Cases for AI in Regulatory 

Several key use cases for AI in life sciences regulatory have also come to the forefront over the past 12-18 months. One of the top identified cases is to utilize documents as sources of data, so that they can be used for other purposes. Examples of this “augmented use” include building regulatory intelligence or health authority correspondence databases that can be mined to run queries or understand trends; translating data from documents into document metadata for more efficient organization; and using the data to comply with data-oriented standards.  


A practical example of this approach occurred when one of our customers was preparing for IDMP. They had completed their data source analysis and were getting ready for data extraction. During the analysis they had discovered that, like most companies, a large portion of the required information was not in structured databases. Rather, it was locked into regulatory documents and needed to be extracted in order to comply with IDMP.  


The bulk of the unstructured data was held within SmPC documents and Module 3 documents from eCTD dossiers. What’s more, some of the data, such as indications and substances, would need to be codified and matched against controlled dictionaries. Doing this manually requires special knowledge and training, usually from physicians who perform both the coding and the QC of the output.  


By applying sophisticated machine learning capabilities – already trained on large sets of life sciences regulatory data – to this complex challenge, we were able to curate the 50 unstructured data fields with high accuracy, while significantly reducing manual effort. Since we have also integrated our data-mining platform into external data dictionaries to enrich the data - such as automatic coding of indications into the MedDRA dictionary – we were able to automate the critical coding process as well. 


When an indication was extracted by the system, it automatically matched the term against the MedDRA dictionary, extracting all term levels. In addition, the SPOR OMS (Organizations Management Service) database was integrated to automatically match the organization data; substances were matched against G-SRS; and dosage forms were matched against XEVMPD terminologies. In the end, the company estimated it reduced the project effort from 12 months to just 85 days. 
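The dictionary-matching step can be illustrated with Python's standard difflib. The controlled-term list below is a hypothetical stand-in: real MedDRA, SPOR, and G-SRS lookups are licensed services with their own APIs, which this sketch does not model.

```python
# Sketch of matching an extracted term against a controlled vocabulary.
# The term list is hypothetical; real dictionaries (MedDRA, G-SRS, etc.)
# are licensed services accessed through their own interfaces.
from difflib import get_close_matches

controlled_terms = ["Hypertension", "Type 2 diabetes mellitus", "Migraine"]

def code_indication(extracted_text):
    # fuzzy match tolerates minor extraction errors and misspellings
    match = get_close_matches(extracted_text, controlled_terms, n=1, cutoff=0.6)
    return match[0] if match else None

print(code_indication("hypertention"))  # fuzzy match -> Hypertension
```

Terms that fall below the cutoff return nothing and would go to a human reviewer, mirroring the QC step described above.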


Other emerging use cases for AI technologies leverage sample documents to find other, similar documents – useful when documents aren't tagged properly. Sample documents can also be used to automatically author starter documents using other parameters – for example, natural language processing has been used to generate patient safety reports from a sample report, then produce information about other patients and their safety issues.  


Companies are also looking at AI to check compliance with labeling submission standards, and combining analytics with AI to spot trends in data – informing process and policy changes that improve regulatory performance and speed to market. What’s more, they are using AI technologies to simplify product change management by changing the authoring process, providing greater context around the documents. The result? Companies can more easily assess the impact of product changes and make the updates needed to support the changes.  


In addition to the IDMP example noted previously, Phlexglobal has collaborated with customers to apply AI and robotic process automation to the following business problems: 


Streamlining Health Authority communications – We trained AI models on various types of Health Authority communications to classify and extract information such as related product, submission, agency, and more, and connected to multiple internal repositories for data curation and validation. Our customer can now leverage an easy-to-use mechanism that can process incoming correspondence in a matter of a few seconds – instead of the 5-10 minutes it typically takes – and take advantage of a simple verification screen before passing the now-categorized content into a central repository. 
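For illustration only – the production approach described above relies on trained AI models, not hand-written patterns – here is how a naive, rule-based version of the extraction might look. The message text, pattern, and field names are hypothetical stand-ins.

```python
# Naive, rule-based stand-in for the trained extraction models:
# pull a procedure number and a rough category from one message.
# Message text and patterns are hypothetical.
import re

message = "RE: Procedure EMEA/H/C/001234 - Request for supplementary information"

procedure = re.search(r"EMEA/H/C/\d+", message)
category = "query" if "request" in message.lower() else "other"

print(procedure.group(), category)  # prints: EMEA/H/C/001234 query
```

A trained model generalizes across agencies and letter formats in a way these brittle patterns cannot, which is why the production system uses ML rather than rules.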


Risk-based quality control (QC) of a Trial Master File (TMF) through AI-assisted indexing – Phlexglobal developed this AI-powered automation tool to extract metadata from a document and present it for review, along with a confidence score for each to help reviewers prioritize their efforts. Clicking on a suggestion takes the reviewer directly to the location within the document, where they can verify the information and either accept the suggestion or raise a query. The results have been impressive, reducing document processing time by up to 50%, speeding human QC efforts significantly, identifying missing essential documents much earlier, and improving timeliness of TMF documentation. 


Increasing regulatory submission efficiency and quality - With regulatory submissions becoming more complex and data-driven, it is logical to use this data to automate many of the processes used daily. For another customer, Phlexglobal applied our regulatory AI pre-trained on life sciences data to this challenge, auto-classifying and auto-compiling eCTD documents - streamlining document attributes and speeding submission assembly.  


What’s In Store for 2021? 

Over the next 12-24 months, we should expect AI and other advanced technologies such as blockchain to further enhance the performance of regulatory affairs and operations. Overall, we expect simplification and integration of business processes across regulatory, R&D, manufacturing, quality, and commercial to be essential to improving efficiency and compliance.  


Advanced technology will support and assist what we already do today and will be woven into our processes. Data gathered for regulatory intelligence and knowledge management will be analyzed as companies develop their regulatory and submission strategies, and leveraged at multiple points throughout the regulatory lifecycle. 


And we will continue to expand our use of natural language processing, machine learning, and robotic process automation to streamline content creation and metadata tagging, while mining the content to build the databases that help comply with regulatory standards such as IDMP and eCTD v4. Additionally, process attributes will be tagged in the knowledge base to inform future submission planning efforts.  


For publishing, this structured and minable content would be linked into submission outlines, ideally early in the process, so that we can conduct continuous publishing and minimize bottlenecks – once again tracking how long each step takes throughout the process. As agencies review and send inquiries on submissions, we will gather data from the health authority correspondence to inform the knowledge base, and maintain a process to periodically update regulatory procedures and templates to reduce repeat queries. 


Upon product approval, RPA could forward the approval letter to the appropriate parties to streamline the product release and launch processes. And as an organization corresponds with health authorities and manages commitments, we’ll be able to extract those commitments and complaints for automatic entry into the RIM system and the knowledge base. 


2021 Predictions for AI in Life Sciences Regulatory 

In closing, I’d like to make a few predictions for 2021. We’ll be tracking these trends and challenges throughout the year, as well as industry progress on the above use cases, and provide regular updates: 


Top Trends  

  • Greater cross-functional communication and collaboration between clinical and regulatory teams 
  • Rapidly expanding datasets & improved algorithms (the expanding virtuous circle) 
  • Growing number of use cases with measurable ROI 
  • Increasing trust in AI among users (but still far from “hands off”) 
  • Standards-setting bodies and Health Authorities begin to catch up with industry (and in some cases even lead) 


Top Challenges  

  • Technology: Validation of AI systems and processes 
  • Process: Transitioning from proof-of-concept (POC) to truly scalable AI (organizational willingness and cost) 
  • People: Constraints / allocation (having the right skillsets, training, and long-term investment in people) 

Do you agree or disagree with these predictions? Have your own you would like to share? We’d love to hear from you. Contact me with your questions or comments.  


