Artificial intelligence is enabling businesses all over the world to create new value from their existing data. Data mining is the most common application growing out of this trend. By definition, data mining is the practice of examining large databases to generate new information. The tasks accomplished by data mining can be described in just a few lines; however, the technology behind the process is complex and involves neural networks.

In the regulatory areas of the life sciences industry, data mining is especially interesting because of the substantial amount of information that needs to be controlled for quality and compliance purposes. This information mostly takes the form of documents, stored in many different formats and silos across the organization.

At Phlexglobal, we have found that tagging documents with appropriate attributes (or metadata) mined from the documents is one of the most compelling use cases.

Typically, businesses have a Document Management System in which they have been managing documents for decades. Over the years, the document management systems have changed and the number of documents has grown exponentially. Upon examination, these companies often observe several of the following problems:

  • Legacy documents migrated without accurate or complete metadata
  • Documents assigned incorrect metadata
  • Changes in the vocabularies and terminologies leading to ‘mixed’ metadata
  • Changes in the document organization leading to documents with incorrect metadata
  • Documents completely missing metadata
  • Changes in organizational functions, leading to changes in document categorization

Addressing this challenge can feel overwhelming and insurmountable. Yet there is hope! With our modern data mining solution, Phlexglobal is able to tackle this challenge by helping businesses to:

  • Define the desired metadata attributes: the attributes for which data should be mined from the documents
  • Define the sections for attribute data: the sections or subsections from which the attribute data can be mined
  • Define the vocabulary: the vocabulary relating to the attributes
  • Manage the data: once the documents have been mined, use the provided methods to manage, correct and confirm the results
  • Update documents with the mined metadata
  • Export the metadata (CSV, Excel, XML, SQL dump, and more)
  • Integrate with the target system for updates
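
To make the steps above more concrete, here is a minimal Python sketch of the same workflow: attributes, sections and vocabulary are defined as rules, a document is mined against them, and the result is exported as CSV. It is an illustration only and not Phlexglobal's implementation. It swaps the neural network approach for simple vocabulary matching, and every name in it (AttributeRule, split_sections, mine_metadata, export_csv, the sample rules and the sample document) is a hypothetical chosen for this example. It also assumes that section headings are plain lines ending in a colon.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class AttributeRule:
    """Hypothetical rule: one attribute, the section to search, and its vocabulary."""
    name: str          # metadata attribute to fill
    section: str       # heading of the section the value is mined from
    vocabulary: dict   # canonical term -> list of accepted variants


def split_sections(text):
    """Split plain text into {heading: body}. Assumes headings are lines ending in ':'."""
    sections, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            current = stripped[:-1].lower()
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return {heading: " ".join(lines) for heading, lines in sections.items()}


def mine_metadata(text, rules):
    """Return {attribute: canonical term}, or None where nothing matched."""
    sections = split_sections(text)
    result = {}
    for rule in rules:
        body = sections.get(rule.section.lower(), "")
        result[rule.name] = None  # None marks the attribute for manual review
        for canonical, variants in rule.vocabulary.items():
            terms = [canonical, *variants]
            if any(re.search(r"\b" + re.escape(t) + r"\b", body, re.IGNORECASE)
                   for t in terms):
                result[rule.name] = canonical
                break
    return result


def export_csv(rows, fieldnames):
    """Serialize mined metadata rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical attribute, section and vocabulary definitions.
RULES = [
    AttributeRule("document_type", "Title",
                  {"Protocol": ["Study Protocol"], "Informed Consent Form": ["ICF"]}),
    AttributeRule("country", "Site",
                  {"Germany": ["Deutschland"], "France": ["Frankreich"]}),
]

# Hypothetical document text.
SAMPLE = """Title:
Study Protocol for Trial ABC-123
Site:
Clinic in Berlin, Deutschland
"""

metadata = mine_metadata(SAMPLE, RULES)
print(export_csv([metadata], ["document_type", "country"]))
```

Attributes that come back as None are the ones a reviewer would need to correct or confirm before the metadata is written back to the document or pushed to a target system. In a real deployment, the matching step is where a trained model would replace the vocabulary lookup.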


Looking at the powerful capabilities of data mining, it becomes clear that it can no longer be regarded as a niche solution and that its capabilities belong in standard processes. It improves the quality and speed of every data-driven process and should be part of any company's software solutions. It is difficult to estimate the savings achieved, since every business uses its data in a different way and starts from a different level of data quality. Nevertheless, every user will feel the benefit of clean, well-connected information. If you want to get the most out of your Document Management System, there is no way around data mining and curation.