
Clinical Trial Recruitment and AI

This post is part of the solutions supported by Kipi’s Clinical LLM.

Authored by – JP Nellore

Clinical Trials and Patient Eligibility

Developing a new drug is an enormously expensive undertaking. Estimates suggest that the cost to bring a single drug to market surpasses a staggering $2 billion. Clinical trials are an essential step in this process where the developer of the drug studies how well the new medicine works. Based on the objectives of the study and information collected during preclinical studies, the research team develops a study protocol that determines the following aspects of the study:

  • Who qualifies to participate in the study (eligibility criteria)
  • How many people will be part of the study
  • How long the study will last
  • How to limit research bias (use of control groups)
  • How the drug will be given to patients and at what dosage
  • What assessments will be conducted, when, and what data will be collected
  • How the data will be reviewed and analyzed

Eligibility criteria describe characteristics that must be shared by all the study participants, typically consisting of inclusion and exclusion parameters; an example of the participation criteria for a trial is linked here. The success of the clinical trial depends on the ability of the research team to enroll a prespecified number of study participants within a planned time frame, so that the study has enough statistical power to produce significant findings. As it turns out, recruiting patients into clinical trials is the most time-consuming and costly part of the drug development process.

Querying the ClinicalTrials.gov database, a registry maintained by the United States National Library of Medicine to provide information on publicly and privately supported clinical trials, reveals rather sobering statistics: about one in five trials fails to enroll enough participants, and nearly all trials take longer than planned to recruit patients.

Patient Screening is a Laborious Process

The details of eligibility screening and patient recruitment vary with the objectives of the clinical trial; at a high level, however, there is a common set of activities. Nurses, clinicians, or trained workers manually review a patient’s medical history, clinical conditions, and demographics against the trial’s eligibility criteria. When performed manually, this is a laborious process. The patient information may be scattered across multiple systems, databases, and documents. Moreover, the data is stored in both structured (e.g., demographics, medication lists) and unstructured (e.g., clinical notes, radiological images, graphs) formats. The clinical notes, which often carry critical information related to the eligibility criteria, can vary in length from a single sentence to multiple pages per patient visit. Often, when a patient is referred to the trial by another physician, the patient information must be obtained from the other health system and combined with the internal data. Screening a single patient typically takes nearly an hour. Studies show that the cost of eligibility screening for cancer-related clinical trials in 2012 was between $130 and $340 per enrolled patient.

Once a potentially eligible patient is identified and informed, the patient needs to consent before they can participate in the study. In some instances, this screening process is further complicated by the need to repeatedly re-evaluate the patients’ eligibility during each visit throughout the course of the study.

Promise of Pre-trained Large Language Models

The proliferation of electronic health records (EHRs), incentivized by the HITECH Act of 2009, and of data warehouses has led to the collection of ever larger quantities of patient data. Health Information Exchanges (HIEs) also allow for the interoperability of patient data across provider organizations. A key advantage of the data warehouse is that analytical workloads can be run on millions of aggregated patient records without impacting the clinical systems used for patient care. These developments have spurred interest among developers of Natural Language Processing (NLP) technologies in mitigating the challenges of patient screening and improving the efficiency of trial-patient matching.

NLP, in combination with Information Extraction (IE), is applied to the unstructured patient clinical data to extract medically relevant terms, normalize them against standard terminologies, and adjust for any negations. This information is stored in the form of a patient vector. A similar approach is used on the eligibility criteria to create the clinical trial vector. A function then matches the trial and patient vectors and computes a matching score for each trial-patient pair. A common approach is to first apply rules-based logical filters to structured patient data, like demographics, as a ‘pre-filter’ step, before NLP techniques are applied to the unstructured data of the selected cohort. The gold standard used as a reference in these studies is the outcome of screening performed by a board-certified clinical domain expert. While these efforts have reduced the patient-trial matching workload, methods relying on information extraction techniques often failed to interpret semantic relations correctly (e.g., inability to recognize ‘…treated in 2021’ for the criterion indicating ‘…in the past five years’), leading to false positive recommendations.
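To make the pre-filter and matching steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in any particular study: the class names, the age-only pre-filter, and the scoring rule (fraction of inclusion concepts asserted, zeroed by any asserted exclusion concept) are all hypothetical, and the concept extraction, normalization, and negation detection are assumed to have already happened upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Patient:
    patient_id: str
    age: int
    # Hypothetical "patient vector": normalized concept -> True if asserted,
    # False if the clinical notes negate it (e.g. "no evidence of ...").
    concepts: Dict[str, bool] = field(default_factory=dict)


@dataclass
class Trial:
    trial_id: str
    min_age: int
    max_age: int
    # Hypothetical "trial vector": normalized inclusion and exclusion concepts.
    inclusion: Set[str] = field(default_factory=set)
    exclusion: Set[str] = field(default_factory=set)


def passes_prefilter(patient: Patient, trial: Trial) -> bool:
    """Rules-based filter on structured data (here, age only)."""
    return trial.min_age <= patient.age <= trial.max_age


def match_score(patient: Patient, trial: Trial) -> float:
    """Fraction of inclusion concepts the patient asserts.

    Returns 0.0 if the patient asserts any exclusion concept.
    Negated concepts are ignored, which is the "adjust for negations" step.
    """
    asserted = {c for c, present in patient.concepts.items() if present}
    if asserted & trial.exclusion:
        return 0.0
    if not trial.inclusion:
        return 1.0
    return len(asserted & trial.inclusion) / len(trial.inclusion)


def rank_patients(patients: List[Patient], trial: Trial) -> List[Tuple[str, float]]:
    """Pre-filter on structured data, then score and rank the remaining cohort."""
    cohort = [p for p in patients if passes_prefilter(p, trial)]
    scored = [(p.patient_id, match_score(p, trial)) for p in cohort]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data, for illustration only.
trial = Trial("T1", min_age=18, max_age=75,
              inclusion={"breast cancer", "her2 positive"},
              exclusion={"pregnancy"})
patients = [
    Patient("P1", 54, {"breast cancer": True, "her2 positive": True, "pregnancy": False}),
    Patient("P2", 61, {"breast cancer": True, "her2 positive": False}),
    Patient("P3", 80, {"breast cancer": True, "her2 positive": True}),  # fails age pre-filter
    Patient("P4", 40, {"breast cancer": True, "her2 positive": True, "pregnancy": True}),
]
ranking = rank_patients(patients, trial)
print(ranking)
```

In this toy cohort, P3 is dropped by the age pre-filter, P4 scores zero because of an asserted exclusion concept, and P1 ranks above P2. Note that a set-overlap score like this has no notion of time, so it cannot tell whether ‘…treated in 2021’ satisfies ‘…in the past five years’; that is the kind of semantic relation such methods miss.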

The advent of pre-trained Large Language Models (LLMs) like OpenAI’s ChatGPT, Snowflake’s Arctic, Mistral, Meta’s LLaMA, and Google’s BERT has led to a paradigm shift in these NLP tasks. These large deep learning models are pre-trained on vast datasets for a generic task, such as generating the next word given the previous words; they implicitly carry latent feature representations of language and can evidently interpret semantic relations correctly. When fine-tuned for tasks such as trial-patient matching, these models have demonstrated performance consistent with the state of the art. In some instances of recruiting patients for cancer trials, these pre-screening models can reduce the cost per enrolled patient to a mere two dollars.

So while recruiting enough participants remains a major bottleneck for clinical research, pre-trained LLMs offer a powerful way to ease that bottleneck. We will take a look at the implementation of some of these in our future blog posts.

July 02, 2024