FDA – Deconstruction of data and standards to support harmonization and interoperability with real world data for clinical research and regulatory submissions. (U01 Clinical Trial Not Allowed)

January 21, 2022 by dld5dt@virginia.edu


Data concepts (not just the information but the underlying definition of its meaning) across the many different sources of real-world data (RWD) (e.g., EHRs, claims, and digital health technologies), as well as those used in clinical research and regulatory submissions, are almost always similar but not identical. Because of these differences, mapping and transforming data between RWD and clinical research use relies heavily on human interpretation, which is an obstacle to developing a fully reliable computer-driven approach. This request for applications aims to define the elemental components of RWD in a way that may support future computable approaches to defining what is currently only implied by definitions of data elements or business practices, in order to facilitate standardization for regulatory submissions.


In the context of clinical trials and pharmaceutical regulatory review, the concept of “real world data” (RWD) can be considered data created in the “real world” of everyday experience, such as a routine patient visit to a healthcare provider, as opposed to data created under clearly defined protocols typical of controlled clinical trials.  Examples of data from the real world might be:

A person goes to a doctor for a typical healthcare reason (not as part of a clinical trial).  Information is put into an EHR.

A corner pharmacy dispenses a medication to that person.

That person or their doctor enters information into a patient-powered disease registry.

The primary use of such data – in particular, data collected in Electronic Health Record (EHR) systems and/or insurance claims databases during normal healthcare visits – is in support of patients, caregivers, and healthcare providers. In addition to the primary use, there is continued and growing interest in exploring how such information could also be used for secondary purposes such as the support of public health or clinical research and regulatory submissions. In particular, there is great interest in using RWD sources, including EHR and claims data, to generate supporting evidence for, as examples, a new indicated use of, or a safety-related issue with, an already approved pharmaceutical drug.

However, given the current fragmented state of healthcare data interoperability, as well as the independent evolution of data standardization for research and regulatory submissions, the challenges to achieving this goal are multifaceted and highly variable, depending on the research study of interest and the RWD sources being considered.  As of now, even just for use of EHR and claims data, there is simply no “one size fits all” approach to easily migrating data from EHR sources to the ultimate destination of clinical research submissions for regulatory approval.

Even if a future state of optimized interoperability in healthcare were reached, where every EHR and claims system globally conforms to the same agreed-upon way to record, store, represent, and exchange data – a major achievement – it would do little, by itself, to bridge the gap between healthcare RWD and research/regulatory data. The challenges that would still need to be addressed include significant divergence in (not an exhaustive list):

      • Semantic meaning of superficially similar concepts
      • Syntax, comprehensiveness and granularity of data for similar concepts
      • Terminology systems (sometimes multiple) for similar concepts

The challenges exist even with seemingly simple concepts such as representation of sex/gender and ethnicity/race and expand greatly when tackling medication usage, adverse events, and other concepts.

In many cases, the difference in the semantic meanings alone is the root of many problems translating a healthcare concept into a research/regulatory one.  Clues to the differences in the semantic meaning are sometimes found not in the computable data elements, but in the accepted definitions for those elements.

As an example (purely illustrative, not intended for perfect accuracy nor meant to be followed as written for a proposal):

Gender/Sex in a clinical trial may be recorded based on visual observation of relevant physical characteristics at birth. However, in healthcare settings this may be recorded based on visual observation of relevant physical characteristics at time of visit. A further alternative recording from genomic data may define this based on lab-based observation of X/Y chromosome composition at time of genetic testing. While each of these recordings will be perfectly correct in the setting where it was made, any attempt to simply “map” the gender/sex from an element based on one definition to another element based on a different definition may result in an incorrect value populating the destination data element. In aggregate, if biologic specifics are relevant to a particular clinical study, this divergence between what appears in a mapped set of research data and the actual reality recorded in healthcare data could result in highly misleading analyses.

A potential aid to this problem could be to more precisely define, not just in simple text but in computable metadata, exactly what makes up each version of superficially similar concepts. This could be done, for example, by deconstructing the concepts implied in the definitions and actual use of data elements to define metadata which can clearly show the relationship between the similar concepts. Using the above example, one could take an approach (again purely illustrative and highly simplified) where Gender/Sex might be defined as the sum of 3 granular components:

1. Observation Type: Visual, Lab-based

2. Observation Target: Physical Characteristic, Chromosomal/genetic indicator

3. Time of Observation: At birth, at time of visit/testing
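
As a rough sketch only (not part of the funding announcement, and as simplified as the example itself), the three components above could be expressed as computable metadata along the following lines. All class, field, and constant names here are hypothetical, chosen purely for illustration; a real proposal might model this quite differently.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical enumerations mirroring the three illustrative components.
class ObservationType(Enum):
    VISUAL = "visual"
    LAB_BASED = "lab-based"


class ObservationTarget(Enum):
    PHYSICAL_CHARACTERISTIC = "physical characteristic"
    CHROMOSOMAL_INDICATOR = "chromosomal/genetic indicator"


class ObservationTime(Enum):
    AT_BIRTH = "at birth"
    AT_VISIT_OR_TESTING = "at time of visit/testing"


@dataclass(frozen=True)
class SexGenderDefinition:
    """Metadata describing how one Gender/Sex data element is defined."""
    observation_type: ObservationType
    target: ObservationTarget
    time: ObservationTime


def differing_components(source, destination):
    """Return the names of the components on which two definitions disagree.

    An empty list suggests the two elements share a definition; a non-empty
    list flags that a simple value-to-value mapping may change the meaning.
    """
    names = ("observation_type", "target", "time")
    return [n for n in names if getattr(source, n) != getattr(destination, n)]


# The three settings from the illustrative example in the text.
CLINICAL_TRIAL = SexGenderDefinition(
    ObservationType.VISUAL,
    ObservationTarget.PHYSICAL_CHARACTERISTIC,
    ObservationTime.AT_BIRTH,
)
HEALTHCARE_VISIT = SexGenderDefinition(
    ObservationType.VISUAL,
    ObservationTarget.PHYSICAL_CHARACTERISTIC,
    ObservationTime.AT_VISIT_OR_TESTING,
)
GENOMIC_TEST = SexGenderDefinition(
    ObservationType.LAB_BASED,
    ObservationTarget.CHROMOSOMAL_INDICATOR,
    ObservationTime.AT_VISIT_OR_TESTING,
)

if __name__ == "__main__":
    # Compare each real-world source against the clinical trial definition.
    print(differing_components(HEALTHCARE_VISIT, CLINICAL_TRIAL))
    print(differing_components(GENOMIC_TEST, CLINICAL_TRIAL))
```

With metadata of this kind, software could at least detect that the healthcare and clinical trial elements differ only in the time of observation, while the genomic element differs on all three components, instead of leaving that judgment entirely to human interpretation.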

There are likely many approaches to achieving this goal which may or may not look like the example above. Regardless of how it is accomplished, an approach of this general nature could provide the granular clarity needed to more accurately assess the path for translating real world data concepts to research/regulatory concepts, with enough semantic and syntactic precision to ensure the meaning of the information is changed as little as possible. This, in turn, would greatly enhance the ability to effectively utilize real world data sources for research and regulatory submissions.

Key Dates:

Letter of Intent Due Date(s): February 7, 2022
Application Due Date(s): March 31, 2022, by 11:59 PM Eastern Time.

URL for more information:
