NIH funding opportunities – Cancer-Related Behavioral Research through Integrating Existing Data (R01, R21)

May 17, 2016 by School of Medicine Webmaster

The description below was taken from the R01 version of this FOA.

This FOA invites applications that seek to integrate two or more independent data sets to answer novel cancer control and prevention questions. The goal is to encourage applications that incorporate Integrative Data Analysis (IDA) methods to study behavioral risk factors for cancer, including tobacco use, sedentary behavior, poor weight management, lack of adherence to screening recommendations, and low vaccine uptake. Applicants are encouraged to merge data from different sources and of different types (including both quantitative and qualitative data; data may also span different levels, such as genetic and environmental), with at least one source of behavioral data. Importantly, to be considered for funding, applicants must use existing data sources rather than collect new data. Creating harmonized measures, developing culturally sensitive measures, replicating results, and conducting cross-study comparisons are also encouraged.


The most intractable cancer-related problems in public health require research efforts that are integrative in nature and that span data sources and data types. Behavioral researchers have noted challenges such as a plateau in the decline of smoking rates, an increase in the prevalence of overweight and obesity, and low uptake of vaccines such as the HPV vaccine. At the same time, integrative methods and analytic approaches that could support more efficacious, efficient, collaborative, and cost-effective behavioral research have been underused.

Cumulative knowledge is a gateway to accelerated discovery, and building it is far more difficult without the ability to integrate independent data sets. Many existing data sets, including those from NCI-supported cohorts, could be repurposed through integrative methods to answer additional questions about cancer-related behaviors, contributing to the empirical literature and extending the utility of those data. Likewise, rapid advances in health information technology now enable novel approaches to social, medical, and behavioral data collection and surveillance. In public health, these ‘Big Data’ can be leveraged to answer important research questions by integrating existing data from different sources, of different types, and from various disciplines, an undertaking that exceeds the current capabilities of traditional data management approaches.

A. Integrative Data Analysis (IDA)

IDA is an efficient and cost-effective set of strategies in which two or more independent data sets are pooled or combined into one and are then statistically analyzed as a whole. These data can include both quantitative and qualitative types. This integration of data typically takes one of two forms:

  • merging data by common data elements (units of information that are shared or widely used across data collection efforts), where these elements are often multi-item scales or indices but can be individual items; or
  • linking data sets through a common factor, either at the record level (e.g., linking records through demographic information), as in the Surveillance, Epidemiology, and End Results (SEER)-Medicare data set, or at multiple levels such as the environmental or policy level (e.g., linking state- or county-level information with individual-level data).
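
As a minimal sketch of these two forms, the plain Python below pools two hypothetical surveys on their shared elements and then performs a toy deterministic record linkage. All data set names, field names, and values are invented for illustration; real linkages such as SEER-Medicare rely on more identifiers and more careful matching than two demographic fields.

```python
from collections import defaultdict


def pool_on_common_elements(studies, elements):
    """Form 1: stack records from independent studies, keeping only the
    common data elements and tagging each row with its source study."""
    pooled = []
    for study_name, records in studies.items():
        for record in records:
            row = {element: record[element] for element in elements}
            row["study"] = study_name
            pooled.append(row)
    return pooled


def link_records(left, right, match_keys):
    """Form 2: deterministic record-level linkage. Rows are joined when they
    agree on every match key; left rows without a match are dropped."""
    index = defaultdict(list)
    for record in right:
        index[tuple(record[key] for key in match_keys)].append(record)
    linked = []
    for record in left:
        for match in index.get(tuple(record[key] for key in match_keys), []):
            linked.append({**record, **match})
    return linked


# Hypothetical example data (invented values).
survey_1 = [{"pid": 1, "smoker": 1, "bmi": 31.2, "age": 45}]
survey_2 = [{"resp": "x9", "smoker": 0, "bmi": 24.8, "zip": "20892"}]
pooled = pool_on_common_elements({"s1": survey_1, "s2": survey_2}, ["smoker", "bmi"])

registry = [{"dob": "1950-03-02", "sex": "F", "stage": "II"}]
claims = [
    {"dob": "1950-03-02", "sex": "F", "visits": 7},
    {"dob": "1961-07-19", "sex": "M", "visits": 2},
]
linked = link_records(registry, claims, ["dob", "sex"])
print(pooled)
print(linked)  # [{'dob': '1950-03-02', 'sex': 'F', 'stage': 'II', 'visits': 7}]
```

Keeping a `study` tag on every pooled row matters because later analyses usually need to model or adjust for between-study differences.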

IDA approaches differ from, and offer advantages over, other methodological techniques that also strive to build cumulative knowledge bases, such as meta-analysis. In a meta-analysis, summary statistics from multiple studies are pooled. Because IDA techniques pool the original raw data, individual-level information is not lost as it is in meta-analytic approaches, which allows researchers to learn not only what works, but also for whom and in which context. In addition, IDA expands the range of questions that can be asked in many areas of health behavior research. IDA can be used to incorporate unstructured ‘Big Data’ that were not originally intended for examining theoretically relevant measures. For example, Google searches on health-related topics could serve as an objective measure of information seeking that supplements what is gleaned from a self-report data source such as the Health Information National Trends Survey (HINTS). Likewise, social media data (e.g., from Twitter or Facebook) could be used to assess perceptions or knowledge about HPV vaccines.
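
The difference can be illustrated with a small sketch, assuming a simple two-arm comparison: a fixed-effect meta-analysis can only combine each study's summary effect and variance, whereas pooled raw records still allow the effect to be estimated within subgroups. The function names, the `treated`, `outcome`, and `setting` fields, and all numbers are hypothetical.

```python
from statistics import mean


def fixed_effect_pool(estimates, variances):
    """Meta-analysis style: inverse-variance weighted mean of per-study
    summary effects. Subgroup detail inside each study is unavailable."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def subgroup_effects(records, group_key):
    """IDA style: with pooled raw records, estimate the treated-minus-control
    difference in mean outcome separately within each subgroup."""
    effects = {}
    for group in sorted({r[group_key] for r in records}):
        treated = [r["outcome"] for r in records if r[group_key] == group and r["treated"]]
        control = [r["outcome"] for r in records if r[group_key] == group and not r["treated"]]
        effects[group] = mean(treated) - mean(control)
    return effects


# Hypothetical pooled raw records from two studies.
records = [
    {"study": 1, "setting": "rural", "treated": True, "outcome": 5},
    {"study": 1, "setting": "rural", "treated": True, "outcome": 7},
    {"study": 1, "setting": "rural", "treated": False, "outcome": 4},
    {"study": 2, "setting": "urban", "treated": True, "outcome": 3},
    {"study": 2, "setting": "urban", "treated": False, "outcome": 3},
    {"study": 2, "setting": "urban", "treated": False, "outcome": 5},
]
print(fixed_effect_pool([0.2, 0.4], [0.01, 0.04]))  # approximately 0.24
print(subgroup_effects(records, "setting"))  # {'rural': 2, 'urban': -1}
```

With only the two summary effects, the opposite-signed subgroup effects would be invisible; the raw records expose them directly.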

B. Changing Behavioral Science Practices

Generating Harmonized Data by Creating Comparable Measures

One important aspect of IDA involves merging data through the use of common data elements. In behavioral research, common data elements are typically multi-item scales measured with the individual person as the unit of analysis (e.g., measures of constructs such as depression, anxiety, and quality of life). Because this FOA does not require the collection of new data and instead focuses on integrating existing data, most responsive applications are expected to integrate data sets that contain different common data elements. Harmonizing common data elements that measure the same construct with disparate instruments is feasible in many situations, and this FOA will require researchers to harmonize data across common data elements. Some methods will be relatively easy (e.g., recoding a continuous income variable into groupings comparable to those of a categorical income variable), whereas other cases could require developing and applying more sophisticated methods, or applying traditional methods in novel ways. These methods include moderated nonlinear factor analysis; calibration to an established ‘gold-standard’ measure through Item Response Theory-based methods such as concurrent calibration or fixed-parameter calibration; and non-Item Response Theory approaches such as equipercentile linking. Likewise, other harmonization efforts have produced ‘crosswalks’ that allow scores from disparate measures of the same construct to be compared. This FOA encourages data harmonization, and the resulting harmonized measures could be used by other researchers, thus contributing back to the research community.
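
To make equipercentile linking concrete, here is a deliberately simplified sketch: a score on instrument X is mapped to the instrument-Y score with the same percentile rank in its own sample, interpolating linearly between observed Y scores. It assumes the two samples are comparable (for example, randomly equivalent groups) and omits the presmoothing and standard errors that operational linking would include. The function names and score data are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(scores, x):
    """Mid-point percentile rank (0 to 1): share of scores below x plus half
    the share equal to x."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return (below + 0.5 * equal) / len(scores)


def equipercentile_link(x, scores_x, scores_y):
    """Map score x on instrument X to the instrument-Y score with the same
    percentile rank, interpolating linearly between observed Y score points."""
    p = percentile_rank(scores_x, x)
    points = sorted(set(scores_y))
    ranks = [percentile_rank(scores_y, y) for y in points]  # strictly increasing
    if p <= ranks[0]:
        return float(points[0])
    if p >= ranks[-1]:
        return float(points[-1])
    i = bisect_left(ranks, p)  # first Y point whose rank is >= p
    if ranks[i] == p:
        return float(points[i])
    lo_y, hi_y = points[i - 1], points[i]
    lo_p, hi_p = ranks[i - 1], ranks[i]
    return lo_y + (p - lo_p) * (hi_y - lo_y) / (hi_p - lo_p)


# Hypothetical samples: instrument X scored 0-3, instrument Y scored 0-6.
print(equipercentile_link(1, [0, 1, 2, 3], [0, 2, 4, 6]))  # 2.0
```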

Assessing for Cultural Equivalence across Measures

Using exactly the same common data elements across independent projects sharply increases the ability to merge and directly compare data, because each element has a shared definition and the same set of permissible values. However, using the same common data elements does not by itself ensure data comparability. Even with identical common data elements, individuals from different groups (e.g., race/ethnicity, regions of the country) who possess the same amount of the construct being assessed may not have the same probability of obtaining a given score on a common data element. This situation, known as Differential Item Functioning, is a type of measurement bias. There are many methods to detect and control for Differential Item Functioning, including the Mantel-Haenszel procedure and Item Response Theory-based approaches. Successful applicants would conduct such an evaluation to test for comparability. Through this process, scores on the common data elements could be placed on the same scale, yielding common data elements that are culturally sensitive across groups defined by race/ethnicity, gender, or region of the country.
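
A minimal sketch of the Mantel-Haenszel DIF statistic follows, assuming dichotomously scored items and respondents already matched into strata by total score. Within each stratum, a 2x2 table counts correct and incorrect responses for a reference group and a focal group; the common odds ratio pools these tables, and the ETS delta transformation, -2.35 ln(alpha), puts it on a scale where 0 means no DIF. The significance test is omitted, and the function name and counts are invented for illustration.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one item.

    strata: one (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    tuple per matched total-score level.
    Returns (alpha_mh, mh_d_dif). alpha_mh near 1 (delta near 0) suggests the
    item behaves the same in both groups after matching on total score.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    alpha_mh = numerator / denominator
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif


# Hypothetical counts: identical response patterns in both groups (no DIF) ...
no_dif = [(30, 10, 30, 10), (20, 20, 20, 20)]
# ... versus an item that favors the reference group at the same score level.
with_dif = [(40, 10, 30, 20)]
print(mantel_haenszel_dif(no_dif))
print(mantel_haenszel_dif(with_dif))
```

Negative MH D-DIF values indicate an item that is harder for the focal group than for matched reference-group members. Under the commonly cited ETS convention, absolute values of roughly 1.5 or more, together with a significant Mantel-Haenszel chi-square test, flag large DIF.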

Specific Research Objectives

NCI investment in IDA-related research would yield efficient and productive research that reduces costs, bridges behavioral research with other disciplines, and provides the ability to test hypotheses in ways that cannot be accomplished without data integration.

A. Enhanced Longitudinal Analyses

Prospective longitudinal studies offer many advantages when studying processes or outcomes that develop or change over time, yet these studies are expensive and time-consuming. Retrospective use of any one data set is often limited in scope. However, merging several similar data sets using IDA provides an opportunity to study a broader swath of behaviors and experiences, and to better understand developmental processes, without collecting new data. The basic idea is to merge multiple existing data sets that share common data elements but follow different cohorts. This adds an enhanced longitudinal component by extending the timeframe covered by the study without the time needed to collect the data. These more efficient IDA studies, however, require collaboration among researchers to share data, and the data must meet certain conditions before merging is possible. These conditions include common data elements that assess process or outcome measures across studies, and respondents with at least one common age (or another common variable that captures a time-varying component) that serves to ‘link’ the studies together. For instance, previous studies using these methods have examined changes in intellectual abilities over the lifespan and the development of substance use and abuse in children, adolescents, and young adults. Application in the cancer arena would be particularly useful given that cancer-related behaviors, such as smoking and obesity, are initiated and maintained over a lifetime.
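
The linking condition described above can be checked mechanically. The sketch below, using hypothetical study names and ages, tests whether a set of cohort studies chains together through shared ages and then averages a 0/1 outcome at each age across all pooled records. It is descriptive only; an actual accelerated longitudinal analysis would fit a growth model that accounts for study and cohort differences.

```python
from collections import defaultdict


def studies_are_linked(age_sets):
    """True if every study connects to the others through a chain of shared
    ages (a connectivity check over studies that overlap in age)."""
    names = list(age_sets)
    reached = {names[0]}
    frontier = [names[0]]
    while frontier:
        current = frontier.pop()
        for other in names:
            if other not in reached and age_sets[current] & age_sets[other]:
                reached.add(other)
                frontier.append(other)
    return len(reached) == len(names)


def pooled_age_means(records):
    """Average an outcome at each age across all pooled studies' records."""
    totals = defaultdict(lambda: [0.0, 0])  # age -> [sum of outcome, count]
    for rec in records:
        totals[rec["age"]][0] += rec["outcome"]
        totals[rec["age"]][1] += 1
    return {age: s / n for age, (s, n) in sorted(totals.items())}


# Hypothetical cohorts: A covers ages 12-16, B 16-20, C 20-25.
age_sets = {"A": set(range(12, 17)), "B": set(range(16, 21)), "C": set(range(20, 26))}
print(studies_are_linked(age_sets))  # True: A-B share 16, B-C share 20

# Hypothetical 0/1 outcome (e.g., any smoking in the past 30 days).
age_records = [
    {"study": "A", "age": 16, "outcome": 0},
    {"study": "B", "age": 16, "outcome": 1},
    {"study": "B", "age": 18, "outcome": 1},
]
print(pooled_age_means(age_records))  # {16: 0.5, 18: 1.0}
```

Dropping study B breaks the chain, which is why at least one shared linking age between adjacent cohorts is required.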

B. Assessment of Small Populations

Small populations are defined as populations whose size, dispersion, or accessibility makes it difficult to obtain sample sizes adequate to test specific research questions. Examples of small populations include racial/ethnic subgroups (e.g., Honduran Latin Americans), people with relatively rare characteristics (e.g., transgender persons), rare cancers, low base-rate behaviors, low-income and rural populations, and people living in small geographic units such as census blocks or particular zip codes. The concern is that these groups may go unstudied or be aggregated inappropriately (e.g., combining all Latin American subgroups) even when they have important or unique characteristics that result in cancer-related health disparities or differences in specific cancer-related outcomes such as incidence or mortality. Studies of these populations have clear utility for understanding health disparities.

Linking methods are also valuable for assessing small geographic units. For example, small-area estimation, a model-based approach that links information from population-based surveys, would also be encouraged. It draws on the strengths of different surveys, with the goal of producing more accurate and precise estimates for smaller geographic units.
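
Many model-based small-area estimators compromise, for each area, between a direct survey estimate and a synthetic, regression-based estimate built from area-level covariates. The sketch below shows that composite in the form used by the Fay-Herriot area-level model, assuming the regression coefficients and the between-area model variance have already been estimated; the function name, area names, covariates, and values are hypothetical.

```python
def fay_herriot_composite(areas, beta, model_var):
    """Composite (shrinkage) estimate for each small area.

    areas: name -> {"direct": survey estimate,
                    "sampling_var": its sampling variance,
                    "covariates": area-level predictors}
    beta: regression coefficients (assumed already estimated)
    model_var: between-area model variance (assumed already estimated)
    """
    estimates = {}
    for name, area in areas.items():
        synthetic = sum(b * x for b, x in zip(beta, area["covariates"]))
        # Precise direct estimates (small sampling variance) get weight near 1;
        # noisy ones borrow more strength from the synthetic estimate.
        gamma = model_var / (model_var + area["sampling_var"])
        estimates[name] = gamma * area["direct"] + (1 - gamma) * synthetic
    return estimates


# Two hypothetical areas with the same direct estimate but different precision.
areas = {
    "county_a": {"direct": 0.30, "sampling_var": 0.01, "covariates": [1.0, 0.2]},
    "county_b": {"direct": 0.30, "sampling_var": 0.03, "covariates": [1.0, 0.2]},
}
print(fay_herriot_composite(areas, beta=[0.1, 0.5], model_var=0.01))
```

Both areas share a synthetic estimate of 0.2, but the noisier county_b is pulled further toward it (about 0.225 versus 0.25), which is the borrowing of strength that makes these estimates useful for small units.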

C. Multi-level Analyses

Multi-level analyses can be achieved through data linkages, that is, by linking data collected at different levels of abstraction: biological, behavioral, and societal. An example would be a study that examines how individual smoking behavior, measured both through cotinine levels (a biomarker) and through self-report, relates to environmental factors such as the number of stores selling cigarettes and to policy-level data such as cigarette taxes and indoor smoke-free laws. These data could be linked by a geographic unit, such as the county where the individual resides, and then analyzed as a whole. This approach would capture multiple levels of influence in order to understand their effects on behavior or to test the effects of interventions on changing behavior.
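
A minimal sketch of this kind of linkage follows, assuming each individual record carries a five-digit county FIPS code whose first two digits identify the state. Individual-level measures (a cotinine value and self-reported smoking) are joined to county-level and state-level tables. The function name, field names, codes, and values are invented for illustration and do not describe any real county or state policy.

```python
def link_levels(individuals, county_data, state_data):
    """Attach county- and state-level variables to each individual record.

    Assumes 'county_fips' is a five-digit string whose first two digits are
    the state FIPS code. Records whose county or state is missing from the
    lookup tables are returned separately rather than silently dropped.
    """
    linked, unmatched = [], []
    for person in individuals:
        county = county_data.get(person["county_fips"])
        state = state_data.get(person["county_fips"][:2])
        if county is None or state is None:
            unmatched.append(person)
            continue
        linked.append({**person, **county, **state})
    return linked, unmatched


# Hypothetical individual, county, and state records.
individuals = [
    {"id": 1, "county_fips": "99001", "cotinine_ng_ml": 210.0, "self_report_smoker": True},
    {"id": 2, "county_fips": "99003", "cotinine_ng_ml": 1.5, "self_report_smoker": False},
    {"id": 3, "county_fips": "98005", "cotinine_ng_ml": 0.8, "self_report_smoker": False},
]
county_data = {
    "99001": {"tobacco_retailers_per_10k": 14.0},
    "99003": {"tobacco_retailers_per_10k": 6.5},
}
state_data = {"99": {"cig_tax_per_pack": 2.0, "indoor_smoke_free_law": True}}

linked, unmatched = link_levels(individuals, county_data, state_data)
print(len(linked), [p["id"] for p in unmatched])  # 2 [3]
```

Returning unmatched records explicitly makes linkage losses visible, which matters when those losses are concentrated in particular areas.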

Research questions of interest include, but are not limited to, the following:

  • What are the long-term effects of chemotherapy on fatigue, cognition, and other treatment-related outcomes, taking into account individual characteristics (e.g., coping, multiple morbidities), type of cancer, type of therapy, health care access and use practices; and what are the characteristics of the different clinics in which chemotherapy is performed, and how do these contribute?
  • How do individual risk perceptions, knowledge, and attitudes towards tobacco use interact with biological factors (e.g., ability to metabolize cotinine), environmental factors (e.g., the built environment), and policy factors (e.g., laws banning smoking in restaurants and bars, cigarette taxes) to explain why current smokers continue to smoke or have trouble quitting?
  • How do personal attitudes towards vaccination and sexual behavior (as measured within parents, adolescents, and young adults), together with physician recommendations and access to health care (as measured within the built environment), interact to influence HPV vaccination uptake?
  • What are the long-term trends in cancer incidence and mortality inequities? Have they changed over time, and what biological, self-report, environmental, and policy factors explain these differences? What can be learned from these data to inform behavioral interventions?
  • What are the most valid and precise estimates of cancer-related predictors, mediators/moderators, and outcomes for small populations that can be obtained by merging across population-level surveys? For example, do Mexican Americans exercise more or less than Cuban Americans, and are there different between-group predictors?

Deadlines: February 15, 2017; June 15, 2017; February 15, 2018; June 15, 2018; February 15, 2019; June 14, 2019


Filed Under: Funding Opportunities