NLM wishes to accelerate the availability of and access to secure, complete data sets and computational models that can serve as the basis of transformative biomedical discoveries by improving the speed and scope of the curation processes. This Funding Opportunity Announcement is focused on automating curation of biomedical digital assets in support of Goal 1, Objective 1.1 of the NLM Strategic Plan 2017-2027: “An important research direction will develop strategies for curation at scale…” The ability to re-use, integrate or add to existing data sets will open new avenues of opportunity and can speed discoveries that will improve health. But this promise will go unrealized without advances in automated and autonomous curation. Objective 1.2: “Automatic, autonomous curation strategies will allow for operational efficiency as well as accelerate the speed of discovery.”
Digital curation involves characterizing, annotating, managing, and preserving digital assets such as research data sets, computational and other types of models, and reusable visualization tools. Proficient curation of digital assets maximizes their reuse potential, mitigates risk of obsolescence, reduces the likelihood that their long-term value will diminish or be lost, and helps assure reproducibility of research. The evolving digital ecosystem supports data-driven biomedical discovery by providing access to large quantities of biomedical and health-related data, to computational models and to open source software and code. The scope, scale and heterogeneity of digital data alone are vast, ranging from genome sequences to biomedical images, from observational health findings to environmental measurements, from family histories to sensor readings from personal trackers.
As the amount and complexity of digital assets continue to grow, manual curation will not scale to meet future needs. At the same time, as researchers make research data sets, models and other tools available for new uses or re-analysis, it is important to minimize duplication and simplify the process of finding, managing, visualizing and mining all types of digital assets.
This funding announcement is intended to help researchers who want to find, integrate and use these data sources to make new discoveries, and to share their findings so others can build upon them. Its purpose is to encourage applications for new approaches that (1) increase the speed and assure the quality and security of storage techniques, retrieval strategies, annotation methods, data standards, visualization tools and other advanced data management approaches and (2) improve our ability to make biomedical data and other digital research assets findable, accessible, interoperable and reusable (FAIR).
Research Objectives
Today’s curation processes are a mix of computational and manual activities, involving specialized technical and professional staff. Public and private data repositories exist, and open source tools are widely available via GitHub and similar sites. Successful examples of semi-automated digital curation exist, employing processes that require varying degrees of human intervention to annotate the digital asset. However, to achieve FAIR access to huge collections of biomedical digital data assets, new or improved approaches for automated curation and management of digital assets must be designed, tested, validated, and widely adopted. An important component of this approach involves beginning the process of curation early in the data life cycle, ideally before the collection of new research data begins.
Generation and curation of biomedical data sets often involve a mix of people at different steps, including scientists at all career levels, technical support staff, IT specialists, annotators, and librarians. Approaches are needed that can be used by this decentralized and heterogeneous workforce to help them monitor and assess the accuracy, completeness, quality and efficiency of the automated curation of local digital assets, and of assets housed at other locations. Proposed approaches should identify the type of asset, intended user and expected improvements to be achieved over present curation and management of the asset.
Applications may propose developing new computational methods or extending existing open-source tools and pipelines in order to enhance automation, improve efficiency, quality and security, and control costs. The improvements over comparable existing management or curation approaches must be documented. During the project period, applicants are expected to test the approach with one or more groups of targeted users. Approaches should be applicable to more than one subject domain. All awardees are expected to widely disseminate the results of their research, including software.
Potential topics to be addressed include but are not limited to:
- Fully automated curation that meets community-defined standards for metadata
- Automated approaches to integration of disparate or heterogeneous data sets that map components to common data elements
- Automated annotation via extraction from text or other digital sources, linking the extracted information to a data set or other digital asset
- Automated quality control approaches that increase the completeness, accuracy or quality of a data set or model
- Automated research pipelines that begin at data capture or model development and are self-documenting
Deadlines: August 15, 2018; January 31, 2019; July 31, 2019; January 31, 2020 (see announcement for letter of intent deadlines)
URL: https://grants.nih.gov/grants/guide/pa-files/PAR-18-796.html