CICI: Data Provenance: Collaborative Research: CY-DIR Cyber-Provenance Infrastructure for Sensor-Based Data-Intensive Research

Source of Support: NSF ACI Div Of Advanced Cyberinfrastructure

Award Amount: $219,571

Period Covered: 01/01/2016 to 12/31/2018

Grant: 1547324

Today, scientists in many disciplines, including biology, medicine, agronomy, energy management, hydrology, and earth sciences, rely on massive datasets collected from various sources. Such collections are made possible by advances in computer technology, such as sensors that measure quantities like humidity and air quality, and powerful computer systems that analyze the data. The increased use of data in scientific research poses important challenges. Data can contain errors that affect the conclusions derived from them. Scientific research must also be reproducible to support validation and the detection of scientific misconduct. Addressing these challenges requires tracking the data used in research projects, for example: tracking the source that originated the data (such as a mobile phone that acquired some images) and its geographic location; tracking which computer systems processed the data; and tracking how scientists modified the data. This set of information is referred to as provenance, much like the provenance of artistic artifacts. Managing provenance is technically complex, yet it is key for data-intensive research. This project makes important advances in this direction by developing software systems for securely managing provenance.

This project develops a provenance management system for cyberinfrastructure that includes different types of hosts, devices, and data management systems. The proof-of-concept system, referred to as Cyber-provenance Infrastructure for Sensor-based Data-Intensive Research (CY-DIR), will support scientists throughout the life cycle of their sensor-based data collection processes, including continuously monitoring sensors to ensure that provenance is collected and recorded, and tracing the use and processing of the data across different data management systems. CY-DIR provides researchers with provenance and metadata about the data collected by sensors. Provenance security will be ensured through efficient encryption techniques suited to sensors, secure logging techniques, and secure processors. Research from this project will provide novel results in several areas: provenance techniques for sensor data; cryptographic key management for sensors, mobile devices, and unmanned aircraft systems; provenance-aware streaming data processing techniques; protection of provenance data against tampering; and provenance data integration across different data management systems.
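The secure logging and tamper-protection goals above can be illustrated with a small hash-chained log. The following is a minimal sketch, not CY-DIR's design. It assumes that each provenance event (a data source and its location, a processing host, or a user modification) is a JSON-serializable dictionary and that the logging device holds a single secret key. The class `ProvenanceLog` and all record field names are hypothetical. Each entry's HMAC-SHA256 tag covers both the record and the previous entry's tag, so altering any earlier record invalidates the chain from that point on.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Append-only provenance log whose entries are chained by HMAC tags.

    Hypothetical illustration: the single shared key and the record layout
    are assumptions made for this sketch, not part of the CY-DIR project.
    """
    key: bytes
    entries: list = field(default_factory=list)  # list of (record, tag) pairs

    GENESIS = b"\x00" * 32  # fixed starting value for the first entry's chain

    def _tag(self, prev_tag: bytes, record: dict) -> bytes:
        # Canonical JSON so the same record always serializes to the same bytes.
        payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
        # The tag binds this record to everything logged before it.
        return hmac.new(self.key, prev_tag + payload, hashlib.sha256).digest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        self.entries.append((record, self._tag(prev, record)))

    def verify(self) -> bool:
        # Recompute every tag in order; any mismatch means an entry was altered.
        prev = self.GENESIS
        for record, tag in self.entries:
            if not hmac.compare_digest(tag, self._tag(prev, record)):
                return False
            prev = tag
        return True


if __name__ == "__main__":
    log = ProvenanceLog(key=b"demo-key-not-for-production")
    log.append({"source": "phone-17", "lat": 40.42, "lon": -86.91, "event": "image_acquired"})
    log.append({"host": "cluster-a", "event": "image_resized"})
    log.append({"user": "analyst-3", "event": "label_edited"})
    print(log.verify())  # True

    # Tamper with an earlier record: verification now fails.
    log.entries[1][0]["host"] = "cluster-b"
    print(log.verify())  # False
```

A chain like this detects modification of recorded entries, but on its own it does not detect silent deletion of the most recent entries; a real system would also need to protect the latest tag and manage keys on constrained devices, which relates to the key-management research area listed above.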