CC*IIE IAM: Data at the Speed of Trust: Matching Computational Resources to Researchers Using Sensitive Data

  • Awarding Agency: National Science Foundation
  • Amount: $299,997
  • Grant Dates: 9/1/14 – 8/31/16
  • PI: Tracy Futhey

Scientists increasingly use sensitive data in their research, and protecting these data adds significantly to the complexity of doing research. This project reduces that complexity. Sensitive data require strict security and confidentiality, which complicates the scientific collaboration vital to conducting research. Secure computing environments that protect data also tend to be inflexible, burdensome to use, and difficult to expand, reducing research efficiency. To safeguard sensitive data while supporting scientific collaborations, the project uses software-defined networking (SDN) technologies that allow researchers to establish customized network routes with special provisions for security and performance, creating (for example) a high-speed, secure, and exclusive data connection between distant computers and storage devices. Attribute-based access control systems use facts about a potential user, as certified by a trusted entity, to determine whether the user should be granted access and, if so, what access. This project combines emerging SDN technologies with the attribute-based access control systems arising from Internet2’s Scalable Privacy initiative. Together, they secure data while also simplifying the granting of data access to collaborators from different organizations federated by a trusted attribute registry. Duke’s existing protected network for sensitive research data serves as the project testbed, leveraging Duke’s past experience with SDN and with open source identity and access management tools widely deployed in academic research institutions. Thus, the project expects to safely enable cross-institutional research collaborations that use sensitive data, while providing an easy-to-use means to network the powerful, diverse, and even geographically remote computational resources needed to analyze those data.