This memo outlines acceptable use of the Duke Compute Cluster and provides useful information about the security of data stored in the cluster. People who are granted access to the Duke Compute Cluster agree to the terms of this notice.

This document is reviewed annually and distributed to users of the Duke Compute Cluster. If you have questions about this document, please contact rescomputing@duke.edu.

In the 2017 review, the following topics were revised or added:

  • Clarification of the role of the “Point-of-Contact” as it relates to data management and stewardship.
  • Notification of the implementation of scheduled purges of idle data left in the /work directory.
  • Encouragement of use of “check-pointing” methods to recover jobs that are interrupted.
  • Expectation of availability of GPU computing for the “common” partition use.

Research and academic uses of the Duke Compute Cluster

The Duke Compute Cluster serves the research, education, and service missions of Duke University, and users of the cluster agree to only run jobs that relate to these missions. For example,

  • Bitcoin or other electronic and cryptographic currency “mining” for purposes of financial gain is not appropriate. Research and instructional uses of “mining” tools, not for purposes of financial gain, are not restricted.
  • Commercial and business use of the cluster is not appropriate.
  • Unauthorized use or storage of copyright-protected or proprietary resources is not appropriate.

Running jobs that take over large portions of the cluster’s resources may be an abuse of the system, which is designed as a community resource. Take care to design your computations so that you respect other researchers’ interest in using the system.

Users of the cluster are encouraged to implement “check-pointing” for jobs that run for long periods, since node failures and scheduled maintenance may require interruption of processes. Check-pointing is good computing practice for long-running jobs.
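As an illustration of check-pointing, here is a minimal sketch in Python. It is not a cluster-provided tool: the function names, the JSON file format, and the toy workload (summing integers) are all assumptions made for the example. The idea is that the job saves its progress periodically, and on restart it resumes from the last saved state rather than from the beginning.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # rename is atomic within one filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh starting state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run_job(path, n_steps, checkpoint_every=100, stop_after=None):
    """Sum the integers 0..n_steps-1, resuming from the checkpoint at path.

    stop_after (hypothetical, for demonstration only) simulates an
    interruption by returning None after that many steps in this run.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return None  # simulated interruption; work since last save is lost
        state["total"] += state["next_step"]
        state["next_step"] += 1
        steps_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If a run is interrupted, calling run_job again with the same checkpoint path repeats only the steps done since the last save. Writing to a temporary file and renaming it means a failure during the save itself leaves the previous checkpoint intact.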

Sensitive information is not allowed on the cluster

Security and compliance provisions that are in place on the Duke Compute Cluster are not sufficient for sensitive information, such as HIPAA-regulated “Electronic Protected Health Information” and FERPA-regulated student records. Additionally, some data may be bound by restrictions in “data use agreements,” and those agreements may require more stringent security than is in place on the cluster. However, in many cases, information can be de-identified and then introduced to the cluster for analysis without violating data use agreements or government regulations.

Users of the cluster are responsible for the data that they introduce to the cluster for analysis.

For more information on the classification of data, see the “Data Classification Standard” (PDF; security.duke.edu) and “Duke Services and Data Classification” (PDF; security.duke.edu). Other policies and documents (security.duke.edu) are available from the Information Technology Security Office (ITSO).

If your research calls for use of sensitive data, contact the ITSO (security@duke.edu) or research computing staff (rescomputing@duke.edu). In many cases, data storage and computational resources with special security features are available to work with sensitive information.

“Points-of-Contact” (POCs) have responsibility for members of groups using the cluster

Every group on the Duke Compute Cluster has at least one “Point-of-Contact” who is charged with the following responsibilities:

  • managing and vetting a group’s membership,
  • serving as a central point-of-contact (hence the moniker) for communications from IT and research computing staff that are pertinent to a group,
  • arranging the disposition of data produced and left by former members of the group, either as the data steward for the research being conducted or at the request of the data steward (e.g., grant PI or faculty member) who bears final responsibility for the care of the data,
  • acting as an arbiter of trust who “vouches” for secure and responsible uses of the resources by members in his or her group, and
  • helping to assess the group’s responsible use of the shared cluster’s storage and compute resources provided by the University and other researchers in the cluster.

The POC for a group is a person of authority in a lab or research group. For groups that are created for a class, the course instructor serves as POC.

Periodic review of group membership and patterns of cluster use by groups

The Point-of-Contact (POC) for a group should review membership at least annually and, preferably, every semester so that lingering group members can be removed. From time to time, Duke Research Computing or IT Security Office staff may request that a group POC review the membership of a group.

Members of a group can be listed with the group_members tool on the cluster (/opt/apps/admin/group_members). This tool lists the current members of your group by NetID and name.

Duke Research Computing staff conduct periodic reviews of groups’ uses of the cluster storage and compute nodes in an effort to show groups their use of the cluster in a larger context and to help clarify the balance of use that is implicit in using a shared community resource. Some information about a group’s use of the cluster will be shared with other users of the cluster, and members of the faculty who serve on the Research Computing Advisory Group will also review cluster usage.

Data backups and appropriate use of storage resources

The Duke Compute Cluster is a data analysis tool, and data storage resources are an essential part of the installation. Note, however, that the cluster is primarily for data analysis and is not designed for data storage. Most of the data storage capacity is a shared resource accessible through the “/work” directory. “Home directories” for users are best used for scripts, software, small data sets, and results.
Only users’ home directories are “backed up.” The “/work” directory is not backed up and should be treated as “scratch” space that can be cleared without notice.
Users of the cluster should retain a copy of their irreplaceable data at a separate location, and they should remove results from the system as soon as they can. Temporary and “ephemeral” data sets that are not essential should be deleted from cluster storage so that other users can use the capacity.
Data abandoned or left idle in the cluster’s /work directory will be automatically deleted in regularly scheduled “purges” of data. Users who manipulate files to circumvent the temporary and “data-under-analysis” principle of the shared storage capacity will be invited to leave the cluster.
Contact rescomputing@duke.edu for assistance in retrieving data from backups, remembering of course that data in the /work directory is not backed up.
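
To help users review their own scratch data before a scheduled purge, the following Python sketch lists files that have not been modified for a given number of days. It is an illustration only: the function name is hypothetical, the idle_days threshold is a placeholder (this memo does not state the purge criteria), and using modification time as the measure of idleness is an assumption.

```python
import os
import time


def find_idle_files(root, idle_days, now=None):
    """Return (path, days_idle) pairs for files under root not modified in idle_days days.

    idle_days is a placeholder; the actual purge criteria are set by cluster staff.
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400
    idle = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime < cutoff:
                idle.append((path, int((now - mtime) // 86400)))
    # Longest-idle files first, so the oldest data is reviewed first.
    return sorted(idle, key=lambda item: item[1], reverse=True)
```

Running this against your own directory under /work gives a list of candidates to copy elsewhere or delete. The script only reports files; it does not remove or modify anything.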

GPU use in the “common” partition

GPUs are available for use in the “gpu-common” partition. Because GPUs are in high demand and are often deployed for projects in a manner that removes them from the cluster environment for periods of time, users of the GPU resources in the “common” partition may have jobs interrupted and find that the number of GPU-accelerated machines varies from time to time. In order to limit disruption, users should seek to limit the run times of their GPU-reliant jobs. For research that requires use of GPUs and that cannot tolerate interruption of jobs, dedicated GPUs can be procured and granted as “high-priority” resources.
GPUs are very much in demand, and we are adding to the GPU resources in order to make them available for the community. We have added “graphics” grade and “computational” grade GPUs, and users should seek to use the GPU that best fits their requirements. More information is available by contacting rescomputing@duke.edu.
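
One way to keep GPU run times short is to request an explicit time limit when a job is submitted. The Python sketch below builds such a submission command. It rests on an assumption this memo does not confirm, namely that jobs are submitted with the Slurm scheduler's sbatch command; the helper function names and the script name are hypothetical. Only the partition name “gpu-common” comes from this memo.

```python
def format_time_limit(minutes):
    """Format a duration in whole minutes as HH:MM:SS."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    hours, mins = divmod(int(minutes), 60)
    return f"{hours:02d}:{mins:02d}:00"


def build_gpu_submit_command(script, minutes, gpus=1, partition="gpu-common"):
    """Build (but do not run) a submission command for a time-limited GPU job.

    Assumes the Slurm scheduler; --partition, --gres and --time are
    standard sbatch options.
    """
    return [
        "sbatch",
        f"--partition={partition}",
        f"--gres=gpu:{gpus}",
        f"--time={format_time_limit(minutes)}",
        script,
    ]


# Example: a two-hour limit for a hypothetical job script named train.sh.
command = build_gpu_submit_command("train.sh", minutes=120)
print(" ".join(command))
```

A short time limit pairs naturally with check-pointing: a long computation can be split into several bounded jobs, each resuming from the state saved by the previous one.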

Privacy

The Duke Compute Cluster is a shared resource for Duke researchers and their collaborators. As a shared resource, the privacy that can be afforded to users is constrained. Users of the cluster must conduct themselves in a manner that respects other researchers’ privacy.
In order to assess and improve the functioning of the cluster, staff members who are involved with the systems administration and organization of the cluster will inspect submission scripts, software, and elements of the system from time to time.

Report security incidents and abuses of the cluster

Examples of a security incident include

  • misuse of data and information, such as Duke’s proprietary information and patient information
  • unauthorized access or use of Duke systems
  • a compromised account, including “shared” account credentials
  • a compromised system

If you observe such an incident or a violation of the behaviors and practices outlined in this document, please report it to your faculty advisor, lab manager, or group leader; to the Duke Research Computing group (rescomputing@duke.edu); or to the Duke University ITSO (security.duke.edu).