People who are granted access to the Duke Compute Cluster agree to the terms of this notice.
Users of the cluster agree to only run jobs that relate to the research mission of Duke University. Use of the cluster for the following activities is prohibited:
- Financial gain
- Commercial or business use
- Unauthorized use or storage of copyright-protected or proprietary resources
- Unauthorized mining of data on or off campus (including many web scraping techniques)
Data Security and Privacy
Users of the cluster are responsible for the data they introduce to the cluster and must follow all applicable Duke (including IRB), school, and departmental policies on data management and data use. Security and compliance provisions on the cluster are sufficient to meet the Duke data classification standard for public or restricted data. Use of sensitive data (e.g., legally protected data such as PHI or data covered by FERPA) or data bound by certain restrictions in data use agreements is not allowed. Data that has been appropriately de-identified or obfuscated may potentially be introduced to the cluster without violating data use agreements or government regulations.
Because the cluster is a shared resource, privacy on it is constrained, and users of the cluster must conduct themselves in a manner that respects other researchers’ privacy. Cluster support staff have access to all data on the cluster and may inspect elements of the system from time to time. Metadata on the cluster and utilization by group (and sometimes by user) will be made available to all cluster users and Duke stakeholders.
Additional Responsibilities for DCC Principal Investigators and their Designees (Point of Contact or PoC)
- Managing and vetting a DCC group’s membership as lab membership changes
- Serving as the contact point between cluster support and the larger group
- Appropriately managing the data left by former members of the group
- Ensuring group members comply with all of the cluster's appropriate and responsible use policies
- Periodically reviewing the group's use of shared cluster resources and, when appropriate, purchasing high-priority access or additional storage
Best Practices for Use of Shared Resources
Cluster users are working in a shared environment and must adhere to usage best practices to ensure good performance for all cluster users.
Computational Work (Jobs) on Shared Resources
- All computational work should be submitted to the cluster through the job scheduler (SLURM). Running jobs on the login nodes is an abuse of the system.
- Common partition resources should be used judiciously. Groups with sustained needs should purchase nodes for high-priority access. Use of scavenger partitions is encouraged for bursting, large workloads, and other short-term needs.
- Use of long-running jobs on common and scavenger partitions is discouraged. This is for fairness to other users and because node failures and scheduled maintenance may require interruption of processes. Checkpointing is good computing practice for long-running jobs on all partitions.
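The checkpointing recommendation above can be illustrated with a minimal Python sketch. It is not a DCC-provided tool: the function names, the JSON checkpoint format, and the `stop_after` parameter (used only to simulate an interruption) are all assumptions made for illustration. The idea is that a job periodically writes its progress to disk and, when restarted after a node failure or maintenance window, resumes from the last saved step instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # rename over the old checkpoint in one step


def load_checkpoint(path):
    """Return the saved state, or a fresh starting state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, save_every=100, stop_after=None):
    """Sum the integers 0..n_steps-1, resuming from the checkpoint if one is present.

    stop_after is a hypothetical parameter that simulates the job being killed.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            return state  # simulated interruption: unsaved progress is lost
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `run("ckpt.json", 1000, stop_after=250)` and then `run("ckpt.json", 1000)` repeats only the work done after the last saved checkpoint. Real workloads would save whatever state they need (model weights, iteration counters, random seeds) in a format suited to its size.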
Cluster Shared Storage Resources (/work and /scratch)
DCC storage is designed and optimized for very large data under computation, not for data storage. Labs requiring long-term data storage may upgrade their group storage or add additional storage at a cost; see our pricing.
In order to keep processing overhead low and operations fast on shared storage, there are no backups, and no logging of usage actions. Since these areas are susceptible to data loss, users of the cluster should retain a copy of their irreplaceable data at a separate location and they should remove results from shared space frequently.
Capacity is at a premium and users should clean up and remove their own data at the conclusion of their computation. Additionally, to prevent shared volumes from filling up, files older than 90 days on /work will be purged on the 1st and 15th of every month. Notifications will not be sent. Touching files to expressly avoid the purge process is prohibited. If storage utilization reaches levels that could affect users, the following procedure will be used:
- If utilization exceeds 80%, a notice will be sent to the top storage users advising them that the volume is approaching capacity and asking them to save essential results to lab storage and delete the files that are least important to ongoing work
- If utilization exceeds 90%, files from the notified top storage users will be purged until utilization is back down to 80%
- If the above efforts do not succeed in reducing utilization, a general purge will be run off cycle, with the file age threshold decreased as needed; notifications will be sent to all /work users
Users who require exceptional use of /work (>20TB for more than 1 week) must notify email@example.com. Purge practices will change over time based on the needs of managing the cluster.
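To help with the routine cleanup described in this section, the following Python sketch lists files that are older than the 90-day purge window so they can be copied to lab storage or deleted. It is an illustration only: it assumes file age is judged by modification time, which this policy does not specify, and the function name is hypothetical. It only reads timestamps and never changes them.

```python
import os
import time

PURGE_AGE_DAYS = 90  # age threshold stated in the policy above


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Yield (path, age_in_days) for files under root last modified more than `days` ago.

    Assumption: age is measured from the modification time (mtime).
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # the file vanished or cannot be read; skip it
            if mtime < cutoff:
                yield path, (now - mtime) / 86400
```

A user might loop over `files_older_than(my_dir)`, where `my_dir` is their own directory on shared storage, and review the output before moving or removing anything.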
User Account Suspension and Termination
Inappropriate use of the cluster may result in either temporary or permanent suspension of your account by a systems administrator, depending on the frequency and severity of the infraction. While each situation varies in severity, user suspensions generally escalate as follows:
- Warning to the user
- Suspension of the user with notice to the user and PI/PoC; the PI/PoC may reactivate the user account with notice to Research Computing
- Suspension of the user with notice to the user and PI/PoC; the PI/PoC may reactivate the user account after the user has attended office hours or other training deemed necessary by cluster staff
- Permanent suspension of the user account
If a PoC fails to enforce acceptable use of the cluster within their group, they may lose rights to act as PoC.
If you observe inappropriate use of the cluster or a violation of the behaviors and practices outlined in this document, please report it to your faculty advisor, lab manager, or group leader, the Duke Research Computing group (firstname.lastname@example.org), or the Duke University ITSO (security.duke.edu).
If you have questions about this policy, are unsure whether your use of the cluster is appropriate, or need help implementing practices to comply with it, please contact Research Computing.
Last Updated: March 2020