The Duke Compute Cluster (DCC)
The Duke Compute Cluster consists of machines that the University has provided for community use and that researchers have purchased to conduct their research. At present, the cluster consists of over 17,000 CPU-cores, with underlying hardware from Cisco Systems UCS blades in Cisco chassis. GPU-accelerated nodes are Silicon Mechanics servers with a range of Nvidia GPUs, including high-end “computational” GPUs (V100, P100) and “graphics” GPUs (Titan XP, RTX 2080 Ti). Interconnects are 10 Gbps.
The cluster itself is a project of the University community, with the hardware provided by individual researchers and the University. The University, through Duke Research Computing and the Office of Information Technology, maintains and administers the equipment for its useful life (designated to be four years) and provides support for cluster users. As a result of the incremental purchases, the cluster is heterogeneous, spanning a narrow range of Intel chipsets and RAM capacities, though equipment purchases are organized and channeled by Duke Research Computing in order to ease maintenance and exploit economies of scale. New “standard” nodes come in two configurations: “small,” with 384 GB of RAM and 40 CPU-cores, and “large,” with 768 GB of RAM and 40 CPU-cores.
Since February 2016, the number of GPUs in the cluster has grown by more than an order of magnitude to well over 300 devices.
Researchers who have provided equipment have “high priority” access to their own nodes and “low priority” (or “common”) access to others’ nodes, including those purchased by the University, when idle cycles are available. Since researchers tend not to use 100 percent of the CPU capacity of the nodes they have purchased, “low priority” consumption of cycles greatly increases the efficiency of the cluster overall, while also giving all users access to more than their own nodes’ cycles when they need it. Jobs submitted with high priority run only on the nodes that their owners have purchased, and low priority jobs on those machines yield to high priority jobs.
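In SLURM terms, the priority tiers above typically correspond to different partitions chosen at submission time. The sketch below assumes a hypothetical group-owned partition named `mylab` and a shared low-priority partition named `common`; the actual partition names on the cluster will differ, so this is illustrative only.

```shell
#!/bin/bash
# Example SLURM batch script (partition names "common" and "mylab" are
# assumptions for illustration; check `sinfo` for the real partition list).

#SBATCH --job-name=example-job
#SBATCH --partition=common     # low-priority: scavenge idle cycles cluster-wide
##SBATCH --partition=mylab     # high-priority: run only on your group's nodes
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=12:00:00
#SBATCH --output=example-job_%j.out

# The scheduler may preempt low-priority jobs when node owners submit work,
# so long-running low-priority jobs benefit from checkpointing.
srun ./my_analysis
```

Submitted with `sbatch example-job.sh`, a job on the `common` partition runs wherever idle cycles exist but yields to owners’ high-priority work, matching the policy described above.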
The Duke Compute Cluster is a general purpose high-performance/high-throughput installation, and it is fitted with software used for a broad array of scientific projects. For the most part, applications on the cluster are Free and Open Source Software (FOSS), though some researchers have arranged proprietary licenses for software they use on the cluster. The operating system and software installation and configuration are standard across all nodes (barring license restrictions), with Red Hat Enterprise Linux 7 the current operating system. SLURM is the scheduler for the entire system. Software is managed through the “module” system and, increasingly, through Singularity containers, which incorporate an entire software environment and greatly increase reproducibility. The entire system is professionally managed by systems administrators in the Office of Information Technology, and the equipment is housed in enterprise-grade data centers on Duke’s West Campus. Software installations and user support, including training on using the system, are provided by experienced staff of Duke Research Computing.
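A typical session using the two software-management mechanisms mentioned above might look like the following sketch. The specific module versions and container image name are hypothetical examples, not an inventory of what is actually installed on the cluster.

```shell
# Discover and load software via the module system
# (module names/versions below are examples only).
module avail            # list software available on the cluster
module load R/4.1.1     # add a specific R version to the environment
module list             # confirm which modules are currently loaded

# Alternatively, run a program inside a Singularity container,
# which bundles the entire software environment for reproducibility
# ("my-environment.sif" is a placeholder image name).
singularity exec my-environment.sif python3 my_script.py
```

Because a Singularity image fixes the full userspace (libraries, interpreters, dependencies), a job run from the same `.sif` file produces the same software environment regardless of which heterogeneous node it lands on, which is why containers aid reproducibility on a cluster like this one.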
Users of the cluster agree to an Acceptable Use Policy.