Research Toolkits, Duke’s “on-demand” source for research computing virtual machines (VMs) and research data storage, is being updated over the next few months; an initial code update was completed in late May. Launched in 2015, the service has supported Duke researchers for four years, and the upgrade also adds power-saving features.
All regular-rank faculty members across the University and the School of Medicine have a default “allocation” of CPU-cores, memory (RAM), and local data storage. An allocation can be used in part or in full to create VMs from system “templates.” Researchers can choose templates for several Linux distributions, including Ubuntu and Red Hat Enterprise Linux, and for three versions of Windows. Some templates include commonly used research software, such as Jupyter or RStudio.
Faculty from every corner of the institution have used the service, which is accessible via the web with NetID authentication. It also encourages and supports faculty collaboration, because researchers can easily combine their allocations to create more powerful VMs. Additional allocations are available in single CPU-core units, each with 10 GB of RAM, for $112/year.
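Because extra capacity is sold in fixed units (one CPU-core plus 10 GB of RAM for $112/year), the cost of sizing a pooled VM can be worked out directly. The following is a minimal sketch, not part of Research Toolkits itself: the `Allocation` type and helper functions are hypothetical, and the 4-core/40 GB allocation per collaborator in the example is an assumption, since the default allocation size is not stated here. The 16-core, 160 GB target matches the proteomics VM discussed next.

```python
from dataclasses import dataclass

# Unit size and price as stated: 1 CPU-core + 10 GB RAM for $112/year.
UNIT_CORES = 1
UNIT_RAM_GB = 10
UNIT_PRICE_PER_YEAR = 112


@dataclass(frozen=True)
class Allocation:
    """Hypothetical model of one faculty member's CPU-core and RAM allocation."""
    cores: int
    ram_gb: int


def pool(allocations: list[Allocation]) -> Allocation:
    """Combine collaborators' allocations into one larger allocation."""
    return Allocation(
        cores=sum(a.cores for a in allocations),
        ram_gb=sum(a.ram_gb for a in allocations),
    )


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def extra_units_needed(pooled: Allocation, cores: int, ram_gb: int) -> int:
    """Fewest purchased units that cover both the core and the RAM shortfall."""
    core_gap = max(0, cores - pooled.cores)
    ram_gap = max(0, ram_gb - pooled.ram_gb)
    return max(ceil_div(core_gap, UNIT_CORES), ceil_div(ram_gap, UNIT_RAM_GB))


def annual_cost(units: int) -> int:
    """Yearly price in dollars for the given number of extra units."""
    return units * UNIT_PRICE_PER_YEAR


if __name__ == "__main__":
    # Hypothetical: two collaborators, each assumed to hold 4 cores / 40 GB.
    team = pool([Allocation(4, 40), Allocation(4, 40)])
    units = extra_units_needed(team, cores=16, ram_gb=160)
    print(team, units, annual_cost(units))  # Allocation(cores=8, ram_gb=80) 8 896
```

Taking the maximum of the core and RAM shortfalls reflects that each purchased unit adds both resources together, so whichever resource is scarcer sets the number of units.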
Among the major users of Research Toolkits is the Duke Proteomics and Metabolomics Shared Resource. Dr. Arthur Moseley, the core facility director and professor of medicine, uses RAPID VMs “extensively for proteomics, primarily plasma proteomics.” The core’s staff have taken advantage of the ability to combine faculty allocations, which allowed Moseley and his staff to create a large VM with 16 CPU-cores, 160 GB of RAM, and 500 GB of scratch space. The VM made it possible for the core to complete a major study of an 861-patient plasma cohort. “The data acquisition took about 1290 hours, which generated about three terabytes of raw mass spectrometry data,” Moseley said. “The virtual machine was used for the initial data transformations in near ‘real time’ format.” The VM has also been used to run a number of proteomic and metabolomic software applications.
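Moseley’s figures give a rough sense of the load that “near real time” processing placed on the VM. This back-of-the-envelope sketch assumes decimal units (1 TB = 1,000 GB) and one acquisition per patient, neither of which is stated, and since the quoted numbers are approximate, so are the averages.

```python
# Figures quoted by Moseley (all approximate: "about").
PATIENTS = 861
ACQUISITION_HOURS = 1290
RAW_DATA_TB = 3
VM_CORES, VM_RAM_GB = 16, 160

# Assumption: decimal units (1 TB = 1,000 GB) and one acquisition per patient.
raw_gb = RAW_DATA_TB * 1000

gb_per_hour = raw_gb / ACQUISITION_HOURS        # average acquisition data rate
hours_per_sample = ACQUISITION_HOURS / PATIENTS  # average instrument time per sample
gb_per_sample = raw_gb / PATIENTS                # average raw data per sample
ram_per_core = VM_RAM_GB / VM_CORES              # matches the 10 GB-per-core unit

if __name__ == "__main__":
    print(f"{gb_per_hour:.2f} GB/hour")        # 2.33 GB/hour
    print(f"{hours_per_sample:.2f} h/sample")  # 1.50 h/sample
    print(f"{gb_per_sample:.2f} GB/sample")    # 3.48 GB/sample
    print(f"{ram_per_core:.0f} GB RAM/core")   # 10 GB RAM/core
```

The last line is a consistency check: 160 GB over 16 cores is exactly the 10 GB-per-core ratio of a single allocation unit, which fits a VM assembled by pooling allocations.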
Since the service began, more than 1,100 virtual machines have been created for projects across many disciplines. Faculty members, or people to whom they grant “admin” privileges, set up “projects,” add people as project members, and assign computing and data storage resources to each project. The process is entirely web-based (https://rtoolkits.web.duke.edu/). Requesting and configuring a VM is quick and simple, and once an admin has done so, the VM is available within minutes.
Research Toolkits also supports the day-to-day work of individual Duke labs. The lab of Dr. David MacAlpine, professor of pharmacology and cancer biology, has used virtual machines from both Research Toolkits and “VCM” (Virtual Computing Manager; https://vcm.duke.edu/), which provides small, short-term VMs. A Research Toolkits virtual machine supports the lab’s genomic research into the mechanisms that ensure genetic and epigenetic inheritance; its stable environment and generous CPU and RAM allocations give the group a common shared environment for genomic analysis in Python and R. The short-term VCM VMs are used for rapid development work, for testing new libraries and software, and for verifying that the lab’s analyses and figures can be readily reproduced on a “virgin” VM. Together, MacAlpine observes, “the use of these different VM platforms have provided a robust and, importantly, a reproducible research environment for our work.”
The programmers working on the Research Toolkits update also created VCM, and the two systems have begun to converge, so that improvements to either one can be carried over to the other. VCM was originally intended as a teaching and learning resource, since anyone with a Duke NetID can request a VM from it. Liz Wendland, an OIT programmer who codes both VCM and Research Toolkits, explains the rationale for converging the systems: “I found myself enhancing VCM and then wishing for the same feature in Research Toolkits. By converging the code bases, we are now able to deploy features to both applications without a total rewrite. That means we can effectively double the number of enhancements for each application.”
Making VMs “greener” by reducing wasted power consumption
Updates to both VCM and Research Toolkits take usability and system durability into account, of course, but they also automate “stewardship” of the university’s investment in machines and of the environment. Mark McCahill and Liz Wendland, designers and developers of VCM and Research Toolkits, studied VM usage and then built automated shutdowns of VCM virtual machines into the system in spring 2019. “Between 25% and 28% of the VMs managed by Virtual Computing Manager are running at any given time,” McCahill said. “Most of the use is intermittent, and people that do need to run a VM 24×7 have that option.” Energy conservation in the digital world is no small matter: a 2013 study of the energy use of information technology reported that global energy use for telecommunications and computing “ranges from 1,100 to 1,800 Tera Watt hours (TWh) annually.” To put that in perspective, that much energy is “in the same league as global lighting energy demand circa 1985.” As the report’s title puts it, “The Cloud Begins With Coal.”
What does this mean for the Research Toolkits update? By default, new VMs will be powered down automatically in the early morning unless the VM’s owner specifies otherwise. Owners can still keep Research Toolkits VMs running as always-on “servers,” while the overall Research Toolkits system consumes less power. OIT anticipates that the power reduction may be about the same as in VCM: 70 to 75%.
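The 70 to 75% estimate lines up with McCahill’s VCM observation: if only 25% to 28% of VMs need to be running at any moment, roughly 72% to 75% of VM-hours would otherwise be spent powered on but idle. The sketch below works through that arithmetic and a default-on, opt-out nightly shutdown rule. It is an upper-bound estimate under simplifying assumptions (an idle VM draws as much power as a busy one, and a powered-down VM draws none), and the function names and the 3:00 to 4:00 a.m. window are hypothetical; the article says only “early morning.”

```python
from datetime import datetime, time

# VCM observation quoted by McCahill: 25% to 28% of VMs running at any given time.
RUNNING_LOW, RUNNING_HIGH = 0.25, 0.28

# Hypothetical shutdown window; the article says only "early morning".
WINDOW_START, WINDOW_END = time(3, 0), time(4, 0)


def idle_fraction(running_fraction: float) -> float:
    """Share of VM-hours that could be saved if only running_fraction of VMs
    actually need to be on (assumes idle and busy VMs draw equal power)."""
    return 1.0 - running_fraction


def should_power_down(now: datetime, owner_opted_out: bool) -> bool:
    """Default-on nightly shutdown: power down inside the window unless
    the VM's owner has asked to keep it running as an always-on server."""
    if owner_opted_out:
        return False
    return WINDOW_START <= now.time() < WINDOW_END


if __name__ == "__main__":
    low = idle_fraction(RUNNING_HIGH)   # more VMs running -> smaller saving
    high = idle_fraction(RUNNING_LOW)
    print(f"Estimated saving: {low:.0%} to {high:.0%}")  # 72% to 75%
    print(should_power_down(datetime(2019, 6, 1, 3, 30), owner_opted_out=False))  # True
    print(should_power_down(datetime(2019, 6, 1, 3, 30), owner_opted_out=True))   # False
```

Making shutdown the default and always-on the exception is what lets intermittent users save power without any action, while owners who run servers keep them up by opting out.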