In November 2014, the Duke Data Commons went online, accepting data from users of Duke’s molecular data core facilities and the Light Microscopy Core Facility and from NIH-supported researchers. The 1.5-petabyte storage system was made possible by a grant from the National Institutes of Health (NIH 1S10OD018164-01) awarded to Greg Wray, Professor of Biology and director of GCB. The storage is mounted on the Duke Compute Cluster, and the equipment has also allowed “Windows Network Shares” (CIFS) connections to labs.

Now that the equipment has been in service for nearly four years, plans for its retirement are underway, with decommissioning planned for September 2018.

Duke’s Office of Information Technology (OIT) is now planning a hardware refresh of the aging storage, with the goal of providing comparably capable storage at a cost in the ballpark of the original Data Commons storage. Plans for that replacement should be clear in the spring. The orderly transfer of data will begin thereafter.

Going forward

Researchers who have used the Duke Data Commons storage should take the opportunity to assess their data, removing data cruft, documenting and preserving data of abiding value, and making arrangements for deposit of data associated with publication. The Duke University Libraries offers data repository services through the Duke Digital Repository.

An anticipated benefit of the change: more researchers can take advantage of the Duke Data Commons, since NIH restrictions on who can use the resource will no longer apply. This opens up research data storage to all disciplines and fields.

The plug will be pulled on the old equipment in mid-September 2018.

What has been accomplished since 2014

Use of the capacity has been strong: allocations amounted to over 830 terabytes in the first year of operation and grew to over 1.3 petabytes in the second and third years of the equipment’s availability.

The usage particularly helped the School of Medicine faculty and core facilities, as was anticipated, with most of the storage used for research purposes. Since the purchase of the equipment itself was supported by the NIH S10 funding of nearly $600,000, researchers had only to pay for the service contracts for the equipment. Costs in the first year amounted to $92/terabyte per year, dropping to $82/terabyte per year (or $0.082/gigabyte per year) in the second year of operation. These costs are well below those of storage with similar performance.

Duke Data Commons pie chart showing usage by groups, 2017 (inset 2016)

 

A tally of publications listed in annual reports to NIH shows the growing impact of the equipment. From two publications in the first annual report in 2015, the list grew to 25 publications in 2016, plus “at least 26 scholarly presentations and three dissertations/theses.” The 2017 annual report listed 52 publications that appeared since the previous report.

 


Hard drive image by William Warby from London, England (Hard Drive) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons