About the presenter
Dan Leehr, a “lifelong programmer” who has worked in the Duke community since 2006, currently serves as Scientific Data Applications Architect and Developer at Duke’s Center for Genomic and Computational Biology (GCB). He developed processes for image analysis and data management in Duke’s Department of Radiology, built database applications for researchers at the National Evolutionary Synthesis Center (NESCent), and now architects and develops scientific applications at GCB.
What the presentation is about
Researchers selecting software for computational analysis have more choices than ever. Much of it is open source or otherwise freely available. With so much diversity, putting an analysis pipeline together can rapidly become a mess of conflicting dependencies, programming languages, and glue code. As an alternative to gluing these pieces together with a scripting language, we’ve embraced Docker as a command-line tool and adopted a standardized container interface. This approach, inspired by Bioboxes (https://github.com/bioboxes), allows us to put all the ugly stuff — complexity, dependencies, and the like — inside a container, while conforming to a simple external interface. We’ve built a Python program (docker-pipeline; https://github.com/Duke-GCB/docker-pipeline) that implements this idea and provides examples of the modularity, scalability, and reproducibility of this approach.
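To make the idea concrete, here is a minimal sketch of how a standardized container interface lets pipeline steps chain together without step-specific glue. The image names, directory convention (each container reads a mounted input directory and writes to an output directory), and helper functions are hypothetical illustrations, not the actual docker-pipeline API:

```python
def docker_command(image, input_dir, output_dir):
    """Build the `docker run` invocation for one pipeline step.

    The convention (assumed for illustration): every image reads from
    /input and writes to /output, so the host only decides which
    directories to mount.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/input:ro",
        "-v", f"{output_dir}:/output",
        image,
    ]

def pipeline_commands(images, workdir):
    """Chain steps: each step's output directory becomes the next
    step's input directory."""
    commands = []
    current_input = f"{workdir}/input"
    for i, image in enumerate(images):
        step_output = f"{workdir}/step{i}"
        commands.append(docker_command(image, current_input, step_output))
        current_input = step_output
    return commands

# Hypothetical two-step pipeline: align reads, then call variants.
cmds = pipeline_commands(["example/align", "example/call-variants"], "/data/run1")
for cmd in cmds:
    print(" ".join(cmd))
```

Because every container conforms to the same external interface, adding or swapping a tool means changing only the image name, not rewriting glue code.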