Fourth in a series on data management

Mara Sedlins, PhD, is a CLIR fellow working on data management practices in the social sciences. Her CLIR fellowship is jointly sponsored by the Duke University Libraries and the Social Science Research Institute (SSRI).

Practice: “repeated exercise in or performance of an activity or skill so as to acquire or maintain proficiency in it”

If you’re a researcher thinking about data management, it’s probably because you’ve been required to. With the increasing number of data sharing mandates from funders and journals, many researchers are encountering “data management” as one more hoop to jump through before they can achieve essential professional milestones. Before you submit your grant application to NSF, you need to write a data management plan (DMP); before you can get your article published, you need to put your data in a repository, maybe even get a DOI. These requirements can evoke a sigh, adding to the already-daunting list of things you need to do to get your research done.

But good data management, reconceptualized as a daily practice relevant to all phases of research, has the potential to make your research better, faster, easier – and even more fun!

Growing up in a musical family, I learned the value (and the frustrations) of practicing early in life. My parents started me on the violin at the age of 3 (and my brother on the cello at 2!). At that age, parental involvement is of course crucial. My mom, a music teacher herself, attended all of our lessons and supervised daily practice sessions. Some of these involved simply standing still with the instrument for a couple of minutes, maybe bowing to imagined applause. Over time, we moved on to variations of Twinkle, Twinkle, Little Star; by the end of high school, I performed a Mozart concerto with my school orchestra (I was very fortunate to live in a district with a strong music program). I continued playing the violin as an adult in various groups, and many of my closest friendships grew out of making music together.

Institutional data sharing requirements can seem a bit like having a parent standing over you, telling you to practice. You think, “why are they making me do this, when all I want to do is go outside, eat cookies, play with my data!” But over time, these mandates can lead to the development of abilities you didn’t know you needed, abilities that open the door to new possibilities. And as with learning a new instrument, it’s okay to start with the basics. Small changes in the way you manage your data can reap large benefits over time.

There are many activities involved in data management, but a common thread is the goal of making your data accessible and intelligible to another researcher (often your future self). This means shifting your general orientation to data. Each step of the research lifecycle requires considering how other people will make sense of the decisions you make.

[Figure: Research data life cycle]

Current incentives for data management (i.e., DMP and data sharing requirements) are focused on the planning and publishing stages of the research lifecycle. But there are tricks to keeping your data organized and well-documented at each stage of research. This can be as simple as keeping backups of your data, using consistent file naming conventions, and taking better notes. Depending on the needs of your project, you could also set up a GitHub repository for collaboration and version tracking, create a written data management protocol for your lab, or allow others to test the reproducibility of your code on a site like RunMyCode (which, I realize, probably sounds terrifying to most people!).
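As one small illustration of a consistent file naming convention, here is a minimal Python sketch. The pattern it produces (ISO 8601 date, project, description, zero-padded version number) and the function name `data_filename` are hypothetical choices made for this example, not a standard prescribed here or by any funder.

```python
from datetime import date


def data_filename(project: str, description: str, version: int,
                  when: date, ext: str = "csv") -> str:
    """Build a file name under a hypothetical convention:
    YYYY-MM-DD_project_description_vNN.ext
    """
    def clean(part: str) -> str:
        # Lowercase and join words with hyphens so names never contain spaces
        return "-".join(part.strip().lower().split())

    # ISO 8601 date first, then underscores between fields,
    # and a zero-padded (at least two-digit) version number
    return f"{when.isoformat()}_{clean(project)}_{clean(description)}_v{version:02d}.{ext}"


print(data_filename("Sleep Study", "raw survey", 3, date(2015, 6, 1)))
# prints 2015-06-01_sleep-study_raw-survey_v03.csv
```

Putting the date first in YYYY-MM-DD form means an alphabetical directory listing also sorts files chronologically, and zero-padding the version keeps v10 from sorting ahead of v02. Whatever convention you choose, the point is to write it down and apply it consistently.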

To quote research data management expert Kristin Briney, “The reason you do all of these data management practices is so that you don’t get stuck without your data when you need it or end up spending hours trying to reconstruct your data and analysis. The rule of thumb is that every minute you spend managing your data can save you ten minutes of headache later. Dealing with your data can be a very frustrating part of doing research but good data management prevents such tribulations.”

Here is an illustration of some of those tribulations in action:

Changing the culture of research to be more open and more reproducible requires rethinking the daily process of doing research as a more generous, more outward-facing activity. Much of research appears solitary: sitting at a computer, analytical software open, panes of analysis code and output, neat rows of numbers. But like so much of what we do at our computers these days, it has a social element as well. For open science to work, what you do with your data needs to become other-oriented: a letter to other researchers, or to anyone who might be able to make use of what you produce in the future.

As scientific endeavors become more collaborative, data management and analysis need to include the art of reaching across to another person. This includes non-traditional collaborators, such as specialists in data curation, data management, data security, and research computing. And as more computing tools become available to automate data collection and rote analyses, the researcher’s role as interpreter and communicator becomes more important. Data analysis becomes like any other creative work, analogous to interpreting a musical passage so that it comes across to the listener, or to crafting a written piece that communicates a particular message or experience.

To do this well requires practice, both as individuals and as a scientific community.

As data sharing becomes more standard practice, I hope that there will be more opportunities for feedback from those who are reusing data to those who created and documented it. The only way to know whether something has been communicated well is to ask the people you’re trying to communicate with. If they didn’t get it, then the communication needs to be improved. Right now, if you want your data to be successfully reused, you do the best you can to add documentation, put it in a repository, and hope that it makes sense to someone. But as data reuse becomes more common, even across disciplines, I hope to see dynamic conversations around shared data: questions that lead to revisions and fine-tuning of metadata.

To cycle back to the analogy of musical practice, this kind of dynamic interplay is also central to rehearsing with a group of musicians, whether it’s an orchestra (which requires the active coordination of a director, akin to the PI’s role in a large grant), a string quartet, or a rock band. You start with a rough read-through of the piece, figure out what doesn’t work, make decisions about what you want to convey musically, and play it through again. Individual practice is important too. You need to spend time independently working on your own part, keeping in mind how it fits in with the others, and then bring that underlying preparation to the group. But playing music with others always ends up changing how you think about your own part; it’s a different experience. Ideally, you get to know your own part so well that you spend most of your attention listening and reacting to what the other musicians are doing. That’s when it starts getting really fun.

In research, documenting your data processes so that others can understand them is akin to individual practice. But science gets really fun when you see how what you’re doing fits into the larger picture.

Kristin Briney. Data Management for Researchers: Organize, maintain and share your data for research success. Exeter, UK: Pelagic Publishing, 2015.

Image credits:
Cello Practice, Frederick Lang Jr., CC BY-NC 2.0
Research Data Life Cycle, University of Virginia Library