Recently, China announced plans to require researchers in the country to deposit their scientific data into national repositories and allow open access to their data sets. This is part of China’s move to enable data security, even as regional bodies like the European Union intend to implement laws around data protection in the coming month. How do choices around data security affect our research lives? Coincidentally, while I was writing this article, a friend who is pursuing her PhD called to tell me about her colleague’s harrowing tale: he had saved five years of research data on a hard disk that he misplaced and is now unable to trace!
Clearly, the hallowed walls of academia are also home to horror stories of data loss and mismanagement, which leave victims close to giving up on their PhDs or rerouting their research to accommodate whatever they can manage with the retrieved data. If you are a graduate student in the physical or medical sciences, your laboratory is probably the place that houses your data. If you pursue a PhD in the humanities or the social sciences, having hard disks full of audio, video, and transcription data comes as no surprise.
While every researcher wants to believe that their data is safe, this could be far from the truth. Often, we are so engrossed in sailing through the many stages of a PhD, bagging that next grant, or scoring tenure that we overlook the very basis of a research endeavor: the data. Working on a research project for years means continually upgrading one’s resources, and it leaves one with a rich pool of data that holds multiple strands of possible output. In such a scenario, managing research data is integral to the successful completion of the research project at hand. This article takes you through some of the key aspects of data management for researchers.
Here’s why data protection matters
In the world of research, data assumes significance by virtue of being the medium that showcases both the quality and integrity of a research study. Therefore, effective data management should be of interest to every researcher. Universities are increasingly setting up their own data management systems in order to allow their research scholars to store, access, and work with data.
Responsible management of data can be the key to countering one of the biggest problem areas for scientific research: irreproducibility. Managing registered data effectively and storing it in suitable formats allows the data to be used in many ways; researchers can draw on varied facets of the collected data for diverse purposes. Moreover, it allows different individuals' iterations of the same data set to be told apart. Accessibility of this data is a key aspect of ensuring that science is reproducible.
Next, effective data management safeguards against loss of data, which of course could have far-reaching consequences for research output and publication. It would now take my friend’s colleague years to replicate the research he was well on the way to completing!
Accessibility of research: The key to the future
Access forms a critical aspect of data management since it signifies the ability to build on existing research and probe unexplored possibilities. A recent survey by Springer Nature suggests that, “More than 70 per cent of respondents agreed that all future research articles, scholarly books and research data should be accessible via open access, with 91 per cent of responding librarians in agreement that ‘open access is the future of academic and scientific publishing’”. To this end, there is now a growing emphasis by all stakeholders of science to not only store data responsibly, but to also make it easily accessible.
Technology companies and angel investors now have a bouquet of applications and services directed at researchers. Software applications like Mendeley, ReadCube, and EndNote enable the storage and referencing of text-centric data, which forms the bulk of research in the social sciences and humanities, and they also complement reading and annotation in the natural and physical sciences.
Often, institutional and/or lab-driven data needs to be shared with those working on the same project as well as with third parties. There is now a push for open data, urging researchers to choose between making their data available through open access or through shared access platforms.
Identifiers and repositories are being used today in a big way to manage data effectively. Some of the benefits of using these include:
- Safeguards research: They help catalogue access points and extent of access, thereby permitting access but safeguarding it from unwanted interlopers.
- Allows branding of data: They allow the researcher the access and reach to categorically brand his/her research as cutting-edge and new.
- Makes providing access easy: As explained above, repositories and identifiers enable access by providing datasets in one place, allowing for ease of use.
Data can also be allocated and leveraged for newer inquiries. Branding your research is important because there is a deluge of research, not all of it of high quality. By using identifiers and repositories, you not only create an identity for your research but also ensure that it is safely deposited in one place. They not only allow one to navigate batches of research data but also help underscore its security.
Best practices to manage your priceless data
Concrete data management plans now find a place in funding documents, sponsorship records, and key official paperwork, as they indicate credibility and promise replicability. Drawing from this understanding, here are a few best practices to secure and manage research data:
- As far as possible, make sure that you store your data in the cloud. You can do this either by using private accounts with cloud service providers, or by using your university data management system, if available. This minimizes the chances of data loss.
- Format data in easy-to-find and easy-to-use ways. This would increase searchability, especially when the data is open or shared. Standardized metadata has emerged as a key part of many funding contracts, drawing on international best practices. Make sure to draw on the benefits of software applications to store, segregate, and format your data.
- You can choose to provide open access to the managed data, as is increasingly being advocated, with special focus being given to data retention and curation.
- Make it a practice to store the data that accumulates every day in an organized way. Categorize, format, and store it in recognizable and easy-to-access spaces.
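For readers comfortable with a little scripting, one way to put the practices above to work is to keep a simple inventory of your data folder and use it to check that a backup copy is complete. The Python sketch below is only an illustration under stated assumptions: the function names (`sha256_of`, `build_manifest`, `changed_files`) and the layout of the inventory are hypothetical choices made for this example, not part of any university system or funder requirement.

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """List every file under data_dir with its size and checksum."""
    files = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # Store paths relative to the data folder so the manifest
            # can be compared against a copy kept somewhere else.
            files[path.relative_to(data_dir).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return {"created": date.today().isoformat(), "files": files}


def changed_files(manifest: dict, copy_dir: Path) -> list:
    """Return names of files that are missing or differ in copy_dir."""
    problems = []
    for name, info in manifest["files"].items():
        candidate = copy_dir / name
        if not candidate.is_file() or sha256_of(candidate) != info["sha256"]:
            problems.append(name)
    return problems
```

Calling `build_manifest` on your working folder and then `changed_files` on a cloud-synced or external copy returns an empty list when every file in the inventory is present and unaltered in the copy. The inventory is a plain dictionary, so it could be saved alongside the data (for instance as JSON) as a basic form of metadata.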
Data is often described as a researcher’s gold mine, and that is not far from the truth. Research data is the result of huge investments of money, time, and resources. Safekeeping this data and improving its accessibility is therefore key to scientific progress. However, managing data is often a persistent challenge for researchers at all stages of their careers. Bringing more awareness to the importance of data management is the way forward for the scientific community.
What are your thoughts on the ways in which data is managed and accessed? Share your views, opinions, and personal experiences.