How to use open-source data and materials in your research

For most of history, information was not subject to licensing or similar restrictions. Across Eurasia, scribes copied the texts they encountered, building great repositories of knowledge, often of unknown provenance. Still, only a lucky few could access this knowledge.

Later innovations, such as the spread of libraries, the printing press, and the Internet, greatly improved access to knowledge. Even so, intellectual property laws have often restricted what readers can do with information¹.

Open-source data and materials carry few or no such restrictions: anyone can access, use, modify, and share them at no cost. They include datasets, software, hardware designs, publications, educational resources, and more. Here, we discuss how to use such resources wisely and, in doing so, strengthen the open science ecosystem.

A brief history of open source

The open-source movement grew out of the free software movement, which emerged in the 1980s in response to the restrictive patents and copyright that constrained software distribution². Its key idea was to apply licenses to software, and later to data, that protect rather than restrict users' freedoms. Two broad models exist today: copyleft licenses, such as the GNU General Public License (GPL), which let anyone use, modify, and share a work provided that they distribute derivative works under the same license; and permissive licenses, such as the BSD license, which impose only minimal conditions, typically little more than retaining the original copyright notice³.

Following the success of free and open-source software, open licensing has since been applied to other kinds of content, such as scientific datasets, image files, text publications, and even hardware.

Open source in scientific research

The use of open-source data is growing because of the many benefits it offers researchers, particularly those early in their careers⁴. First, it facilitates collaboration, especially interdisciplinary research. Second, open data increases transparency, which supports the reproducibility and reliability of research and broadens its reach. Finally, it can lower the cost of, and barriers to, conducting research. For example, a researcher at a university in the Global South is far less likely to have access to cutting-edge equipment such as a next-generation sequencer, but with open-source sequencing data they can still perform useful bioinformatic analyses.

However, using open-source information also brings challenges and pitfalls: finding relevant, high-quality sources; judging the validity and reliability of the data and materials; and upholding the necessary ethical principles and practices.

Finding relevant sources

Before using open-source data and materials, you must first find sources relevant to your research topic and methods. To make sharing easier, many platforms and repositories now host open-source information across disciplines. Some examples are:

  • Zenodo: A general-purpose repository operated by CERN that allows researchers to upload and share any type of research output, such as datasets, software, publications, or posters.
  • GitHub: A platform built on the Git version control system that hosts open-source software projects and lets developers collaborate on code, documentation, and issue tracking.
  • Open Science Framework: A platform and management tool that supports the entire research lifecycle, from project management to data sharing and archiving.
  • Figshare: A repository that allows researchers to upload and share any type of research output, such as datasets, software, publications, or media files.
  • Dryad: A repository that curates and preserves data associated with peer-reviewed publications in the natural sciences.
  • Kaggle: A platform that hosts datasets and competitions for data science and machine learning.
  • OpenAIRE: A network that aggregates open access publications and data from various sources across Europe.

These are just a few examples of open-source repositories that you can explore for your research. Search engines such as Google Dataset Search or DataCite can also help you find open-source data from various sources.
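
Several of these services can also be searched programmatically, which helps when you need to screen many candidate datasets. The Python sketch below queries the public DataCite REST API for DOIs matching a free-text query and keeps a few fields useful for later assessment. It assumes the endpoint `https://api.datacite.org/dois`, the `query` and `page[size]` parameters, and a JSON:API response in which each record's `attributes` contain `doi`, `titles`, `publicationYear`, and `rightsList`; check these against DataCite's current API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public DataCite REST API (JSON:API format).
DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a DataCite search URL for a free-text query."""
    params = {"query": query, "page[size]": page_size}
    return f"{DATACITE_DOIS_URL}?{urllib.parse.urlencode(params)}"


def summarize_results(payload: dict) -> list[dict]:
    """Extract DOI, title, year, and license identifiers from a DataCite response.

    Assumes records shaped like {"attributes": {"doi": ..., "titles": [...],
    "publicationYear": ..., "rightsList": [...]}}; missing fields become None.
    """
    summaries = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        rights = attrs.get("rightsList") or []
        summaries.append({
            "doi": attrs.get("doi"),
            "title": titles[0].get("title"),
            "year": attrs.get("publicationYear"),
            # Prefer the short identifier (e.g. an SPDX-style ID) over the free-text name.
            "licenses": [r.get("rightsIdentifier") or r.get("rights") for r in rights],
        })
    return summaries


def search_datacite(query: str, page_size: int = 10) -> list[dict]:
    """Query DataCite over the network and return simplified records."""
    url = build_search_url(query, page_size)
    with urllib.request.urlopen(url, timeout=30) as response:
        return summarize_results(json.load(response))
```

For example, `search_datacite("soil microbiome", page_size=5)` would return up to five simplified records (it needs network access). Keeping parsing separate from fetching makes the parser easy to test on a saved response.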

Assessing quality and reliability

Learning to assess the credibility and reliability of information is vital for any researcher, and open-source data are no exception. Once you have found potential sources of data and materials for your research, assess them thoroughly. Useful criteria and key questions include:

  • Provenance: Who created or contributed to the data or material? What is their expertise and reputation? How can you contact them for more information or clarification?
  • Documentation: How well are the data or materials documented? Do they include clear, comprehensive information on their origin, purpose, methods, format, structure, and variables?
  • Metadata: How well are the data or materials described by metadata? Does the metadata follow a recognized standard or schema, such as the DataCite Metadata Schema or Dublin Core?
  • License: What are the terms and conditions of using the data or material? Do they have a clear and explicit license? Do they allow you to access, use, modify, and share the data or material without any restrictions or fees?
  • Quality: How accurate, consistent, and relevant are the data or materials? Do they meet your research needs and expectations? Do they have any known errors or limitations?
  • Format: How easy is it to access, use, manipulate, or analyze the data or material? Is it in a standard or compatible format? Does it require any special software or tools?
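
Some of these questions reduce to whether a piece of information is present at all, and that part of the screening is easy to automate. The sketch below flags missing fields for the Provenance, Documentation, License, and Format criteria. The field names (`creators`, `contact`, `methods`, `file_format`, and so on) are hypothetical; map them to whatever metadata the repository actually exposes. Judging accuracy, relevance, or the quality of the metadata itself still requires reading the material.

```python
# Hypothetical metadata field names, grouped by the criteria listed above.
CRITERIA_FIELDS = {
    "Provenance": ["creators", "contact"],
    "Documentation": ["description", "methods"],
    "License": ["license"],
    "Format": ["file_format"],
}


def missing_fields(record: dict) -> dict[str, list[str]]:
    """Return, per criterion, the fields that are absent or empty in `record`."""
    gaps = {}
    for criterion, fields in CRITERIA_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[criterion] = absent
    return gaps


# A made-up record with no contact details and no methods section.
record = {
    "creators": ["A. Researcher"],
    "description": "Soil pH measurements from field plots",
    "license": "CC-BY-4.0",
    "file_format": "text/csv",
}
print(missing_fields(record))  # {'Provenance': ['contact'], 'Documentation': ['methods']}
```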

Once you can answer these questions, you can select the data that best suit your research.
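
For tabular data, part of the Quality question can be checked with a quick profile before any analysis. The sketch below uses only Python's standard `csv` module to count rows, exact duplicate rows, and empty cells per column. It assumes a simple comma-separated file with a header row and is a first pass, not a substitute for domain-specific validation.

```python
import csv
from collections import Counter
from io import StringIO


def profile_csv(text: str) -> dict:
    """Count rows, exact duplicate rows, and empty cells per column in CSV text."""
    reader = csv.DictReader(StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # A cell counts as empty if it is missing (short row) or only whitespace.
    empty_cells = {
        col: sum(1 for row in rows if not (row.get(col) or "").strip())
        for col in columns
    }
    # Every repeat beyond the first occurrence of an identical row is a duplicate.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)
    return {"rows": len(rows), "duplicate_rows": duplicate_rows, "empty_cells": empty_cells}


sample = "site,ph\nA,6.5\nB,\nA,6.5\n"
print(profile_csv(sample))
# {'rows': 3, 'duplicate_rows': 1, 'empty_cells': {'site': 0, 'ph': 1}}
```

To profile a downloaded file, read it as text first, for example with `open("data.csv", newline="", encoding="utf-8").read()`, where `data.csv` is a placeholder path.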

Adhering to ethical principles and practices

It is vital to follow ethical principles that respect the rights and interests of the people who created or contributed the data or material, as well as those of your research community.

First, respect the intellectual property rights of the creators or contributors of the data or material, and acknowledge their contributions appropriately. You should also comply with the terms and conditions of the license attached to the data or material and avoid any license violations⁵.
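
Because license terms determine what you may do with a resource, it can help to record each resource's license as a machine-readable identifier and derive its main obligations from it. The sketch below maps a handful of SPDX license identifiers to two common obligations: attribution, and share-alike (the copyleft condition described above). The mapping is a deliberate simplification covering only a few common licenses; the license text itself is authoritative, and unrecognized identifiers are flagged rather than guessed.

```python
# Simplified summary of two obligations for a few common SPDX identifiers.
# Illustrative only: the full license text defines the actual terms.
LICENSE_TERMS = {
    "CC0-1.0": {"attribution": False, "share_alike": False},
    "CC-BY-4.0": {"attribution": True, "share_alike": False},
    "CC-BY-SA-4.0": {"attribution": True, "share_alike": True},
    "MIT": {"attribution": True, "share_alike": False},
    "BSD-3-Clause": {"attribution": True, "share_alike": False},
    "Apache-2.0": {"attribution": True, "share_alike": False},
    "GPL-3.0-only": {"attribution": True, "share_alike": True},
}


def obligations(spdx_id: str) -> list[str]:
    """List the main obligations attached to a license identifier."""
    terms = LICENSE_TERMS.get(spdx_id)
    if terms is None:
        return ["unrecognized license: read the full terms before use"]
    notes = []
    if terms["attribution"]:
        notes.append("credit the creators and keep the license or copyright notice")
    if terms["share_alike"]:
        notes.append("release derivative works under the same license")
    # Reached only when no obligation applies (CC0 in this table).
    return notes or ["no legal conditions; cite the source as good scholarly practice"]


for license_id in ["CC-BY-SA-4.0", "CC0-1.0", "custom-license"]:
    print(license_id, "->", obligations(license_id))
```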

Next, the data or material that you use in your research should be cited according to the citation guidelines provided by the source or publication. You should also provide a link to the data or material and indicate any modifications that you have made to it.
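
When a source supplies its own preferred citation, use it. When it supplies only metadata, the sketch below assembles a data citation from the usual elements (creators, publication year, title, version, publisher or repository, and a DOI link), in roughly the order DataCite recommends for data citations. The function name and argument layout are illustrative assumptions. For registered DOIs, DOI content negotiation (requesting the `text/x-bibliography` format from `https://doi.org/<DOI>`) is another way to obtain a formatted reference.

```python
from typing import Optional


def format_data_citation(
    creators: list[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Assemble 'Creators (Year). Title. Version X. Publisher. https://doi.org/DOI'."""
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Hypothetical record; the DOI is a placeholder, not a real dataset.
print(format_data_citation(["Doe, J.", "Roe, R."], 2021, "Example soil pH dataset",
                           "Zenodo", "10.xxxx/example", version="1.2"))
# Doe, J.; Roe, R. (2021). Example soil pH dataset. Version 1.2. Zenodo. https://doi.org/10.xxxx/example
```

Any modifications you made to the data belong alongside the citation, for example in the methods or data availability statement.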

Finally, use open-source data as responsibly and rigorously as data you have generated yourself. This includes avoiding errors, biases, or misinterpretations that could compromise the validity and reliability of your research. As with any data, you should also report any limitations or uncertainties that may affect your results and conclusions⁶.
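
One easily automated safeguard against errors is confirming that a downloaded file is identical to the one the repository published, since truncated or corrupted downloads can silently distort results. Some repositories, Zenodo among them, list a checksum for each file. The sketch below hashes a local file with Python's standard `hashlib` and compares it with a published value; it assumes the checksum is given either as a bare hex digest (treated as MD5) or with an algorithm prefix such as `md5:`.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large datasets need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published: str) -> bool:
    """Compare a file with a checksum like 'md5:<hex>' or a bare hex digest (assumed MD5)."""
    algorithm, _, expected = published.rpartition(":")
    return file_digest(path, algorithm or "md5") == expected.strip().lower()
```

For example, `matches_published_checksum("data.csv", published)`, with `published` copied from the repository's file listing, returns `False` if the local copy differs. MD5 is adequate for detecting accidental corruption, though not for guarding against deliberate tampering.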

By following these ethical principles and practices, you can use open-source data and materials in a way that is respectful, responsible, and beneficial both for your research and for the research community.

Conclusion

Open-source data and materials can be a great boon to researchers when used wisely. Being a responsible steward of open-source information benefits not only individual researchers but also their wider academic community. Releasing your own data and materials in an open repository is also an important way to contribute to your field and maximize the impact of your research, and it can even boost your career⁷.

References

1. Royal Society. Keeping science open: the effects of intellectual property policy on the conduct of science. https://royalsociety.org/topics-policy/publications/2003/keeping-science-open/ (2003).

2. Free Software Foundation. FSF History. https://www.fsf.org/history/.

3. Salter, J. Open source licenses: What, which, and why. Ars Technica https://arstechnica.com/gadgets/2020/02/how-to-choose-an-open-source-license/ (2020).

4. Allen, C. & Mehler, D. M. A. Open science challenges, benefits and tips in early career and beyond. PLOS Biol. 17, e3000246 (2019).

5. Understanding and Complying with Open Source Software Licenses - Why, When and How. Lexology https://www.lexology.com/library/detail.aspx?g=b84017e3-013b-4808-bd39-fa5c3211cf03 (2021).

6. Labaree, R. V. Research Guides: Organizing Your Social Sciences Research Paper: Limitations of the Study. https://libguides.usc.edu/writingguide/limitations.

7. Popkin, G. Data sharing and how it can benefit your scientific career. Nature 569, 445–447 (2019).

Published on: Jun 21, 2023

By David Burbridge