Draft: Research software licensing guide

A practical guide on why software licenses matter, how to choose one, and what steps to take

June 21, 2023 - Radovan Bast and Kathleen A. Smart


This is a preliminary draft of a proposed research software licensing guide to be used by researchers at UiT. We are sharing this draft with colleagues and various stakeholders for review and comments. Please send comments, suggestions, and corrections to radovan.bast@uit.no.

This guide is shared under the Creative Commons CC0 waiver.

What is "research software"?

We consider any code, script, notebook, or file, regardless of size, as "research software" if it is needed to generate, visualize, or reproduce data/results that are part of, or will form part of, a publication. You don't need to be a "proper software engineer" to produce software. Most research software is not written by "proper software engineers" (whatever that means).

Examples of "research software" for the purpose of this guide:

  • Script to convert data from one format to another
  • Script to read data and visualize it
  • Program that generates data
  • Analysis script
  • Set of scripts that form an analysis pipeline
  • Code that is compiled
  • Code that is dynamically interpreted and not compiled
  • Web app

Why software licenses matter

Imagine the following frequent scenario: You find some great code or data that you want to reuse for your own publication. This is good for the original author: you will cite them, and people who cite you may cite them too. You need to modify the code a little, or remix the data a bit. But when it comes time to publish, you realize that the original work has no license. Thus, you can't release the new material you made!

Now you have a problem:

  • You manage to publish the paper without the software/data, but others cannot build on your software and data, and you don't get as many citations as you could.
  • Or you cannot publish it at all, because the journal requires that papers come with data and software so that they are reproducible.

Next time you are smarter and check the license before building on someone else's work. And others may approach your work the same way.

Open science is built upon sharing of research data and software in a FAIR (findable, accessible, interoperable, and reusable) manner. For research data, reuse entails applying an open data license (generally using Creative Commons licenses), but sharing of software requires licenses with more specific terms.

In open science we strive towards reproducibility (results can be verified by other researchers) and reusability (results can be reused in follow-up research by you and/or others without starting over from scratch).

Without a license that allows reuse and redistribution, and that clarifies the terms under which code can be reused and redistributed:

  • others may not be able to publish derivative work based on your code
  • you may not be able to publish derivative work based on somebody else's code or even your own code (e.g. after changing jobs)

This means that clarifying the terms of use is essential for derivative work based on your research to be publishable at all. Choosing an open source license can also be good insurance against being locked out of your own code after changing affiliation, group, or job.

When should I add a license?

Choose a license early in the project, even before you publish the code. Changing the license later in the project can become complicated. Agreeing on a software license does not mean that you have to make the code open immediately. You can also follow the "open core" approach: you don't have to open source all your work. The core can be open and on a public branch, while unpublished code stays in a private repository.

However, we recommend working as if the code were public even while it is still private. This avoids surprises years later, when you decide to open the project and find code with an incompatible license in its history.

How do I add a license to my work?

If your work is derivative work

(here we will add a decision tree/ flow-chart)

Your code is derivative work if you started (even partially) from existing code and made changes to it, or if you incorporated existing code into your own code.

If your code is derivative work, then you need to check the license of the original code. Depending on the license, your choices might be limited. In this case we recommend using these two resources:

If the original code does not have a license, you may not be able to distribute your derivative code. Even if it doesn't have a license, it still might have terms of use, which might or might not be compatible with a standard license. You can try to contact the authors and ask them to clarify the license of their code.

Practical steps for incorporating something small into your own project, when its license allows you to do so (for example, a function or two from another project):

  • Create a LICENSES/ folder in your project and "put the unmodified license text (i.e., the license text template without any copyright notices) in your LICENSES/ folder" (https://reuse.software/faq/#license-templates). This way, if you reuse code from multiple projects, you can keep multiple license files there.
  • Put the code that you incorporate into a separate file or separate files. This makes it easier to see later what was incorporated and what was written from scratch. At the top of the file(s) which you have incorporated into your project, add (and adapt) the following header (more examples):
    # SPDX-FileCopyrightText: 2023 Jane Doe <jane@example.com>
    #
    # SPDX-License-Identifier: MIT
    
    The REUSE initiative was started by the Free Software Foundation Europe to make licensing of software projects easier. It is OK if you prefer not to follow this strict format. The advantage of following it is that the reuse-tool then makes it easy to verify and update license headers if you have many files from different sources.
  • If it does not make sense to have several files in your project (e.g. when incorporating something into a notebook), then add a note/comment about the license and where the code came from directly above the function.
  • Although the license may not require it, it is good practice to acknowledge the incorporated functions/code in your README/documentation and to cite the original work if you publish a paper about your code.
  • Some licenses are more permissive (you can keep your changes private), but others require you to share your changes under the same terms when you distribute the derivative work ("share-alike" or "copyleft").
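
The header step above can be scripted when several incorporated files need the same notice. The following is a minimal Python sketch and not part of the REUSE tooling: the function names spdx_header and add_header are hypothetical, and the sketch assumes that "#" is a valid comment marker for the file and that the file has no shebang line that must stay on the first line.

```python
from pathlib import Path


def spdx_header(copyright_text: str, license_id: str, comment: str = "#") -> str:
    """Build a three-line header in the SPDX format shown above."""
    return (
        f"{comment} SPDX-FileCopyrightText: {copyright_text}\n"
        f"{comment}\n"
        f"{comment} SPDX-License-Identifier: {license_id}\n"
    )


def add_header(path: Path, copyright_text: str, license_id: str) -> bool:
    """Prepend the header unless the file already declares a license.

    Returns True if the file was changed, False if it was left alone.
    Assumption: the file is UTF-8 text and has no shebang line.
    """
    text = path.read_text(encoding="utf-8")
    if "SPDX-License-Identifier:" in text:
        return False
    header = spdx_header(copyright_text, license_id)
    # Keep one blank line between the header and the original content.
    path.write_text(header + "\n" + text, encoding="utf-8")
    return True
```

For example, add_header(Path("borrowed.py"), "2023 Jane Doe <jane@example.com>", "MIT") prepends the header once and leaves the file untouched on a second call. For projects with many files, the reuse-tool mentioned above is likely the more robust choice.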

Practical steps for making changes to an existing project with a license that allows you to do so:

  • If the project is on GitHub or GitLab or similar, first fork the project (copy it into your user space where you can make changes).
  • For the BSD and MIT licenses you are not obliged to state your changes, but it can still be helpful for others if you do. You can state your changes in the header of the files you have modified. It can also be helpful to state bigger-picture changes in the README file of the project.
  • Some licenses are more permissive (you can keep your changes private), but others require you to share your changes under the same terms when you distribute the derivative work ("share-alike").

If your work is not derivative work

If you have started "from scratch" and have neither built on nor incorporated any existing code, then you may consider your code not to be derivative work. Note that if you only use (but do not incorporate or change) libraries, plug-ins, and packages which are distributed via platforms like PyPI, Conda, CRAN, Crates, ..., this is typically not considered derivative work.

Before you choose a license, clarify the following points with, for example, your supervisor, principal investigator, collaborators, or research support service:

  • Does your work contract, grant, or collaboration agreement dictate a specific license?
  • Is there an intent to commercialize the code?
  • Is the ownership unknown or mixed? If the code has multiple owners (persons or organizations), all of them must agree to the license.

Do not invent your own license. Choose one of the standard licenses, otherwise compatibility is not clear:

If there are no restrictions from the above points, and no overriding guidelines exist, we recommend one of the following two licenses:

  • European Union Public Licence, Version 1.2 or later (EUPL) (https://data.europa.eu/doi/10.2799/77160): this is an official license of the European Union (EC Decision, part of European law). It is interoperable, reciprocal (anyone who distributes derivative work must also make the modified source code available), and compatible with many standard licenses. It is a good choice if you want access to changes and improvements made to your code, so that you can integrate them back and reuse them in your own work, or if you want some competitive advantage (right to relicense) over other users of your work.
  • MIT License: choose this license to prioritize ease of reuse over requiring derivative work to provide modifications back to you and the community. Others will be able to do whatever they want with your code as long as they include the original copyright and license notice in any copy of the software/source. For small and short-lived projects this can be a good choice.

Note that, with the exception of CC0 (i.e. public domain dedication), CC licenses are not appropriate for software (although they can be used for software documentation).

Practical steps:

  • Create a LICENSES/ folder (example).
  • Put the unmodified license text (i.e., the license text template without any copyright notices) in plain text format into the folder (example). Here are the two licenses above in plain text: EUPL and MIT (we recommend using the MIT license text without the copyright header).
  • Add copyright and license information to each file following https://reuse.software/tutorial/ which uses a standard format with so-called SPDX identifiers. Example below (example):
    # SPDX-FileCopyrightText: 2023 Jane Doe <jane@example.com>
    #
    # SPDX-License-Identifier: EUPL-1.2
    
    The REUSE initiative was started by the Free Software Foundation Europe to make licensing of software projects easier. It is OK if you prefer not to follow this strict format. The advantage of following it is that the reuse-tool then makes it easy to verify and update license headers if you have many files from different sources.
  • Create a CITATION.cff file (example; more about it below).
  • For really small projects with one or two files, the above may seem excessive. Some projects choose not to put copyright information at the top of their files and only have one LICENSE file and a CITATION.cff file. That is OK for really small projects.
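
A CITATION.cff file is a small plain-text (YAML) file in the project root. As a rough sketch, the Python snippet below writes a minimal one. The helper names citation_cff and write_citation_cff, as well as any title and author values passed to them, are placeholders invented for illustration. The keys shown (cff-version, message, title, license, authors) follow our understanding of version 1.2.0 of the Citation File Format, so please check the result against the format's own documentation.

```python
from pathlib import Path


def citation_cff(title: str, authors: list[tuple[str, str]], license_id: str) -> str:
    """Return the text of a minimal CITATION.cff file.

    authors is a list of (given_names, family_names) pairs.
    Values are written inside double quotes, so this simple sketch
    assumes they contain no double quotes or backslashes.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f"license: {license_id}",  # an SPDX identifier such as MIT or EUPL-1.2
        "authors:",
    ]
    for given, family in authors:
        lines.append(f'  - given-names: "{given}"')
        lines.append(f'    family-names: "{family}"')
    return "\n".join(lines) + "\n"


def write_citation_cff(project_dir: Path, content: str) -> Path:
    """Write the content to CITATION.cff in the project root and return its path."""
    target = project_dir / "CITATION.cff"
    target.write_text(content, encoding="utf-8")
    return target
```

A call such as write_citation_cff(Path("."), citation_cff("Example analysis scripts", [("Jane", "Doe")], "MIT")) creates the file next to the README. Further optional keys (for instance a version or a DOI) can be added by hand once they exist.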

Make it persistent and citable

Choosing a license, adding a license file, and putting your code on GitHub or GitLab is a good start, but to ensure FAIR (findable, accessible, interoperable, and reusable) research software for long-term access, we recommend going two steps further:

We currently recommend Zenodo over DataverseNO for software products because it is easier to update versions there. DataverseNO currently does not allow assigning different licenses at the file level, which may result in license conflicts (for example, publishing software under CC0 on DataverseNO while having it under a more restrictive license on GitHub). We believe that linking from the DataverseNO entry to the software as a "related dataset" is a better option.

Software Heritage and CodeMeta exist as an alternative ecosystem that is currently receiving some attention on a European level. Comparison and links to converters can be found in https://zenodo.org/record/8086413.

Additionally, UiT and major national (RCN Open Science) and European (EU Open Science) funders require open access to publications and research data (including research software) as early as possible, so self-archiving your research software satisfies these requirements. For more information on these policies, please see: UiT Research Data Portal, UiT Regulations on Research Data, or contact the University Library at researchdata@uit.no.

How about data?

Citing from the Principles and guidelines for management of research data at UiT:

  • "The researcher shall make research data openly available for further use to all relevant users, providing there are no legal, ethical, security or commercial reasons for not doing so."
  • "Research data shall be equipped with licenses for access, reuse, and redistribution."
  • "The licenses should be internationally recognised and place as few restrictions as possible on the access, reuse and redistribution of the data."

By default, the UiT collection in DataverseNO recommends the Creative Commons CC0 waiver. The UiT Research Data Portal contains information and guides on why and how to add such a license to your data.

The Horizon Europe guideline specifies that data should be CC0 or CC-BY and that metadata must be CC0. The recommendations of the Research Council of Norway can be found in the report "Hvordan skal vi dele forskningsdata" (p. 40). The overall principle is "as open as possible, as closed as necessary".

We wish to emphasize that using a restrictive license with the intention to protect privacy of the data is not a good substitute for having the data sufficiently protected in other ways.

Questions and contact

Unsure which license to choose? Unsure how to proceed with sharing your code or reusing somebody else's code? The research software engineering (RSE) group at UiT is here to help you. You can contact us at rse@uit.no or come to our office hours. For questions regarding sharing of publications and research data, contact us at researchdata@uit.no.

Great resources

Acknowledgements

We are very grateful to Korbinian Michael Bösl, Philipp Conzett, Richard Darst, Luca Ferranti, Noortje Haugstvedt, and Jenny Ostrop for their comments and suggestions, which significantly improved this document.