Add an SPDX file to the repository to streamline license and security reviews by user organizations #42102
Suggesting a change to the license for patchelf.
LGTM
That's all the stdlibs added. |
…that the LLVM license changed from UIUC to Apache back in v8. Corrected the license reference in THIRDPARTY.md which pointed to the v6 license and is no longer correct.
Added the rest of the Base and Core external dependencies. |
@goneall can you have a look at my package entry for libgit2? This package has a special exception to the GPL and I had to create my own license info field for it. Does that look right to you? And do you know if there is an exception in the license list I should be using instead? I didn't see anything that looked right, but it's good to double-check. Thanks.
@SamuraiAku The license expression syntax documented in Annex D of the current SPDX spec only allows standard exception IDs. LicenseRef-style IDs are not allowed for exceptions. The way to handle this situation is to create a LicenseRef for the entire license, including the exception. In this case, it would be something like LicenseRef-GPL2-only-with-libgit2-exception. The license text for this LicenseRef would be either the text as found in the library or the GPL-2.0 license text concatenated with the exception text. Note that there are some discussions in progress to allow the use of LicenseRefs for exceptions in the SPDX 3.0 spec.
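To illustrate the LicenseRef approach described in this comment, here is a minimal sketch in Python. It assumes the SPDX 2.2 JSON field names (`hasExtractedLicensingInfos`, `licenseId`, `extractedText`, `licenseDeclared`). The helper `make_license_ref`, the package ID `SPDXRef-libgit2`, and the placeholder license texts are hypothetical and are not taken from the actual file in this PR.

```python
import json


def make_license_ref(license_id, name, base_text, exception_text):
    """Build one extracted-licensing entry covering a license plus its exception."""
    if not license_id.startswith("LicenseRef-"):
        raise ValueError("custom license IDs must start with 'LicenseRef-'")
    # Concatenate the base license text and the exception text, as suggested above.
    full_text = base_text.rstrip() + "\n\n" + exception_text.strip()
    return {"licenseId": license_id, "name": name, "extractedText": full_text}


# Placeholder texts: the real entry would carry the complete license wording.
gpl_text = "<full GPL-2.0 license text>"
exception_text = "<libgit2 linking exception text>"

entry = make_license_ref(
    "LicenseRef-GPL2-only-with-libgit2-exception",
    "GPL-2.0-only with libgit2 linking exception",
    gpl_text,
    exception_text,
)

# The package refers to the custom license by its LicenseRef ID (hypothetical SPDXID).
package = {
    "name": "libgit2",
    "SPDXID": "SPDXRef-libgit2",
    "licenseConcluded": entry["licenseId"],
    "licenseDeclared": entry["licenseId"],
}

fragment = {"hasExtractedLicensingInfos": [entry], "packages": [package]}
print(json.dumps(fragment, indent=2))
```

The point of the design is that the package's license fields hold a single ID, so no exception operator is needed in the license expression.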
Changed my mind on SuiteSparse. Julia uses too many modules to make it worthwhile to make each one a package in SPDX. The copyright text field is very large, but the licensing fields make sense I think. |
I think that's everything. The SPDX file describes every external dependency listed in THIRDPARTY.md, plus all the external stdlibs. Next step is to review all the entries and add a script under contrib for updating the document for each release. @ViralBShah I have a question about Zlib and 7-Zip. They are listed separately in THIRDPARTY.md where it says "Julia bundles the following external programs and libraries". I presume that means that they are used by Julia in a different manner than the other dependencies. I wasn't sure what that was, so I guessed that these two are RUNTIME_DEPENDENCY_OF Julia while all the other non-build tool dependencies are BUILD_DEPENDENCY_OF Julia. Can you comment? |
I believe zlib is a build dependency and 7-zip is a runtime dependency. |
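For reference, the two relationship types discussed here could be recorded as in the following Python sketch. It assumes the SPDX 2.2 JSON relationship fields (`spdxElementId`, `relationshipType`, `relatedSpdxElement`). The SPDX IDs `SPDXRef-zlib`, `SPDXRef-7zip`, and `SPDXRef-JuliaMain` are hypothetical, and the build/runtime classification simply follows the reply above rather than a verified analysis of the build.

```python
# Relationship types used in this discussion; the SPDX spec defines many more.
DEPENDENCY_TYPES = {"BUILD_DEPENDENCY_OF", "RUNTIME_DEPENDENCY_OF"}


def dependency(element_id, relationship_type, related_id):
    """Return one SPDX relationship entry, rejecting types outside this sketch."""
    if relationship_type not in DEPENDENCY_TYPES:
        raise ValueError(f"unexpected relationship type: {relationship_type}")
    return {
        "spdxElementId": element_id,
        "relationshipType": relationship_type,
        "relatedSpdxElement": related_id,
    }


# Hypothetical IDs; the classification follows the comment above.
relationships = [
    dependency("SPDXRef-zlib", "BUILD_DEPENDENCY_OF", "SPDXRef-JuliaMain"),
    dependency("SPDXRef-7zip", "RUNTIME_DEPENDENCY_OF", "SPDXRef-JuliaMain"),
]

for rel in relationships:
    print(rel["spdxElementId"], rel["relationshipType"], rel["relatedSpdxElement"])
```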
For review, I've read back the SPDX file in this PR and created a table of the most important information fields with the following code
|
Looks great. Let's have a few more eyeballs on this. |
Added a script in contrib that allows for easy updating of the SPDX document with each release. I think this is ready to merge. |
Addresses #35042 partially. |
I suggest moving the |
I would keep it at the top level. It's equivalent to THIRDPARTY.md, in a standardized format. Automated scanners may or may not find it if we move it down. On a related note, is there anything we should do to make sure that the contrib/updateSPDX.jl script is run with every release? It makes the most sense to me to run the script in the same commit that updates VERSION. |
Add it to the release process checklist (release-candidate) in the Makefile. Though it looks like @KristofferC or @staticfloat should update parts of those directions, since we don't upload the files manually anymore. |
… reviews by user organizations (JuliaLang#42102) * Add an SPDX file to the repository. * New script contrib/updateSPDX.jl . Ran the script to update the SPDX file.
THIRDPARTY.md lists a few source files which are part of Julia but use licenses other than MIT. Shouldn't they be listed in the SPDX file? Otherwise it's misleading to say that Julia itself is MIT. I'd even argue that information about these files should be moved from THIRDPARTY.md to LICENSE.md, as they are not third-party libraries at all. |
Adding the files and code snippets to the SPDX file should be done, but doing it properly seemed too hard for a first attempt. With the external dependencies it’s fairly easy to browse the deps directory, see what changed, and update accordingly. Additional licenses in the source code have to be tracked manually, which makes any updates or deletions more painful. But it can be done. On the overall licensing of Julia: unless it’s discovered that some GPL code has been included in the Julia source, which I don’t think has happened, all the Julia source would, I think, be considered MIT, since it’s more permissive than the other licenses present and full credit is given. But the Julia binary is GPL, since some of the external dependencies are GPL. That is not an issue since all code is published. It would only become an issue if someone wanted to roll proprietary improvements into the Julia source and redistribute it.
Unfortunately, the problem is precisely that MIT is more permissive than the licenses used by at least some of the source files listed in THIRDPARTY.md. For example, the Zlib license adds the restriction that "The origin of this software must not be misrepresented" (it's not completely clear what this implies...). The LLVM license clearly requires that "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.". But the MIT license doesn't require that, so if Julia includes that file and yet claims that the whole codebase is MIT, it doesn't give the correct information to people who want to redistribute it. The fact that we have an SPDX file that doesn't mention these licenses is dangerous for them. EDIT: BTW, I think we are also not respecting these licenses when building Julia tarballs, as the tarballs don't include a copy of the text of these licenses (only a hyperlink in THIRDPARTY.md), contrary to what the licenses require. We should probably copy them to a directory.
I think we need to open this as a separate issue. Agree on adding the text of the licenses if that is required by those licenses, and adding to the SPDX file. I believe the MIT license also requires it, since it has the clause: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." Of course we do include the MIT license. |
What needs to be done here? This issue is closed but the last few comments imply that some issue remains. |
Milan pointed out some things above that are part of Julia source and not third party. |
Summary
This PR proposes to add an SPDX file to the repository. This file would describe all of the source code required to build Julia that exists outside of the main repository, in a human- and machine-readable format.
Background
The Software Package Data Exchange (SPDX) is a standard maintained by the Linux Foundation. Its reason for existence is best summarized in section 1.3 of the specification:
The SPDX homepage is https://spdx.dev
A helpful overview of the use of SPDX can be found at https://spdx.dev/resources/use/
The SPDX specification can be found at https://spdx.dev/specifications/
An SPDX file can be validated with an online tool provided by the SPDX organization (https://tools.spdx.org/app/validate/)
The SPDX specification was recently ratified as an ISO standard (ISO/IEC 5962:2021)
Justification for this PR
Today Julia is the underdog and anything that reduces barriers to adoption is a benefit to the project. The review of open source software by organizations, such as described above, is one such barrier. This type of review is commonly required for every point release that the organization wishes to use.
Julia is a large and quite complex project pulling code from 23 external repositories as part of the build (see THIRDPARTY.md for a full list). Julia also maintains 8 of the standard libraries in separate repositories.
When you describe Julia like that, it's quite clear that performing a proper review of Julia takes considerable effort. Reviewers have to go into each of the external repositories, check the license, see if anything important has changed since the last review, write this all down, and then figure out if Julia is acceptable to their organization.
Given the amount of work involved you can understand why some organizations might give up on approving Julia on a regular basis and say that other more widely used tools such as Matlab, Python, and R are "good enough" for their needs.
The file THIRDPARTY.md does contain much of the information that the reviewer will want, but it is in an ad-hoc format that is awkward for a human reviewer unfamiliar with Julia and is not suitable for automated processing.
Adding an SPDX document to Julia has the potential to streamline the review process leading to faster, more frequent approval of Julia versions in organizations. Basically this document does a lot of the grunt work for the reviewer. It provides the reviewer with a standardized, human-readable, machine-scannable list of all of the external software components (packages in SPDX parlance) used in Julia, what their licenses are, where they are downloaded from and how they are used in Julia (relationships in SPDX parlance). Additional information and notes for each package can be added as needed. It also makes it easy for a reviewer to compare the current SPDX file with the file from the previously approved version of Julia and see if anything important has changed.
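As a rough illustration of the version-to-version comparison mentioned above, the following Python sketch compares the declared licenses of packages in two SPDX JSON documents. It assumes the SPDX 2.2 JSON layout (a top-level `packages` list with `name` and `licenseDeclared`). The function names and the sample data are hypothetical; a real review would load the two `.spdx.json` files with `json.load` instead.

```python
def package_licenses(doc):
    """Map each package name to its declared license."""
    return {
        pkg["name"]: pkg.get("licenseDeclared", "NOASSERTION")
        for pkg in doc.get("packages", [])
    }


def diff_packages(old_doc, new_doc):
    """Return (added, removed, changed) packages between two SPDX documents."""
    old, new = package_licenses(old_doc), package_licenses(new_doc)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
    return added, removed, changed


# Illustrative sample data only, not the contents of the real file.
old_doc = {"packages": [
    {"name": "LLVM", "licenseDeclared": "NCSA"},
    {"name": "zlib", "licenseDeclared": "Zlib"},
]}
new_doc = {"packages": [
    {"name": "LLVM", "licenseDeclared": "Apache-2.0 WITH LLVM-exception"},
    {"name": "zlib", "licenseDeclared": "Zlib"},
    {"name": "libgit2", "licenseDeclared": "LicenseRef-GPL2-only-with-libgit2-exception"},
]}

added, removed, changed = diff_packages(old_doc, new_doc)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

A reviewer would only need to look closely at the packages reported as added or changed.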
I certainly don't see how it could hurt the project to have the file present.
Tracking Version Information
The SPDX package for Julia itself includes the version number.
I have deliberately not included version info for all the software packages Julia pulls in from other repositories. I don't see how that information could be reliably kept up-to-date by hand and thus it is best not to include it at all. Instead I have added informational text in the field sourceInfo pointing the reader to the appropriate makefile where the version information is contained. This way even when package versions are updated the SPDX file does not need to be changed.
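The convention described above (no version on third-party packages, a `sourceInfo` pointer instead) could be checked mechanically. The Python sketch below assumes the SPDX 2.2 JSON field names `versionInfo` and `sourceInfo`. The function `convention_violations`, the SPDX IDs, the version string, and the `sourceInfo` wording are all hypothetical.

```python
def convention_violations(doc, main_id):
    """List third-party packages that carry a version or lack a sourceInfo pointer."""
    problems = []
    for pkg in doc.get("packages", []):
        if pkg["SPDXID"] == main_id:
            continue  # the main Julia package is expected to carry a version
        if "versionInfo" in pkg or not pkg.get("sourceInfo"):
            problems.append(pkg["name"])
    return problems


# Hypothetical example entries.
example_doc = {"packages": [
    {"name": "Julia", "SPDXID": "SPDXRef-JuliaMain", "versionInfo": "1.8.0-DEV"},
    {
        "name": "zlib",
        "SPDXID": "SPDXRef-zlib",
        # Hypothetical wording: points the reader at the makefile instead of a version.
        "sourceInfo": "The version in use is recorded in the makefiles under deps/.",
    },
]}

print(convention_violations(example_doc, "SPDXRef-JuliaMain"))
```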
There are ways to incorporate the SPDX file into the build process so that we can guarantee the accuracy of the document, including version numbers, but that's best left for another PR.
Updating the SPDX Document for each release
The SPDX specification supports several file formats. I have chosen to use JSON because tools are widely available for this format, including for Julia, making it easy to update the fields.
Each time a new version of Julia is released just a few fields will need to be updated in this file, assuming no changes to the external software packages that need to be captured.
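As a minimal sketch of what such a per-release update might look like, here is a Python example. It is only an illustration and does not describe any script in the repository. It assumes the SPDX 2.2 JSON fields `creationInfo.created` and `versionInfo`. The function name, the ID `SPDXRef-JuliaMain`, and the choice of which fields to touch are assumptions.

```python
import copy
from datetime import datetime, timezone


def update_for_release(doc, new_version, main_id="SPDXRef-JuliaMain", now=None):
    """Return a copy of the document with the release-specific fields refreshed."""
    updated = copy.deepcopy(doc)  # leave the caller's document untouched
    now = now or datetime.now(timezone.utc)
    # SPDX timestamps use UTC in the form YYYY-MM-DDThh:mm:ssZ.
    updated["creationInfo"]["created"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    for pkg in updated["packages"]:
        if pkg["SPDXID"] == main_id:
            pkg["versionInfo"] = new_version
    return updated


# Hypothetical starting document.
doc = {
    "creationInfo": {"created": "2021-01-01T00:00:00Z"},
    "packages": [
        {"name": "Julia", "SPDXID": "SPDXRef-JuliaMain", "versionInfo": "1.7.0"},
        {"name": "zlib", "SPDXID": "SPDXRef-zlib"},
    ],
}

fixed_time = datetime(2021, 9, 1, 12, 0, 0, tzinfo=timezone.utc)
released = update_for_release(doc, "1.8.0", now=fixed_time)
print(released["creationInfo"]["created"], released["packages"][0]["versionInfo"])
```

Returning a modified copy keeps the function easy to test. A real script would additionally read the file, write it back, and probably also refresh any fields that must be unique per document.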
Status at initial posting
To kick off this discussion, I have included only a few packages to demonstrate how an SPDX file works. The packages were chosen to show all the different types of packages and what is similar and different about their description in SPDX. These packages are
This file has been validated by the SPDX group's online tool as conforming to the specification.
Next steps
Putting together a complete SPDX file is a lot of work. I would prefer to not continue that work until a consensus has been reached that the document presented here is acceptable and that approval of this PR would be likely when the work and reviews are complete.