Add an SPDX file to the repository to streamline license and security reviews by user organizations #42102

Merged
merged 44 commits into from
Dec 21, 2021

SamuraiAku
Contributor

@SamuraiAku SamuraiAku commented Sep 3, 2021

Summary

This PR proposes to add an SPDX file to the repository. This file would describe all of the source code required to build Julia that exists outside of the main repository, in a human and machine readable format.

Background

The Software Package Data Exchange (SPDX) is a standard maintained by the Linux Foundation. Its reason for existence is best summarized in section 1.3 of the specification:

Companies and organizations (collectively “Organizations”) are widely using and reusing open source and other software packages. Accurate identification of software is key for many supply chain processes. Vulnerability remediation starts with knowing the details of which version of software is in use on a system. Compliance with the associated licenses requires a set of analysis activities and due diligence that each Organization performs independently, which may include a manual and/or automated scan of software and identification of associated licenses followed by manual verification. Software development teams across the globe use the same open source packages, but little infrastructure exists to facilitate collaboration on the analysis or share the results of these analysis activities. As a result, many groups are performing the same work leading to duplicated efforts and redundant information. The SPDX working group seeks to create a data exchange format so that information about software packages and related content may be collected and shared in a common format with the goal of saving time and improving data accuracy.

The SPDX homepage is https://spdx.dev
A helpful overview of the use of SPDX can be found at https://spdx.dev/resources/use/
The SPDX specification can be found at https://spdx.dev/specifications/
An SPDX file can be validated with an online tool provided by the SPDX organization (https://tools.spdx.org/app/validate/)
The SPDX specification was recently ratified as an ISO standard (ISO/IEC 5962:2021)

Justification for this PR

Today Julia is the underdog and anything that reduces barriers to adoption is a benefit to the project. The review of open source software by organizations, such as described above, is one such barrier. This type of review is commonly required for every point release that the organization wishes to use.

Julia is a large and quite complex project pulling code from 23 external repositories as part of the build (see THIRDPARTY.md for a full list). Julia also maintains 8 of the standard libraries in separate repositories.

When you describe Julia like that, it's quite clear that performing a proper review of Julia takes considerable effort. Reviewers have to go into each of the external repositories, check the license, see if anything important has changed since the last review, write this all down, and then figure out if Julia is acceptable to their organization.

Given the amount of work involved you can understand why some organizations might give up on approving Julia on a regular basis and say that other more widely used tools such as Matlab, Python, and R are "good enough" for their needs.

The file THIRDPARTY.md does contain much of the information that the reviewer will want, but it is in an ad-hoc format that is awkward for a human reviewer unfamiliar with Julia and is not suitable for automated processing.

Adding an SPDX document to Julia has the potential to streamline the review process leading to faster, more frequent approval of Julia versions in organizations. Basically this document does a lot of the grunt work for the reviewer. It provides the reviewer with a standardized, human-readable, machine-scannable list of all of the external software components (packages in SPDX parlance) used in Julia, what their licenses are, where they are downloaded from and how they are used in Julia (relationships in SPDX parlance). Additional information and notes for each package can be added as needed. It also makes it easy for a reviewer to compare the current SPDX file with the file from the previously approved version of Julia and see if anything important has changed.
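
The comparison between two releases described above could be sketched roughly as follows. This is a minimal illustration, not part of the PR: the helper names `license_map` and `compare_spdx` are hypothetical, and the sketch assumes both documents have already been parsed into nested dictionaries (for example with `JSON.parsefile`) and that every package carries `name` and `licenseConcluded` fields.

```julia
# Map each package name to its concluded license.
license_map(doc) = Dict(pkg["name"] => pkg["licenseConcluded"] for pkg in doc["packages"])

# Hypothetical review helper: list packages that were added, removed,
# or whose concluded license changed between two SPDX documents.
function compare_spdx(old_doc, new_doc)
    old_lic = license_map(old_doc)
    new_lic = license_map(new_doc)
    added = sort!([n for n in keys(new_lic) if !haskey(old_lic, n)])
    removed = sort!([n for n in keys(old_lic) if !haskey(new_lic, n)])
    relicensed = sort!([n for n in keys(new_lic)
                        if haskey(old_lic, n) && old_lic[n] != new_lic[n]])
    return (added = added, removed = removed, relicensed = relicensed)
end
```

A reviewer would then only need to look closely at the packages named in the three returned lists.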

I certainly don't see how it could hurt the project to have the file present.

Tracking Version Information

The SPDX package for Julia itself includes the version number.

I have deliberately not included version info for all the software packages Julia pulls in from other repositories. I don't see how that information could be reliably kept up-to-date by hand and thus it is best not to include it at all. Instead I have added informational text in the field sourceInfo pointing the reader to the appropriate makefile where the version information is contained. This way even when package versions are updated the SPDX file does not need to be changed.
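
As an illustration of that choice, a package entry without version information might look roughly like the sketch below. The `SPDXID` value and the wording of the `sourceInfo` text are assumptions made for this example and are not copied from julia.spdx.json; the download location and license are the ones listed for OpenBLAS in the review table later in this thread.

```julia
# Illustrative package entry; the values are assumptions, not copied from julia.spdx.json.
openblas_pkg = Dict(
    "name" => "OpenBLAS",
    "SPDXID" => "SPDXRef-OpenBLAS",   # hypothetical identifier
    "downloadLocation" => "git+https://github.com/xianyi/OpenBLAS.git",
    "licenseConcluded" => "BSD-3-Clause",
    # No "versionInfo" key: the reader is pointed at the build files instead.
    "sourceInfo" => "The version in use is recorded in the makefiles under the deps directory.",
)
```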

There are ways to incorporate the SPDX file into the build process so that we can guarantee the accuracy of the document, including version numbers, but that's best left for another PR.

Updating the SPDX Document for each release

SPDX supports several file formats. I have chosen to use JSON because tools are widely available for this format, including for Julia, making it easy to update the fields.

Each time a new version of Julia is released just a few fields will need to be updated in this file, assuming no changes to the external software packages that need to be captured.

  • documentNamespace: A new UUID to be incorporated into the string. This field provides a unique identifier for this version of the SPDX file.
  • creationInfo.created: Note that the SPDX specification is very specific about the date_time format to be used.
  • packages[1].versionInfo: The Julia version number. Update with each new release.
  • packages[1].downloadLocation: Update to point to the correct version of the Julia source code
  • packages[1].copyrightText: Update if needed
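
The per-release updates listed above could be sketched as follows. This is not the script that was later added under contrib; the function name `refresh_release_fields!`, the namespace prefix, and the tag naming scheme in the download location are all assumptions made for illustration. The timestamp uses the UTC `YYYY-MM-DDThh:mm:ssZ` form required by the SPDX 2.x specification.

```julia
using Dates
using UUIDs

# Hypothetical helper: refresh the per-release fields of an SPDX document
# that has already been parsed into nested dictionaries.
function refresh_release_fields!(doc::AbstractDict, version::AbstractString;
                                 now_utc::DateTime = now(UTC))
    # Assumed namespace pattern: a fixed prefix followed by a fresh UUID.
    doc["documentNamespace"] = "https://example.org/spdx/julia-" * string(uuid4())
    # SPDX 2.x timestamps are UTC in the form YYYY-MM-DDThh:mm:ssZ.
    doc["creationInfo"]["created"] =
        Dates.format(now_utc, dateformat"yyyy-mm-dd\THH:MM:SS") * "Z"
    julia_pkg = doc["packages"][1]   # packages[1] describes Julia itself
    julia_pkg["versionInfo"] = version
    # Assumed tag naming scheme: release tags look like "v1.8.0".
    julia_pkg["downloadLocation"] =
        "git+https://github.com/JuliaLang/julia.git@v" * version
    return doc
end
```

The `copyrightText` field is left alone here because it only changes occasionally and needs human judgment.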

Status at initial posting

To kick off this discussion, I have included only a few packages to demonstrate how an SPDX file works. The packages were chosen to show all the different types of packages and what is similar and different about their description in SPDX. These packages are

  • Julia: A description of the main repository
  • Pkg: Part of the standard library
  • OpenBLAS: An external dependency
  • libuv: An external dependency that is forked and maintained by the Julia project
  • zlib: A library distributed with Julia. I'm not sure what this means, but I'm guessing that it is used only by the installer, hence the DISTRIBUTION_ARTIFACT relationship. Can someone confirm if that is correct?
  • Patchelf: A build tool

This file has been validated by the SPDX group's online tool as conforming to the specification.

Next steps

Putting together a complete SPDX file is a lot of work. I would prefer to not continue that work until a consensus has been reached that the document presented here is acceptable and that approval of this PR would be likely when the work and reviews are complete.

@goneall goneall left a comment

Suggesting a change to the license for patchelf.

julia.spdx.json (outdated)

@goneall goneall left a comment

LGTM

@SamuraiAku
Contributor Author

That's all the stdlibs added.

…that the LLVM license changed from UIUC to Apache back in v8. Corrected the license reference in THIRDPARTY.md which pointed to the v6 license and is no longer correct.
@SamuraiAku
Contributor Author

Added the rest of the Base and Core external dependencies.

@SamuraiAku
Contributor Author

@goneall can you have a look at my package entry for libgit2? This package has a special exception to the GPL and I had to create my own license info field for it. Does that look right to you? And do you know if there is an exception in the license list I should be using instead? I didn't see anything that looked right, but it's good to double-check. Thanks.

@goneall

goneall commented Oct 7, 2021

@SamuraiAku The current SPDX spec license expression documented in Annex D only allows standard exception IDs. LicenseRef-style IDs are not allowed.

The way to handle this situation is to create a LicenseRef for the entire license including the exception. In this case, it would be something like LicenseRef-GPL2-only-with-libgit2-exception. The license text for this license ref would be either the text as found in the library or the GPL-2.0 license text concatenated with the exception text.

Note that there are some discussions in progress to allow the use of LicenseRefs for exceptions in the SPDX 3.0 spec.
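
To make the suggestion concrete, the entry could be built along the lines of the sketch below. It assumes the SPDX 2.x JSON layout, where custom licenses go in the document-level `hasExtractedLicensingInfos` array with `licenseId`, `name`, and `extractedText` fields. The license texts are placeholders, the `name` string is an assumption, and the identifier is the one that appears in the review table later in this thread.

```julia
# Placeholder texts; a real document would embed the full texts verbatim.
gpl2_text = "<full text of GPL-2.0>"
exception_text = "<full text of the libgit2 linking exception>"

# Entry for the document-level "hasExtractedLicensingInfos" array.
libgit2_license = Dict(
    "licenseId" => "LicenseRef-GPL-2.0-only-with-libgit2-exception",
    "name" => "GPL-2.0-only with libgit2 exception",   # assumed display name
    "extractedText" => gpl2_text * "\n\n" * exception_text,
)

# The package refers to the LicenseRef as a whole, not to "GPL-2.0-only WITH ...".
libgit2_pkg = Dict(
    "name" => "libgit2",
    "licenseConcluded" => libgit2_license["licenseId"],
)
```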

@SamuraiAku
Contributor Author

Changed my mind on SuiteSparse. Julia uses too many modules to make it worthwhile to make each one a package in SPDX. The copyright text field is very large, but the licensing fields make sense I think.

@SamuraiAku
Contributor Author

I think that's everything. The SPDX file describes every external dependency listed in THIRDPARTY.md, plus all the external stdlibs. Next step is to review all the entries and add a script under contrib for updating the document for each release.

@ViralBShah I have a question about Zlib and 7-Zip. They are listed separately in THIRDPARTY.md where it says "Julia bundles the following external programs and libraries". I presume that means that they are used by Julia in a different manner than the other dependencies. I wasn't sure what that was, so I guessed that these two are RUNTIME_DEPENDENCY_OF Julia while all the other non-build tool dependencies are BUILD_DEPENDENCY_OF Julia. Can you comment?

@ViralBShah
Member

I believe zlib is a build dependency and 7-zip is a runtime dependency.
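
Expressed as SPDX relationship entries, that answer would look roughly like the sketch below. The `SPDXRef-...` identifiers and the `relationship` helper are hypothetical (the IDs actually used in julia.spdx.json may differ); the three field names are those of the SPDX 2.x JSON format.

```julia
# Hypothetical helper that builds one SPDX relationship entry.
relationship(from, kind, to) = Dict(
    "spdxElementId" => from,
    "relationshipType" => kind,
    "relatedSpdxElement" => to,
)

# Hypothetical SPDX identifiers; the IDs in julia.spdx.json may differ.
relationships = [
    relationship("SPDXRef-zlib", "BUILD_DEPENDENCY_OF", "SPDXRef-JuliaMain"),
    relationship("SPDXRef-7zip", "RUNTIME_DEPENDENCY_OF", "SPDXRef-JuliaMain"),
]

# A reviewer can then filter by relationship type.
runtime_deps = [r["spdxElementId"] for r in relationships
                if r["relationshipType"] == "RUNTIME_DEPENDENCY_OF"]
```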

@SamuraiAku
Contributor Author

For review, I've read back the SPDX file in this PR and created a table of the most important information fields with the following code

using JSON
using TypedTables
using PrettyTables

# One entry per package in every vector, so that all table columns have the same length.
pkgNames = String[]
pkgLicenses = String[]
pkgHomePages = String[]
pkgDownloads = String[]
pkgRelationshipsToJulia = String[]

spdxData = JSON.parsefile("./julia.spdx.json")
for pkg in spdxData["packages"]
    pkg["name"] == "Julia" && continue   # skip the entry for the main repository
    push!(pkgNames, pkg["name"])
    push!(pkgLicenses, pkg["licenseConcluded"])
    push!(pkgHomePages, pkg["homepage"])
    push!(pkgDownloads, pkg["downloadLocation"])
    # Record the first relationship that starts at this package (empty string if there is none).
    idx = findfirst(rel -> rel["spdxElementId"] == pkg["SPDXID"], spdxData["relationships"])
    push!(pkgRelationshipsToJulia, idx === nothing ? "" : spdxData["relationships"][idx]["relationshipType"])
end

reviewTable = Table(Name = pkgNames, License = pkgLicenses, DepType = pkgRelationshipsToJulia, HomePage = pkgHomePages, Download = pkgDownloads)

# Write the table as Markdown for easy pasting into the PR.
open("SPDXreviewTable.md", "w") do f
    pretty_table(f, reviewTable; tf = tf_markdown)
end

| Name | License | DepType | HomePage | Download |
|---|---|---|---|---|
| Pkg.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaLang/Pkg.jl.git |
| Statistics.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaLang/Statistics.jl.git |
| libCURL.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaWeb/LibCURL.jl.git |
| Downloads.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaLang/Downloads.jl.git |
| ArgTools.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaIO/ArgTools.jl.git |
| Tar.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaIO/Tar.jl.git |
| NetworkOptions.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaLang/NetworkOptions.jl.git |
| SuiteSparse.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaLang/SuiteSparse.jl.git |
| SHA.jl | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaCrypto/SHA.jl.git |
| dSFMT | BSD-3-Clause | BUILD_DEPENDENCY_OF | https://github.com/MersenneTwister-Lab/dSFMT | git+https://github.com/MersenneTwister-Lab/dSFMT.git |
| OpenLibm | MIT | BUILD_DEPENDENCY_OF | https://julialang.org | git+https://github.com/JuliaMath/openlibm.git |
| GMP | LGPL-3.0-or-later | BUILD_DEPENDENCY_OF | https://gmplib.org/ | https://gmplib.org/download/gmp/ |
| libgit2 | LicenseRef-GPL-2.0-only-with-libgit2-exception | BUILD_DEPENDENCY_OF | https://libgit2.org | git+https://github.com/libgit2/libgit2.git |
| curl | curl | BUILD_DEPENDENCY_OF | https://curl.se | git+https://github.com/curl/curl.git |
| libssh2 | BSD-3-Clause | BUILD_DEPENDENCY_OF | https://www.libssh2.org | git+https://github.com/libssh2/libssh2.git |
| mbedtls | Apache-2.0 | BUILD_DEPENDENCY_OF | https://tls.mbed.org | git+https://github.com/ARMmbed/mbedtls.git |
| mpfr | LGPL-3.0-or-later | BUILD_DEPENDENCY_OF | https://www.mpfr.org/ | https://www.mpfr.org/ |
| OpenBLAS | BSD-3-Clause | BUILD_DEPENDENCY_OF | https://www.openblas.net | git+https://github.com/xianyi/OpenBLAS.git |
| LAPACK | BSD-3-Clause | BUILD_DEPENDENCY_OF | https://netlib.org/ | https://www.netlib.org/lapack/ |
| PCRE | BSD-3-Clause | BUILD_DEPENDENCY_OF | https://www.pcre.org | https://ftp.pcre.org/pub/pcre/ |
| LibSuiteSparse | GPL-2.0-or-later | BUILD_DEPENDENCY_OF | https://people.engr.tamu.edu/davis/suitesparse.html | git+https://github.com/DrTimothyAldenDavis/SuiteSparse.git |
| LibBlasTrampoline | MIT | BUILD_DEPENDENCY_OF | https://github.com/JuliaLinearAlgebra | git+https://github.com/JuliaLinearAlgebra/libblastrampoline.git |
| NGHTTP2 | MIT | BUILD_DEPENDENCY_OF | https://nghttp2.org | git+https://github.com/nghttp2/nghttp2.git |
| libunwind | MIT | BUILD_DEPENDENCY_OF | http://www.nongnu.org/libunwind/ | git+https://github.com/libunwind/libunwind.git |
| libuv | MIT | BUILD_DEPENDENCY_OF | https://libuv.org | git+https://github.com/JuliaLang/libuv.git |
| llvm | Apache-2.0 WITH LLVM-exception | BUILD_DEPENDENCY_OF | https://llvm.org | git+https://github.com/JuliaLang/llvm-project.git |
| utf8proc | MIT | BUILD_DEPENDENCY_OF | https://github.com/JuliaStrings/utf8proc | git+https://github.com/JuliaLang/utf8proc.git |
| 7-Zip | LGPL-3.0-or-later | RUNTIME_DEPENDENCY_OF | https://www.7-zip.org | https://sourceforge.net/projects/p7zip/files/p7zip |
| zlib | Zlib | BUILD_DEPENDENCY_OF | https://zlib.net | git+https://github.com/madler/zlib.git |
| patchelf | GPL-3.0-or-later | BUILD_TOOL_OF | https://nixos.org/patchelf.html | git+https://github.com/NixOS/patchelf.git |
| objconv | GPL-3.0-or-later | BUILD_TOOL_OF | https://www.agner.org/optimize/#objconv | https://www.agner.org/optimize/objconv.zip |
| libwhich | MIT | BUILD_TOOL_OF | https://github.com/vtjnash/libwhich | git+https://github.com/vtjnash/libwhich.git |

@ViralBShah
Member

Looks great. Let's have a few more eyeballs on this.

@SamuraiAku SamuraiAku changed the title RFC: Add an SPDX file to the repository to streamline license and security reviews by user organizations Add an SPDX file to the repository to streamline license and security reviews by user organizations Dec 21, 2021
@SamuraiAku SamuraiAku marked this pull request as ready for review December 21, 2021 07:25
@SamuraiAku
Contributor Author

Added a script in contrib that allows for easy updating of the SPDX document with each release.

I think this is ready to merge.

@ViralBShah ViralBShah merged commit 7cd1da3 into JuliaLang:master Dec 21, 2021
@ViralBShah
Member

Addresses #35042 partially.

@ViralBShah
Member

I suggest moving the julia.spdx.json file to contrib. Any objection to that?

@SamuraiAku
Contributor Author

I would keep it at the top level. It's equivalent to THIRDPARTY.md, in a standardized format. Automated scanners may or may not find it if we move it down.

On a related note, is there anything we should do to make sure that the contrib/updateSPDX.jl script is run with every release? It makes the most sense to me to run the script in the same commit that updates VERSION.
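
One way to catch a forgotten update would be a small consistency check along the lines of the sketch below. It is not part of contrib/updateSPDX.jl; the function name `spdx_version_matches` is hypothetical, and the sketch assumes that the SPDX file stores the version in the same textual form as the VERSION file and that the first package entry describes Julia itself.

```julia
# Hypothetical consistency check: does the Julia package entry in the SPDX
# document carry the same version string as the VERSION file?
function spdx_version_matches(version_text::AbstractString, doc::AbstractDict)
    julia_pkg = doc["packages"][1]   # assumed to be the Julia package itself
    return strip(version_text) == julia_pkg["versionInfo"]
end
```

It could be called as `spdx_version_matches(read("VERSION", String), JSON.parsefile("julia.spdx.json"))` from the repository root, for example as part of the release checklist.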

@vtjnash
Member

vtjnash commented Jan 7, 2022

Add it to the release process checklist (release-candidate) in the Makefile. Though it looks like @KristofferC or @staticfloat should update parts of those directions, since we don't upload the files manually anymore.

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
… reviews by user organizations (JuliaLang#42102)

* Add an SPDX file to the repository. 
* New script contrib/updateSPDX.jl .  Ran the script to update the SPDX file.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
… reviews by user organizations (JuliaLang#42102)

* Add an SPDX file to the repository. 
* New script contrib/updateSPDX.jl .  Ran the script to update the SPDX file.
@nalimilan
Member

THIRDPARTY.md lists a few source files which are part of Julia but use licenses other than MIT. Shouldn't they be listed in the SPDX file? Otherwise it's misleading to say that Julia itself is MIT. I'd even argue that information about these files should be moved from THIRDPARTY.md to LICENSE.md, as they are not third-party libraries at all.

@SamuraiAku
Contributor Author

Adding the files and code snippets to the SPDX file should be done, but doing it properly seemed too hard for a first attempt. With the external dependencies it’s fairly easy to browse the deps directory and see what changed and update accordingly. Additional licenses in the source code have to be tracked manually which makes any updates or deletions more painful. But it can be done.

On the overall licensing of Julia: unless it’s discovered that some GPL code has been included in the Julia source, which I don’t think has happened, all the Julia source would, I think, be considered MIT, since MIT is more permissive than the other licenses present and full credit is given. But the Julia binary is GPL, since some of the external dependencies are GPL. That is not an issue, since all the code is published. It would only become an issue if someone wanted to roll proprietary improvements into the Julia source and redistribute it.

@nalimilan
Member

nalimilan commented Aug 26, 2022

Unfortunately, the problem is precisely that MIT is more permissive than licenses used by at least some of the source files listed in THIRDPARTY.md. For example, the Zlib license adds the restriction that "The origin of this software must not be misrepresented" (it's not completely clear what this implies...). The LLVM licence clearly requires that "Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.". But the MIT license doesn't require that, so if Julia includes that file and yet claims that the whole codebase is MIT, it doesn't give the correct information for people who want to redistribute it. The fact that we have an SPDX file that doesn't mention these licences is dangerous for them.

EDIT: BTW, I think we are also not respecting these licences when building Julia tarballs, as they don't include a copy of the text of these licenses (only a hyperlink in THIRDPARTY.md), contrary to what they require. We should probably copy them to a directory.

@ViralBShah
Member

I think we need to open this as a separate issue. Agree on adding the text of the licenses if that is required by those licenses, and adding to the SPDX file. I believe the MIT license also requires it, since it has the clause: "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software." Of course we do include the MIT license.

@StefanKarpinski
Member

What needs to be done here? This issue is closed but the last few comments imply that some issue remains.

@ViralBShah
Member

Milan pointed out some things above that are part of Julia source and not third party.

7 participants