Dependency Tree Exclusions for RDF/Tag parsing #145

Closed
stevespringett opened this issue Jan 10, 2018 · 4 comments

@stevespringett

As an observation, the dependency tree for v2.1.7 looks like:

+- org.spdx:spdx-tools:jar:2.1.7:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- com.yevster.net.rootdev:java-rdfa:jar:0.4.3:compile
|  |  \- net.rootdev:java-rdfa-htmlparser:jar:0.4.2-RC2:compile
|  +- xml-apis:xml-apis:jar:1.4.01:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.slf4j:slf4j-log4j12:jar:1.7.2:compile
|  +- log4j:log4j:jar:1.2.13:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  +- org.eclipse.jgit:org.eclipse.jgit:jar:4.7.1.201706071930-r:compile
|  |  +- com.jcraft:jsch:jar:0.1.54:compile
|  |  \- com.googlecode.javaewah:JavaEWAH:jar:1.1.6:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile

When using SPDX Tools solely to parse SPDX Tag and RDF documents, many of the dependencies it pulls in are never used.

I've been trying to exclude them from my project, since many of them are outdated or conflict with other dependencies in my project. My POM excerpt reads:

<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.7</version>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.jcraft</groupId>
            <artifactId>jsch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.javaewah</groupId>
            <artifactId>JavaEWAH</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With this configuration, I'm able to properly parse RDF and Tag 2.0 and 2.1 examples in this repo.

I don't know if this config will cause issues if other (potentially more complex) RDF or Tag documents are parsed. Thoughts?

Also, it would really be nice to have the exact exclusions documented somewhere.
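A note on how Maven exclusions behave, which may help when documenting them: excluding an artifact means Maven never resolves it, so its own transitive dependencies are not pulled in through that path either. In the 2.1.7 tree above, jsch and JavaEWAH appear only beneath jgit, so the jgit exclusion by itself should already remove all three; the separate jsch and JavaEWAH entries are probably redundant, though harmless. A minimal sketch of just that part of the configuration (running `mvn dependency:tree` afterwards is a quick way to confirm the result):

```xml
<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.7</version>
    <exclusions>
        <!-- Excluding jgit means it is never resolved, so its transitive
             dependencies (com.jcraft:jsch and com.googlecode.javaewah:JavaEWAH)
             are dropped as well, as long as nothing else depends on them. -->
        <exclusion>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
        </exclusion>
        <!-- The remaining exclusions from the excerpt above go here. -->
    </exclusions>
</dependency>
```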

@goneall
Member

goneall commented Jan 10, 2018

opencsv, Mustache, and jgit (which brings in jsch and JavaEWAH) are used by LicenseRDFaGenerator, a tool that generates the license metadata for the spdx.org/licenses website. There is also a tool that converts SPDX files to HTML, which uses Mustache.

I tried removing the XML APIs, and the only compile-time issue was in LicenseXmlDocument, which is used only by LicenseRDFaGenerator.

Some of your exclusions relate to Jena which is used to manage the RDF representation. My guess is that the exclusions you are using would only affect certain formats which are not currently used by any of the SPDX tools (e.g. JSON-LD).

The one exclusion I'm not sure about is libthrift. It is used by Jena, but I'm not sure for what purpose.

I have been thinking about refactoring SPDX Tools into two separate repositories: one containing the library and one containing the tools.

Based on the dependency information collected above, it may be worthwhile to split LicenseRDFaGenerator into a separate repo. As far as I know, this tool is used only by the SPDX legal team.

@goneall
Member

goneall commented Jan 28, 2018

Update - I'm working on de-tangling the LicenseRDFaGenerator from the rest of the library and I was able to remove jgit and xml-apis.

It turns out Mustache is used by some HTML tools (which should not impact the license conversion) and opencsv is used by the spreadsheet tools (again, this should not impact the license conversion).

goneall self-assigned this Apr 9, 2018
goneall added this to the 2.1.12 milestone Apr 9, 2018
goneall added a commit that referenced this issue Apr 12, 2018
…DFa. Fixes issue #90, issue #146 and issue #145

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
goneall added a commit that referenced this issue Apr 14, 2018
…DFa (#158)

* Read standard licenses in JSON-LD format and remove dependencies on RDFa.  Fixes issue #90, issue #146 and issue #145

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Update URL for listed licenses to the released license list

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Fix unit test failures - cached listed license was being modified.  Resolved by cloning the returned license from get license by ID

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Update the path the license list to the released license list files

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
@goneall
Member

goneall commented Apr 14, 2018

@stevespringett - I just removed the dependency on the RDFa library in version 2.1.12. Does this resolve this issue or is there more we could do?

@stevespringett
Author

Big thanks @goneall. The removal of the unnecessary dependencies and generation code is greatly appreciated.

As of 2.1.12, the dependency tree now looks like:

+- org.spdx:spdx-tools:jar:2.1.12:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.apache.logging.log4j:log4j-api:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-core:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.10.0:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile
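Comparing this tree with the 2.1.7 one, jgit (along with jsch and JavaEWAH) and xml-apis no longer appear, so those exclusions should no longer be needed; the other excluded artifacts are all still present. The following is a sketch of a trimmed exclusion list for the same Tag/RDF parsing use case. It is derived only from comparing the two trees and has not been tested against more complex documents:

```xml
<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.12</version>
    <exclusions>
        <!-- Direct dependencies of spdx-tools that, per the discussion in
             this issue, serve the bundled tools rather than Tag/RDF parsing -->
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <!-- Pulled in through Jena (jena-arq); the Tag and RDF examples
             parsed correctly without them on 2.1.7 -->
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```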

BTW, OWASP Dependency-Track incorporates this library (2.1.7 in the current release and 2.1.12 in the current development branch) for its SPDX support.

Closing issue.

stevespringett added a commit to DependencyTrack/dependency-track that referenced this issue Apr 15, 2018
goneall added a commit that referenced this issue Apr 26, 2020
…DFa (#158)

* Read standard licenses in JSON-LD format and remove dependencies on RDFa.  Fixes issue #90, issue #146 and issue #145

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Update URL for listed licenses to the released license list

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Fix unit test failures - cached listed license was being modified.  Resolved by cloning the returned license from get license by ID

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

* Update the path the license list to the released license list files

Signed-off-by: Gary O'Neall <gary@sourceauditor.com>