Dependency Tree Exclusions for RDF/Tag parsing #145

stevespringett · 2018-01-10T05:49:05Z

As an observation, the dependency tree for v2.1.7 looks like:

+- org.spdx:spdx-tools:jar:2.1.7:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- com.yevster.net.rootdev:java-rdfa:jar:0.4.3:compile
|  |  \- net.rootdev:java-rdfa-htmlparser:jar:0.4.2-RC2:compile
|  +- xml-apis:xml-apis:jar:1.4.01:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.slf4j:slf4j-log4j12:jar:1.7.2:compile
|  +- log4j:log4j:jar:1.2.13:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  +- org.eclipse.jgit:org.eclipse.jgit:jar:4.7.1.201706071930-r:compile
|  |  +- com.jcraft:jsch:jar:0.1.54:compile
|  |  \- com.googlecode.javaewah:JavaEWAH:jar:1.1.6:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile

When attempting to use SPDX tools simply as a way to parse SPDX Tag and RDF documents, there are many dependencies included in the parent project that are never used.

I've been attempting to omit them from my project, as many of them are old or conflict with other dependencies in my project. The POM excerpt reads:

<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.7</version>
    <exclusions>
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.jcraft</groupId>
            <artifactId>jsch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.javaewah</groupId>
            <artifactId>JavaEWAH</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
    </exclusions>
</dependency>

With this configuration, I'm able to properly parse the RDF and Tag 2.0 and 2.1 examples in this repo.

I don't know if this config will cause issues if other (potentially more complex) RDF or Tag documents are parsed. Thoughts?

Also, it would really be nice to have the exact exclusions documented somewhere.

goneall · 2018-01-10T17:55:24Z

opencsv, mustache, and jgit (which adds the dependencies on jsch and JavaEWAH) are used by the LicenseRDFaGenerator, a tool that generates the license metadata for the website spdx.org/licenses. There is also a tool that converts SPDX files to an HTML format, which uses Mustache.

I tried removing the XML APIs, and the only compile-time issue was with LicenseXmlDocument, which is only used by the LicenseRDFaGenerator.

Some of your exclusions relate to Jena, which is used to manage the RDF representation. My guess is that those exclusions would only affect certain formats that are not currently used by any of the SPDX tools (e.g. JSON-LD).

The one exclusion I'm not sure about is libthrift. It is used by Jena, but I am not sure for what purpose.

I have been thinking about refactoring the SPDX tools into 2 separate repositories - one containing the library and one with separate tools.

Based on the information collected above on the dependencies, it may be worthwhile splitting the LicenseRDFaGenerator into a separate repo. As far as I know, this tool is only used by the SPDX legal team.

goneall · 2018-01-28T20:24:21Z

Update - I'm working on untangling the LicenseRDFaGenerator from the rest of the library, and I was able to remove jgit and xml-apis.

It turns out Mustache is used by some HTML tools (which should not impact the license conversion) and opencsv is used by the spreadsheet tools (again, should not impact the license conversion).
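
The findings so far can be collected into an annotated version of the exclusion list from the original post. This is only a sketch: the coordinates are the ones already shown in the 2.1.7 tree, the comments restate what has been said in this thread, and none of it is an officially documented or supported set. The purpose of libthrift in particular has not been established here.

```xml
<!-- Sketch only: exclusions for spdx-tools 2.1.7 when it is used purely to parse
     SPDX RDF and Tag documents. Rationale comes from this thread and is unverified
     beyond the 2.0 and 2.1 example documents. -->
<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.7</version>
    <exclusions>
        <!-- Removing it only broke LicenseXmlDocument, which is used by LicenseRDFaGenerator -->
        <exclusion>
            <groupId>xml-apis</groupId>
            <artifactId>xml-apis</artifactId>
        </exclusion>
        <!-- Reported as used by the spreadsheet tools, not by RDF/Tag parsing -->
        <exclusion>
            <groupId>net.sf.opencsv</groupId>
            <artifactId>opencsv</artifactId>
        </exclusion>
        <!-- Mustache: used by LicenseRDFaGenerator and the SPDX-to-HTML tool -->
        <exclusion>
            <groupId>com.github.spullara.mustache.java</groupId>
            <artifactId>compiler</artifactId>
        </exclusion>
        <!-- jgit is used by LicenseRDFaGenerator and brings in jsch and JavaEWAH -->
        <exclusion>
            <groupId>org.eclipse.jgit</groupId>
            <artifactId>org.eclipse.jgit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.jcraft</groupId>
            <artifactId>jsch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.googlecode.javaewah</groupId>
            <artifactId>JavaEWAH</artifactId>
        </exclusion>
        <!-- Jena transitives: guessed to matter only for formats the SPDX tools
             do not use, such as JSON-LD -->
        <exclusion>
            <groupId>com.github.jsonld-java</groupId>
            <artifactId>jsonld-java</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
        </exclusion>
        <!-- Jena transitive: purpose not established in this thread, verify before relying on it -->
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Running `mvn dependency:tree` after applying the exclusions shows whether any of these artifacts are still being pulled in through another path.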

…DFa (#158)

* Read standard licenses in JSON-LD format and remove dependencies on RDFa. Fixes issue #90, issue #146 and issue #145 Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
* Update URL for listed licenses to the released license list Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
* Fix unit test failures - cached listed license was being modified. Resolved by cloning the returned license from get license by ID Signed-off-by: Gary O'Neall <gary@sourceauditor.com>
* Update the path the license list to the released license list files Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

goneall · 2018-04-14T16:50:22Z

@stevespringett - I just removed the dependency on the RDFa library in version 2.1.12. Does this resolve this issue or is there more we could do?

stevespringett · 2018-04-15T04:07:23Z

Big thanks @goneall. The removal of the unnecessary dependencies and generation code is greatly appreciated.

As of 2.1.12, the dependency tree now looks like:

+- org.spdx:spdx-tools:jar:2.1.12:compile
|  +- org.apache.jena:apache-jena-libs:pom:3.1.1:compile
|  |  \- org.apache.jena:jena-tdb:jar:3.1.1:compile
|  |     \- org.apache.jena:jena-arq:jar:3.1.1:compile
|  |        +- org.apache.jena:jena-core:jar:3.1.1:compile
|  |        |  \- org.apache.jena:jena-base:jar:3.1.1:compile
|  |        |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |        +- org.apache.jena:jena-shaded-guava:jar:3.1.1:compile
|  |        +- com.github.jsonld-java:jsonld-java:jar:0.8.3:compile
|  |        +- org.apache.httpcomponents:httpclient-cache:jar:4.5.2:compile
|  |        +- org.apache.thrift:libthrift:jar:0.9.3:compile
|  |        \- org.apache.commons:commons-csv:jar:1.3:compile
|  +- xerces:xercesImpl:jar:2.11.0.SP5:compile
|  +- org.apache.jena:jena-iri:jar:3.1.1:compile
|  +- org.antlr:antlr:jar:3.4:compile
|  |  +- org.antlr:antlr-runtime:jar:3.4:compile
|  |  |  +- org.antlr:stringtemplate:jar:3.2.1:compile
|  |  |  \- antlr:antlr:jar:2.7.7:compile
|  |  \- org.antlr:ST4:jar:4.0.4:compile
|  +- org.apache.poi:poi:jar:3.15:compile
|  +- org.apache.poi:poi-ooxml:jar:3.15:compile
|  |  +- org.apache.poi:poi-ooxml-schemas:jar:3.15:compile
|  |  |  \- org.apache.xmlbeans:xmlbeans:jar:2.6.0:compile
|  |  |     \- stax:stax-api:jar:1.0.1:compile
|  |  \- com.github.virtuald:curvesapi:jar:1.04:compile
|  +- net.sf.opencsv:opencsv:jar:2.3:compile
|  +- nu.validator.htmlparser:htmlparser:jar:1.4:compile
|  +- net.sf.saxon:saxon:jar:8.7:compile
|  +- com.google.guava:guava:jar:16.0.1:compile
|  +- com.github.spullara.mustache.java:compiler:jar:0.7.9:compile
|  +- org.apache.logging.log4j:log4j-api:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-core:jar:2.10.0:compile
|  +- org.apache.logging.log4j:log4j-slf4j-impl:jar:2.10.0:compile
|  +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
|  \- net.sf.saxon:saxon-dom:jar:8.7:compile

BTW, OWASP Dependency-Track incorporates this library (2.1.7 in the current release and 2.1.12 in the current development branch) for its SPDX support.
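
With the RDFa-related dependencies removed upstream, the declaration on the consuming side can shrink back to the plain coordinates. A minimal sketch, assuming the 2.1.12 coordinates from the tree above and that no project-specific conflicts remain:

```xml
<!-- Sketch: spdx-tools 2.1.12 declared without the earlier exclusion list -->
<dependency>
    <groupId>org.spdx</groupId>
    <artifactId>spdx-tools</artifactId>
    <version>2.1.12</version>
</dependency>
```

Jena transitives such as jsonld-java and libthrift are still present in the 2.1.12 tree, so an individual exclusion can be added back if one of them conflicts with another dependency.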

Closing issue.

This was referenced Jan 10, 2018

Provide a non-RDFa RDF Readable format for license files on spdx.org/licenses #146

Closed

Proposal: Move LicenseRdfaGenerator to a separate github repository #147

Closed

goneall self-assigned this Apr 9, 2018

goneall added this to the 2.1.12 milestone Apr 9, 2018

goneall added a commit that referenced this issue Apr 12, 2018

Read standard licenses in JSON-LD format and remove dependencies on R…

680b369

…DFa. Fixes issue #90, issue #146 and issue #145 Signed-off-by: Gary O'Neall <gary@sourceauditor.com>

goneall mentioned this issue Apr 12, 2018

Read standard licenses in JSON-LD format and remove dependencies on RDFa #158

Merged

stevespringett closed this as completed Apr 15, 2018

stevespringett added a commit to DependencyTrack/dependency-track that referenced this issue Apr 15, 2018

Removed SPDX dependency exclusions as they are no longer necessary. s…

dd7c5fb

…pdx/tools#145
