Create a skylark repository rule for maven artifacts #1410

Closed
kchodorow opened this issue Jun 15, 2016 · 48 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) type: feature request

Comments

@kchodorow
Contributor

Some FRs that have come up in the past:

  • Support dependency management.
  • Download src and doc jars.
  • Allow cloning from local repositories.
  • Transitive deps?
@kchodorow kchodorow added type: feature request P2 We'll consider working on this in future. (Assignee optional) category: extensibility > external repositories labels Jun 15, 2016
@kchodorow kchodorow added this to the 0.5 milestone Jun 15, 2016
@jart
Contributor

jart commented Sep 6, 2016

Added a proposal sorta related to this in #1733. Feel free to close it out and schlep it into this issue.

jin added a commit to jin/bazel that referenced this issue Sep 8, 2016
This is an initial implementation of the maven_jar rule in Skylark,
targeted at the FRs in issue bazelbuild#1410.

Attributes `name`, `artifact`, `repository`, `sha1` have been
implemented, but not `server`.

This implementation uses `wget` as the underlying fetch mechanism for remote
artifacts to simplify dependencies. My original implementation made use
of an underlying call to the `mvn` binary and its `dependency:get`
plugin, but it brought along complexities with parsing `pom.xml` files
and creating local repositories within Bazel's cache. My personal
opinion here is that it's easier to build up from `wget` than to trim
down complexities when using `mvn`.

With regards to server, there are some limitations with retrieving a
maven_server's attribute at Loading Phase without the use of hacky macros
(issue bazelbuild#1704), and even if macros are used, the maven_server is not treated
as an actual dependency by maven_jar, hence it will not get analyzed during
Analysis Phase. There is a test (`test_unimplemented_server_attr`) to
ensure that the error message is shown to users if they use the server
attribute with this rule.

I will have to put more work into implementing maven_server
appropriately, and possibly proposing an API review of maven_jar in
another change set.

Change-Id: I64166dd251d5d268b525dc219cef424a5b5534a1
jin added a commit to jin/bazel that referenced this issue Sep 15, 2016
This is an initial implementation of the maven_jar rule in Skylark, targeted at
the FRs in issue bazelbuild#1410.

Attributes `name`, `artifact`, `repository`, `sha1` have been implemented, but
not `server`.

Implemented a wrapper around the maven binary to pull dependencies from
remote repositories into a directory under {output_base}/external.

Caveat: this rule assumes that the Maven dependency is installed in the
system. Hence, the maven_skylark_test integration tests are tagged with
"manual" because the Bazel CI isn't configured with the Maven binary
yet.

Added a serve_not_found helper for 404 response tests.

Added a `maven_local_repository` rule to fetch the initial maven
dependency plugin for bazel tests.

With regards to server, there are some limitations with retrieving a
maven_server's attribute at Loading Phase without the use of hacky macros
(issue bazelbuild#1704), and even if macros are used, the maven_server is not treated as
an actual dependency by maven_jar. There is a test
(`test_unimplemented_server_attr`) to ensure that the error message is shown to
users if they use the server attribute with this rule.

I will have to put more work into implementing maven_server appropriately, and
possibly proposing an API review of maven_jar in another change set.

Change-Id: I167f9d13835c30be971928b4cc60167a8e396893
bazel-io pushed a commit that referenced this issue Sep 23, 2016
**Experimental**

This is an initial implementation of the maven_jar rule in Skylark, targeted at
the FRs in issue #1410.

Implemented a wrapper around the maven binary to pull dependencies from
remote repositories into a directory under {output_base}/external.

Attributes `name`, `artifact`, `repository`, `sha1` have been implemented,
but not `server`.

Caveat: this rule assumes that the Maven dependency is installed in the
system. Hence, the maven_skylark_test integration tests are tagged with
"manual" and commented out because the Bazel CI isn't configured with
the Maven binary yet.

Added a serve_not_found helper for 404 response tests.

Usage:

```
load("@bazel_tools//tools/build_defs/repo:maven_rules.bzl", "maven_jar")

maven_jar(
    name = "com_google_guava_guava",
    artifact = "com.google.guava:guava:18.0",
    sha1 = "cce0823396aa693798f8882e64213b1772032b09",
    repository = "http://uk.maven.org/maven2",
)
```

With regards to server, there are some limitations with retrieving a
maven_server's attribute at Loading Phase without the use of hacky macros
(issue #1704), and even if macros are used, the maven_server is not treated as
an actual dependency by maven_jar. There is a test (`test_unimplemented_server_attr`)
to ensure that the error message is shown to users if they use the server
attribute with this rule.

--
Change-Id: I167f9d13835c30be971928b4cc60167a8e396893
Reviewed-on: https://bazel-review.googlesource.com/c/5770
MOS_MIGRATED_REVID=133971809
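
As a rough illustration of how the `artifact` and `repository` attributes in the usage above relate to a downloadable file, the sketch below maps a coordinate to a jar URL using the standard Maven repository layout and checks a SHA-1 digest. It is not the rule's implementation (the commit above wraps the `mvn` binary); `jar_url` and `sha1_matches` are hypothetical helper names.

```python
import hashlib


def jar_url(repository, artifact):
    """Maps "group:artifact:version" to its jar URL in the standard Maven layout."""
    group_id, artifact_id, version = artifact.split(":")
    return "%s/%s/%s/%s/%s-%s.jar" % (
        repository.rstrip("/"),
        group_id.replace(".", "/"),  # dots in the group id become directories
        artifact_id,
        version,
        artifact_id,
        version,
    )


def sha1_matches(data, expected_sha1):
    """Returns True if the bytes hash to the expected hex SHA-1 digest."""
    return hashlib.sha1(data).hexdigest() == expected_sha1.lower()


print(jar_url("http://uk.maven.org/maven2", "com.google.guava:guava:18.0"))
```

Coordinates with a classifier or a non-jar packaging have more than three parts and are not handled by this sketch.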
@aj-michael
Contributor

👍 for adding transitive deps to this. Is there any technical reason that we're aware of that we haven't made transitive deps work yet?

@jart
Contributor

jart commented Dec 13, 2016

It depends on what you mean by transitive deps working. The biggest problem right now, I feel, is that maven_jar doesn't let one define the dependency relationships. I've fixed this in the java_import_external repository rule, which I'll be contributing to Bazel shortly.

I've also built a web GUI, which I'm currently seeking approval to launch, that will make it easy for users to generate configurations for this rule. The web GUI will read the pom.xml files from the Maven server, resolve transitive and diamond dependencies, and create code that shows you exactly what's going into your project. I feel like this is the best direction for Bazel. It leads to much faster builds which are actually hermetically sealed without magic.
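
To make "resolve transitive and diamond dependencies" concrete, here is a minimal sketch of one possible strategy: a breadth-first walk where the first version seen for a given group:artifact wins, which is a simplified form of Maven's "nearest definition" mediation. The thread does not say which strategy the web GUI uses, so treat this as an assumption. The `POMS` graph, the `app` and `lib` coordinates, and `resolve` are all hypothetical.

```python
from collections import deque

# Hypothetical dependency graph: coordinate -> direct dependencies.
POMS = {
    "app:app:1.0": ["lib:a:1.0", "lib:b:1.0"],
    "lib:a:1.0": ["com.google.guava:guava:18.0"],
    "lib:b:1.0": ["com.google.guava:guava:21.0"],
    "com.google.guava:guava:18.0": [],
    "com.google.guava:guava:21.0": [],
}


def resolve(root, poms):
    """Flattens the graph; the first version reached for each artifact wins."""
    chosen = {}  # "group:artifact" -> full coordinate
    queue = deque([root])
    while queue:
        coord = queue.popleft()
        key = coord.rsplit(":", 1)[0]  # drop the version
        if key in chosen:
            continue  # a nearer or earlier-declared version already won
        chosen[key] = coord
        queue.extend(poms[coord])
    return sorted(chosen.values())


for coord in resolve("app:app:1.0", POMS):
    print(coord)
```

The result is a flat, pinned list that can be reviewed and checked in, so nothing has to be resolved at build time.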

@aj-michael
Contributor

By transitive deps working, I mean the rule fetching the dependency relationships from the Maven server and not requiring the developer to specify them.

@jart
Contributor

jart commented Dec 13, 2016

In order to do that in a repository rule, it would probably be necessary to have the rule shade all the transitive jars into the root jar. That means rewriting the transitive class names, rewriting the byte code, and then the code size increases quadratically.

@aj-michael
Contributor

Hmmm, I'm not sure I follow. Why would it be necessary to shade the transitive jars and rewrite class names? Perhaps I'm missing something, but the way I would expect it to work would be:

  1. Change the mvn command that we use to download the JAR to also download its transitive dependencies.
  2. Change the maven_jar_build_file_template to create a separate java_import for each of the dependency artifacts and wire these up with exports. These targets would be something like @somemavenjar//jar:dep_on_guava_21.0.
  3. Developer depends on @somemavenjar//jar which exports all of its dependencies.

I don't know how to do 1, but I assume there must be a way since other build tools do this.
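
Steps 2 and 3 above can be sketched as a small generator for the BUILD file content: one `java_import` per dependency jar, plus a root `jar` target that exports them. This is only an illustration of the proposal, not the actual maven_jar_build_file_template; `build_file` and the jar file names are hypothetical.

```python
def build_file(root_jar, dep_jars):
    """Emits one java_import per dependency jar and a root target exporting them."""
    lines = []
    for name, jar in sorted(dep_jars.items()):
        lines += [
            "java_import(",
            '    name = "%s",' % name,
            '    jars = ["%s"],' % jar,
            ")",
            "",
        ]
    exports = ", ".join('":%s"' % name for name in sorted(dep_jars))
    lines += [
        "java_import(",
        '    name = "jar",',
        '    jars = ["%s"],' % root_jar,
        "    exports = [%s]," % exports,
        '    visibility = ["//visibility:public"],',
        ")",
    ]
    return "\n".join(lines) + "\n"


print(build_file("somemavenjar.jar", {"dep_on_guava_21.0": "guava-21.0.jar"}))
```

A developer would then depend only on the root `jar` target and receive the dependency jars through `exports`.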

@jart
Contributor

jart commented Dec 13, 2016

Having a single remote repository for all the maven jars required by the project, and each individual jar being its own rule within the repository, would avoid the need for shading. E.g. @closure_rules_maven_jars//:com_google_guava. Shading is only necessary if you want to have the same behavior as maven_jar where jars have a 1:1 mapping with repository names.

But doing things that way introduces another problem. What if another Bazel project depends on that Bazel project? It would have to adopt @closure_rules_maven_jars as its container for all its jars, and then redefine the whole thing, in order to put its own jars in there. If it doesn't do that, then we end up with quadratic dependencies again.
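
One way to read "quadratic" here, as an assumption rather than something spelled out in the thread: if each of n jars in a linear dependency chain gets its own repository that bundles its whole transitive closure, the total number of stored jar copies is n(n+1)/2. The helper name `bundled_copies` is hypothetical.

```python
def bundled_copies(chain_length):
    """Total jars stored when every repo in a linear chain bundles its closure."""
    # The i-th jar (1-based) bundles itself plus the i - 1 jars beneath it.
    return sum(range(1, chain_length + 1))


for n in (1, 10, 100):
    print(n, bundled_copies(n))
```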

@jart
Contributor

jart commented Dec 13, 2016

There's a lot of value to not fetching transitive dependencies auto-magically. For example, with the web gui I just wrote, I generated the following config for com.google.template:soy:2016-08-25. In doing so, I was able to identify a bug in com_google_common_html_types which is depending on Guava Testing Library without declaring it as a test scoped dependency. I was also able to audit the licenses of all my transitive dependencies very easily. But most importantly, by using this config, builds are going to go insanely fast for my users, because calculating that config required downloading 150 things, e.g. pom.xml files. Furthermore, I'm able to effectively mirror my dependencies so builds can be durable and never break.

@wstrange

@jart The web gui sounds awesome. Are you close to open sourcing it?

@jart
Contributor

jart commented Dec 13, 2016

Expect it at some point in the upcoming months. I need to go through the process. I've also got a lot of other stuff on my plate with TensorFlow.

@kchodorow kchodorow modified the milestones: 0.5, 0.6 Dec 21, 2016
@aj-michael
Contributor

Only option 2 supports AAR files. It would be great if whatever solution we settled on supported arbitrary artifact packaging types.

@jart
Contributor

jart commented Aug 7, 2017

@kchodorow I've mailed you a changelist adding java_import_external to Bazel. The community should be able to expect it soon. I've also added very helpful documentation with examples.

@wstrange

wstrange commented Aug 7, 2017

Speaking as a Bazel newbie, presenting multiple solutions for maven migration is very confusing.

A single, well supported, documented and "official" maven migration solution would be really nice, and I think is key for driving bazel adoption for Java projects.

@ittaiz
Member

ittaiz commented Aug 8, 2017

@jart we (scala people) have a need to be able to turn off ijar creation for some external jars.
A current ad hoc solution is to use the native maven_jar and a custom scala_import which uses the file instead of the java_library.
Will it be possible to support disabling ijars in specific cases?

@jart
Contributor

jart commented Aug 8, 2017

If the Bazel authors add an attribute to java_import that turns off ijar creation, then java_import_external will absolutely be updated, since the latter is basically the same rule with some urls attributes added.

@ittaiz
Member

ittaiz commented Aug 8, 2017

Thanks! @kchodorow are you the right person to ask?

bazel-io pushed a commit that referenced this issue Aug 9, 2017
This Skylark rule is a replacement for maven_jar.

See also #1410

PiperOrigin-RevId: 164642813
@cgrushko
Contributor

cgrushko commented Sep 5, 2017

@jart did you end up adding java_import_external to somewhere in Bazel?

@jart
Contributor

jart commented Sep 5, 2017

@cgrushko Indeed I did. It was added to the Bazel codebase 28 days ago in 062fe70. Judging by the baseline, it doesn't look like it made it into 0.5.4, but it's certain to make it into the next one. I hope you enjoy this rule. Usage examples can be found in Closure Rules, Nomulus, and many other places.

@wstrange

wstrange commented Sep 5, 2017

@jart Does that rule support authentication to a private maven repo (Artifactory in our case)?

If not, any ETA?

@or-shachar
Contributor

Hey @jart
Rumor has it that you also created some gui tool for converting maven coordinates to java_import_external. Is it open sourced? We'd love to check it out!

@StephenAmar

Any news regarding that web tool you've been mentioning in other bugs, @jart?
I kind of want to migrate my repo to java_import_external, but without something like generate_workspace to resolve transitive dependencies, it's quite a lot of work.

@jart
Contributor

jart commented Oct 23, 2017

Behold Bazel Maven Config Generator in #3946 and the demo video on YouTube. @or-shachar @StephenAmar

@wstrange

From a quick glance of the above PR, it looks like this does not support private Maven repos such as Artifactory?

@jart
Contributor

jart commented Oct 24, 2017

@wstrange I don't see why it wouldn't. It also depends on what you mean. For example, you can just sed "repo1.maven.org" in index.html to whatever and it'll crawl the POMs. If you want it to be able to crawl multiple POM repos, that might not be a trivial change.

Also keep in mind that java_import_external has no awareness of POM metadata. It just grabs jars from whatever URL. I'm also pretty sure Bazel's downloader can do HTTP auth using environment variables. See ProxyHelper.java. It's also probably possible to put the user:pass in the URL itself, although you might not want to check that into your codebase.
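
For reference, this is what embedding user:pass in a URL amounts to at the HTTP level: the userinfo part becomes a Basic `Authorization` header. The sketch below only illustrates that mapping; whether Bazel's downloader honors credentials in a URL is not confirmed in this thread. `split_credentials` and the example path are hypothetical.

```python
import base64
import urllib.parse


def split_credentials(url):
    """Returns (url_without_userinfo, basic_authorization_header_or_None)."""
    parts = urllib.parse.urlsplit(url)
    if parts.username is None:
        return url, None
    userinfo = "%s:%s" % (
        urllib.parse.unquote(parts.username),
        urllib.parse.unquote(parts.password or ""),
    )
    header = "Basic " + base64.b64encode(userinfo.encode("utf-8")).decode("ascii")
    host = parts.hostname  # hostname excludes the userinfo and the port
    if parts.port is not None:
        host += ":%d" % parts.port
    clean = urllib.parse.urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, header


print(split_credentials("https://aladdin:opensesame@maven.initech.com/repo/a.jar"))
```

Keeping the credentials out of the URL and supplying them separately avoids checking secrets into the codebase.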

It's also worth mentioning that Google Drive mirroring feature sort of magically and painlessly creates your own private Maven server on the fly. Although it just mirrors the JARs since that's all java_import_external needs.

@wstrange

[Disclaimer: I am a Bazel newbie, so the questions I am asking may not make sense ;-) ]

The way our Artifactory repo works is that there could be several different repos defined, and each has a potentially different set of credentials. So the http auth credentials used by java_import_external would vary depending on which repo the dependency is coming from.

Maven handles all of this by using the credentials defined in ~/.m2/settings.xml. It is not clear to me how to accomplish the same thing with Bazel.
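
As a sketch of what reusing those credentials could look like, the snippet below reads the `<servers>` section of a settings.xml and maps each server id to its username and password. It assumes the file declares the standard Maven settings namespace and stores passwords in plain text (Maven also supports encrypted passwords, which this does not handle). The `initech-releases` id and `server_credentials` are hypothetical, and nothing here is an existing Bazel feature.

```python
import xml.etree.ElementTree as ET

SETTINGS_NS = {"m": "http://maven.apache.org/SETTINGS/1.0.0"}

# Hypothetical settings.xml content for illustration.
EXAMPLE = """<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <servers>
    <server>
      <id>initech-releases</id>
      <username>aladdin</username>
      <password>opensesame</password>
    </server>
  </servers>
</settings>"""


def server_credentials(settings_xml):
    """Maps each <server> id to its (username, password) pair."""
    root = ET.fromstring(settings_xml)
    creds = {}
    for server in root.findall("m:servers/m:server", SETTINGS_NS):
        server_id = server.findtext("m:id", namespaces=SETTINGS_NS)
        creds[server_id] = (
            server.findtext("m:username", namespaces=SETTINGS_NS),
            server.findtext("m:password", namespaces=SETTINGS_NS),
        )
    return creds


print(server_credentials(EXAMPLE))
```

Note that Maven keys credentials by repository id, not by host name, so a further mapping from id to repository URL would still be needed to pick the right credentials per download.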

@jart
Contributor

jart commented Oct 25, 2017

Is Artifactory sort of like a really robust Squid caching proxy? Reading about it, I couldn't help but notice that Artifactory Enterprise Edition offers five-nines availability. I actually have a great deal of respect for the JFrog developers for having achieved this level of reliability. It's a level of engineering most thought only AT&T and Chubby could master. Even Google Cloud Storage, with its transcontinental redundancy, is only able to promise three-nines. However, java_import_external can actually deliver Erlang reliability. If the urls=[...] attribute has mirrors to three three-nines CDNs, then you get nine-nines availability: (1-(1-99.9/100)**3)*100 = 99.9999999. If Jesus Christ used Bazel then there'd be about 63 seconds thence when builds could break on downloads. But if we consider that Bazel retries failed requests with exponential backoff for longer than that, then the reliability that spans the ages actually transcends nines and becomes 100. Bazel Community Edition can offer you this incredible level of value, not just for the low-low price of $29,500/year. No my friends, in fact, it doesn't even cost $14,750. You can have it all for the bargain basement price of zero dollars. Yes ladies and gentlemen it's free, and the source code comes included.
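
The arithmetic above can be reproduced with a few lines, under the assumption (implicit in the comment) that mirror outages are independent. `combined_availability` is a hypothetical helper name, and the 2017-year span is an approximation of the time frame the comment alludes to.

```python
def combined_availability(per_mirror, mirrors):
    """Probability that at least one of several independent mirrors is up."""
    return 1 - (1 - per_mirror) ** mirrors


availability = combined_availability(0.999, 3)
seconds_per_year = 365.25 * 24 * 60 * 60
downtime = (1 - availability) * 2017 * seconds_per_year

print("availability: %.7f%%" % (availability * 100))
print("expected downtime over 2017 years: %.1f seconds" % downtime)
```

If the mirrors share a failure mode (same network path, same upstream), the real figure is lower than this independent-failure estimate.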

But it might need improvement when it comes to that private authentication use case. It's one I haven't considered, because I mostly do open source stuff. Also internally at Google we just vendor everything in our monolithic repo.

One thing you could do is put this in your zone:

$TTL 0
artifacts    IN  A    192.168.10.4
             IN  A    192.168.10.5
             IN  A    192.168.10.6

Put this on your servers:

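import base64
import http.client
import http.server
import shutil
import socketserver
import urllib.parse


def basic(u, p):
  # Value of an HTTP Basic "Authorization" header for user u, password p.
  token = base64.b64encode(('%s:%s' % (u, p)).encode('utf-8'))
  return 'Basic %s' % token.decode('ascii')

# Credentials to inject, keyed by the upstream host[:port].
AUTHORIZATIONS = {
    'maven.initech.com': basic('aladdin', 'opensesame'),
    'maven.vendoro.com': basic('aladdin', 'opensesame'),
    'localhost:5000': basic('aladdin', 'opensesame'),
}

class Handler(http.server.BaseHTTPRequestHandler):
  def go(self):
    # A proxied request line carries the absolute URL of the upstream resource.
    ru = urllib.parse.urlparse(self.path)
    # Same URL minus scheme and host: this is what the upstream server receives.
    pu = urllib.parse.ParseResult('', '', ru.path, ru.params, ru.query, ru.fragment)
    auth = AUTHORIZATIONS.get(ru.netloc)
    if auth:
      # Assigning a header appends, so drop any existing value first.
      del self.headers['Authorization']
      self.headers['Authorization'] = auth
    del self.headers['Host']
    self.headers['Host'] = ru.netloc
    if ru.scheme == 'https':
      c = http.client.HTTPSConnection(ru.netloc)
    else:
      c = http.client.HTTPConnection(ru.netloc)
    try:
      # Host and Accept-Encoding are forwarded from the client, not generated.
      c.putrequest(self.command, pu.geturl(),
                   skip_host=True, skip_accept_encoding=True)
      for k, v in self.headers.items():
        c.putheader(k, v)
      c.endheaders()
      r = c.getresponse()
      self.send_response(r.status)
      for k, v in r.getheaders():
        # http.client already de-chunks the body that is copied below.
        if k.lower() != 'transfer-encoding':
          self.send_header(k, v)
      self.end_headers()
      shutil.copyfileobj(r, self.wfile)
      self.wfile.flush()
    finally:
      c.close()
  do_GET = go
  do_HEAD = go

class ThreadedHTTPServer(socketserver.ThreadingMixIn,
                         http.server.HTTPServer):
  # Do not let in-flight request threads block interpreter shutdown.
  daemon_threads = True

ThreadedHTTPServer(('', 4000), Handler).serve_forever()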

Then run Bazel like this:

$ HTTP_PROXY=http://artifacts:4000 bazel build //...

And you should be good.
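
To check such a proxy independently of Bazel, a request can be routed through it from Python. This is a minimal sketch: `proxied_opener` and `fetch_status` are hypothetical helper names, the `artifacts:4000` address comes from the command above, and only plain-HTTP URLs are routed through the proxy.

```python
import urllib.request


def proxied_opener(proxy_url):
    """Builds an opener that routes plain-HTTP requests through the proxy."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url}))


def fetch_status(opener, url):
    """Returns the HTTP status of url as seen through the proxy."""
    with opener.open(url, timeout=10) as response:
        return response.status


opener = proxied_opener("http://artifacts:4000")
# With the proxy running, a call such as
#   fetch_status(opener, "http://maven.initech.com/")
# should succeed only if the injected credentials are accepted;
# error statuses such as 401 surface as urllib.error.HTTPError.
```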

@wstrange

So I think what you are saying is that when you are at 10 nines of availability, you have no place to go. Bazel goes to 11 nines.

Artifactory and Nexus are very common in the "enterprise" space. If Bazel is to attract hordes of Java developers (and that may not be a goal ;-) ), having first class support for private maven repositories (with authentication) is essential.

The proxy idea is super creative (I really appreciate you taking the time to put together a solution). I'll review it, but I think it will be a non-starter in my organization. The solution has to be integrated and out of the box.

I return to looking at Bazel every 6 months or so, because we desperately need something like it (maven build and test times are getting absurd). But I have to sell this internally, and the maven migration experience is just not there yet. I'll be back though ;-)

@pcj
Member

pcj commented Oct 25, 2017

Hi Warren. I'd encourage you to file an issue on rules_maven. It uses Gradle to resolve transitive deps under the hood. As Gradle already factors in the settings.xml file when fetching artifacts, I'd wager that getting this to work might not be too hard. We'd just have to be able to pass in your settings.xml file as a label to the maven_repository rule such that it can be discovered. It may also require some tweaking of the repositories attribute that maps GROUP:NAME patterns to the (Artifactory) URL where those artifacts can be found.

@jart
Contributor

jart commented Oct 25, 2017

@wstrange I encourage you to file a feature request asking for the ability to add, say, fetch --auth user:pass@user.com to ~/.bazelrc so the downloader can do Basic Authentication (see also). It's not an unreasonable thing to ask, and it wouldn't be difficult to implement. But there's the proxy solution in the interim.

I can't speak for the Bazel team or Google, but I'm sure they want nothing more than the largest number of people to benefit from Bazel as possible. While we're in the business of sharing world-class technology, we can't always be in the business of solutions, and some assembly is required. I think that's OK, because it creates opportunities for entrepreneurs to build those turn-key solutions on top of the work we're sharing.

For example, nothing would make me happier than to see someone come along, take that Apps Script I posted a few comments ago, and get rich turning it into a business. If that ends up being one of you, buy me a drink next time you're in the Bay Area.

@StephenAmar

@jart Thanks a lot for the config generator. It was very useful.

A tricky question for you though.
I'm having a lot of trouble using extra_build_file_content because I can't seem to be able to use non native rules there (like a rule to shade libraries, or scala specific rules).

Any ideas?

@jart
Contributor

jart commented Nov 17, 2017

I would advise against doing anything nontrivial in extra_build_file_content. You can probably do it in your main repo build files. Otherwise, you might be able to load() the appropriate skylark rules, possibly using "@//..." syntax to reference the main repo.

@dslomov dslomov removed this from the 0.6 milestone Jan 11, 2018
tekumara added a commit to tekumara/proxy-with-basic-auth that referenced this issue Dec 29, 2018
@dslomov
Contributor

dslomov commented Mar 21, 2019

All such feature requests now belong in https://github.com/bazelbuild/rules_jvm_external

@dslomov dslomov closed this as completed Mar 21, 2019