Remove HTTP from the Riak Java Client #268

Closed
broach opened this issue Jul 24, 2013 · 21 comments
@broach
Contributor

broach commented Jul 24, 2013

This is a tracking meta-issue for the removal of HTTP from the Riak Java client library.

Rationale

Version 2.0 of the Riak Java client (RJC) is going to be a large overhaul of the client. The main goal is to remove years of technical debt at the core by retiring the original Protocol Buffers (PB) and HTTP clients, replacing them with a modern core that uses Netty4. This also allows us to remove the translation / abstraction layer that sits between IRiakClient at the top and the aforementioned original clients at the bottom. Doing these things lets us address some long-standing design issues and significantly reduces future maintenance and feature development costs.

Historically, the client library supported both the Protocol Buffers and HTTP protocols because the Riak Protocol Buffers API did not support all Riak operations. As of Riak 1.4 and RJC 1.4.0 we have reached feature parity between the two APIs. Protocol Buffers is the faster and more efficient way to talk to Riak. In addition, feature development and maintenance are again helped by not having to implement every feature multiple times.

Thus, I propose we no longer include HTTP in the RJC starting with v2.0.

Authentication / Authorization

Ignoring feature parity, the one reason users have chosen to use HTTP over PB when using the RJC is that Riak had no Authentication or Authorization features. By using an authenticating HTTP proxy between the RJC and Riak some amount of security could be provided.

This all changes in Riak 2.0, and RJC v2.0 will support the new authentication and authorization features over PB. For complete details on these new features, please see basho/riak#355.

Open Questions

  • Is there some compelling reason to keep HTTP in the RJC which we have not thought of?

Risks

  • Current deployments using HTTP proxying / authenticating will need to alter their architecture. This may be seen negatively by those current users.

@randysecrist
Contributor

Throwing out HTTP proxying is a definite negative, not a maybe. Users also choose HTTP because it's not as opaque, and it's easier to see what is going on when things do not work.

@novabyte
Contributor

FWIW, from my experience working with Riak users, the HTTP interface was often chosen because it is a very easy way to take advantage of a caching proxy, typically using either Squid or Varnish.

@guidomedina

Why would you use a proxy to cache a whole HTTP response, headers and all, when you can use an in-memory cache to store JSON-only data? Also, it is a bad design because of the CAP theorem: imagine you are caching a value from one of the nodes, how do you know it is consistent? Remember eventual consistency?

I might add that caching a key-value pair is only a good idea if you have total control over your caching mechanism, so it is really a bad idea to use Squid to cache KV. I agree that some users will be left hanging by the lack of a Riak Java HTTP client, but that doesn't stop them from using HTTP to diagnose with either curl or any browser.

Imagine caching a mutated object: fetch (Squid cache), mutate and write. How does the proxy know the value has changed if it was just changed?

In fact, because so many services try to be smart about HTTP, I would say HTTP brings more problems than it solves (in the case of Riak).
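
The fetch, mutate, write concern raised in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: `StaleCacheDemo`, `fetchViaProxy` and `writeDirect` are hypothetical names, the two maps merely stand in for a Riak node and a caching proxy, and it assumes the write reaches the store without invalidating the cached entry (for example, because it travels by a different path than the cached read).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: these names do not model any real Riak or Squid API.
class StaleCacheDemo {
    // Stands in for the canonical store (e.g. a Riak bucket).
    static final Map<String, String> store = new HashMap<>();
    // Stands in for a caching proxy that is only populated on reads.
    static final Map<String, String> proxyCache = new HashMap<>();

    // Read through the proxy: serve the cached copy if present, else fetch and cache it.
    static String fetchViaProxy(String key) {
        return proxyCache.computeIfAbsent(key, store::get);
    }

    // Write straight to the store; in this sketch the proxy is never told about it.
    static void writeDirect(String key, String value) {
        store.put(key, value);
    }

    public static void main(String[] args) {
        writeDirect("user:1", "v1");
        String fetched = fetchViaProxy("user:1");     // cache miss, proxy now holds "v1"
        writeDirect("user:1", fetched + "-mutated");  // mutate and write back

        System.out.println("store: " + store.get("user:1"));      // prints "store: v1-mutated"
        System.out.println("proxy: " + fetchViaProxy("user:1"));  // prints "proxy: v1"
    }
}
```

Under that assumption the second read through the proxy returns the stale `v1`. Whether a real deployment behaves this way depends on how the proxy is configured to expire or invalidate entries, which the sketch deliberately leaves out.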

@novabyte
Contributor

@guidomedina I did not say that it was the most optimal solution, nor did I recommend it as a fantastic use case for the HTTP interface. I simply said that I've spoken to more than a few Riak users that do this.

There's nothing wrong with using an HTTP cache if it gets a startup to "Minimum Viable Product" quicker. Not every company has the time to invest in the most optimal solution to a problem. Users store many different types of data, some of which has guaranteed immutability, at which point it doesn't matter whether it's being served from the canonical source or a caching server.

A quick Google search will reveal this is a use case, irrespective of the suitability of it from an academic perspective. It's also mentioned in a few Riak Recaps, and a little bit on the mailing list.

I don't want to derail this discussion into whether a HTTP caching layer in front of Riak is a good idea or not. This is about whether the Java 2.0 client should support both protocols or not.

FWIW, I'm of the opinion that it should. The protocol abstraction code for it is already there on the 2.x-development branch and Netty 4 does a fantastic job of simplifying the networking infrastructure.

@broach
Contributor Author

broach commented Jul 25, 2013

@randysecrist This isn't about removing HTTP from Riak; this is about removing it from the Java client library, where it is completely abstracted away from the user.

@seancribbs

@randysecrist Also, addressing your point about visibility, what sorts of things would help alleviate that problem? Users should not have to debug Riak interactions at the protocol-level IMO.

@drewkerrigan

One client in particular requires all requests (even from their application to Riak) to go through an Apache LDAP module, so they can currently only use the HTTP interface. They are using the Java Client.

@guidomedina

@novabyte Yes, with that level of control it is not risky. Generally speaking, having HTTP available will make some users think that it is easier, and they would lose the benefit of using PBC (then they say "Riak is slow" when they should say "I'm not using it correctly"). Especially if they need multi-threading, PBC is the way to go. I'm not saying "Go and get rid of HTTP"; I'm saying "Please, move to PBC for your sake".

@drewkerrigan Isn't it possible to plug in an authenticator (or intercept the call) and still use PBC? Why is the authentication so tied to the protocol?

@broach
Contributor Author

broach commented Jul 25, 2013

@novabyte I know the protocol abstraction code is already there; I wrote it. The issue isn't the networking part; that's never been a problem, really - It's all the parsing and ugliness that comes after when talking about using HTTP.

@drewkerrigan Right, and Riak 2.0 Auth supports LDAP.

@randysecrist
Contributor

@broach I understand that HTTP will continue in Riak proper; I am unclear about what problem is being solved by removing HTTP and I am not aware of anyone that has asked for it to be removed. Would you explain what the benefit of removing the HTTP interface is?

@seancribbs Some community libraries only support HTTP as it is a very well understood protocol; will HTTP still be a first class citizen at the riak layer for new 2.x features?

@YTN0

YTN0 commented Jul 31, 2013

I see no reason to continue keeping the HTTP interface in the client. Since it is completely abstracted away from the client APIs, there is no perceived value in keeping it anyways. I think it just adds cruft and complexity under the hood. The HTTP interface on the Riak server is still available for command line (and browser based) testing etc.

I am hoping that by cleaning things up and switching solely to PBC, the client will be leaner and (to some degree) faster, and you will get the opportunity to clean up and simplify the APIs.

@guidomedina

@ytn I concur. It will help to focus on quality, not quantity. It is discouraging to keep supporting two network protocols in the client, especially given the mentioned development hassle of the Riak HTTP client behind the scenes. A good side effect is that Riak Java client developers will focus only on Riak PB (supported by Basho internally) and the Riak Java client.

With so much work to do, having an extra and not so beneficial development overhead will only delay what matters: functionality.

@novabyte
Contributor

@guidomedina The historic problems with the rate of feature development in the Riak Java client have nothing to do with dealing with two transport protocols. They have to do with the current client being a merger of two different client implementations (that share very little in common). You can read more about this here: https://github.com/basho/riak-java-client#some-history

Netty appears to have been chosen as the Asynchronous IO framework for use with the next generation of the client (which is great). It already supports a level of transport abstraction that will (and does) heavily simplify the maintenance of managing both HTTP and PB.

You can see the existing work on this from @broach here:

https://github.com/basho/riak-java-client/tree/2.x-development

@broach As far as I'm aware (correct me if I'm wrong), isn't the motivation for removing the HTTP interface from the Java client to remove the need to explain that Riak Java client users should be using the PB interface? Wouldn't this be fixed by configuring the next-gen client to default to PB unless specified otherwise?

@guidomedina

@novabyte As simple as Netty makes it look (I know it is great because I implemented a custom TCP server for a logger with it and it works well; nice NIO development practice with channels, groups and so on), you still have that overhead.

Imagine that time spent on something like spring-data-riak: do more on the domain side, more on the interceptors side (besides mutations and resolvers), more, generally speaking, on the model-driven side of the client. There is so much more that could be done from the design patterns perspective. That said, do you think it is worth maintaining yet another, not so beneficial protocol?

Just wondering: what JDBC driver (to mention a similar scenario) offers you more than what is needed and optimal? I'm not saying there isn't one, but if there is I can't recall any.

@mingfai

mingfai commented Aug 16, 2013

Good to see the Java client moving in this direction.

  1. I don't see any reason to keep any HTTP client code. The client library should take the most efficient option. If there is anything supported in HTTP but not in PB, just drop the feature and add it back later (in 2.1, 2.2 etc.).
  2. Netty is an async framework, so the API should be async rather than maintaining API compatibility. It's most typical to return a Future<?> (some other async APIs use a Listener). I recall that a few years ago I wrote a proof-of-concept Netty PB client, and the most stupid idea is to use Netty just to implement the sync API.

btw, could you update to the latest Netty 4 release? There have been quite a lot of API changes since Beta 3 (the current is 4.0.7.Final).

@broach
Contributor Author

broach commented Aug 16, 2013

@mingfai Thanks for the feedback.

re: Async - yes, the new Java client API is built around being async. You'll get back futures and will also be able to register as a listener with a future.

re: Netty4 - I'm reworking some of the internals at the moment. After that commit I'll be moving the dependency to Netty 4 final and addressing anything that needs to be changed.
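
For readers unfamiliar with the two async styles mentioned in this reply (getting a future back, and registering a listener on it), here is a minimal sketch. It is not the RJC 2.0 API: `AsyncClientSketch` and `fetch` are hypothetical, and `java.util.concurrent.CompletableFuture` is used only as a stand-in for whatever future type the client ends up exposing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; AsyncClientSketch and fetch() are NOT part of the Riak Java client.
class AsyncClientSketch {
    // Pretend fetch: the result is produced asynchronously on a background thread.
    static CompletableFuture<String> fetch(String key) {
        return CompletableFuture.supplyAsync(() -> "value-for-" + key);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> future = fetch("user:1");

        // Style 1: register a listener that runs once the result is available.
        CompletableFuture<Void> listener =
                future.thenAccept(value -> System.out.println("listener got " + value));

        // Style 2: block on the future when a synchronous answer is needed.
        String value = future.get(1, TimeUnit.SECONDS);
        System.out.println("blocking get returned " + value);

        // Wait for the listener so its output is not lost when main exits.
        listener.get(1, TimeUnit.SECONDS);
    }
}
```

The order of the two printed lines is not guaranteed, since the listener may run on a different thread from the blocking call.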

@novabyte
Contributor

Looks like the decision was made: #273. This issue can now be closed.

@broach
Contributor Author

broach commented Sep 17, 2013

@novabyte Just as an FYI, it's still slightly not settled. What I did do was rip out all the code that facilitated supporting two protocols simultaneously. The original design was conceived before we had feature parity between HTTP and PB. This meant it was desirable for the client to be able to fall back to HTTP when the user wanted to do something that PB didn't support. It added a lot of complexity that was no longer needed.

If we decide that we have to support HTTP it's now going to be done by abstracting the protocol specific code into separate modules; you'll just pick which one you want and include it as a dependency. There's some serious hand waving there in regard to how configuration will work, but overall I think it's a better approach.
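
One way to picture the "separate modules" idea described here is a protocol-neutral interface in the core with one implementation per protocol module. The sketch below is an assumption-laden illustration, not the planned design: `TransportModulesSketch`, `Transport`, `PbTransport`, `HttpTransport` and `Client` are all hypothetical names, the returned strings are descriptive labels rather than real wire formats, and the configuration question that the comment calls hand waving is sidestepped by passing the transport in by hand.

```java
// Hypothetical sketch: none of these types exist in the Riak Java client.
class TransportModulesSketch {

    // Protocol-neutral contract that the client core would depend on.
    interface Transport {
        String name();
        // Returns a human-readable description of a fetch, not real wire bytes.
        String describeFetch(String bucket, String key);
    }

    // Would ship in its own artifact (say, a PB module).
    static final class PbTransport implements Transport {
        public String name() { return "pb"; }
        public String describeFetch(String bucket, String key) {
            return "RpbGetReq{bucket=" + bucket + ", key=" + key + "}";
        }
    }

    // Would ship in a separate, optional artifact (say, an HTTP module).
    static final class HttpTransport implements Transport {
        public String name() { return "http"; }
        public String describeFetch(String bucket, String key) {
            return "GET /buckets/" + bucket + "/keys/" + key;
        }
    }

    // The core only ever sees the Transport interface.
    static final class Client {
        private final Transport transport;
        Client(Transport transport) { this.transport = transport; }
        String fetch(String bucket, String key) {
            return transport.name() + ": " + transport.describeFetch(bucket, key);
        }
    }

    public static void main(String[] args) {
        // The user picks exactly one transport; there is no runtime fallback between them.
        Client client = new Client(new PbTransport());
        System.out.println(client.fetch("users", "alice"));
    }
}
```

Compared with the removed dual-protocol design, nothing in `Client` knows that a second protocol exists, so dropping or adding a module would not touch the core.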

@runesl

runesl commented Mar 12, 2014

We have at least one major customer that uses an intelligent load balancer between Riak and the clients. The LB inspects, logs and analyses HTTP traffic, and we much prefer HTTP over PB due to transparency on the wire and better network tooling support.

@coderoshi
Contributor

Closing. We're moving all drivers away from HTTP.

@runesl

runesl commented Jun 6, 2014

We'll escalate this decision through official Basho channels, as it leaves us without an upgrade path from our currently supported EE-installation unless we do substantial rework of client-code and change our network infrastructure - if this is at all possible.
