On ZooKeeper
I mentioned much of this in the original README when I launched the project, but I thought it deserved a wiki entry of its own.
ZooKeeper makes sense in many cases. ZooKeeper is well-tested and in use all over the place. If your primary ecosystem is Java, ZooKeeper is a no-brainer. If you're already leveraging any of the other Hadoop sub-projects (or former sub-projects), ZooKeeper is a natural addition to your environment. ZooKeeper might appeal to your architectural sensibilities more. If you read the ZooKeeper site and thought "This is perfect" then you should absolutely use it.
ZooKeeper is awesome. It's an amazing tool with some really smart people and ideas behind it.
I want to state first off that these are PERSONAL opinions. You have every right to disagree with them. I would gladly sit down over a beer and geek out on topics like distributed coordination, service locators and REST versus stateful all day long.
If you, like me, looked at the ZooKeeper docs and thought "This is perfect BUT...", it would be worth considering whether Noah has what you need. Here are some of those "BUT"s:
The only officially supported interfaces for ZooKeeper are Java and C. There's also a CLI tool. However, researching how to use ZooKeeper outside of those two can be frustrating.
In the contrib directory of the ZK source there are REST, FUSE, Perl and Python libraries, but they're contrib. Without getting into "contrib" vs. "official", those don't fit into my ecosystem. On Ruby, I want an official gem I can install. On Python, I want to use pip. I don't want to have to go into the contrib directory.
It seems like a minor point but it's a big break in your workflow if you use those languages.
As a side note, I tend to shy away from native extensions in either language if I can help it. A pure Ruby gem will win out for me in most cases over the native extension unless it's absolutely necessary.
ZooKeeper isn't "big" per se. You can get a standalone instance running with a single jar file download.
However, for better or worse, Java-based applications will always feel "big" to me. It could be the (perceived) startup time. It could be the complex class names. Whatever it is, something about Java always feels "bulky" to me. And this is after 12ish years of supporting Java from an operations perspective. I just don't equate Java with "fast and light".
This isn't the fault of the ZooKeeper project. Like I said, some of these are personal opinions.
From what I can tell, REST access is still experimental in ZooKeeper. That's fine because REST doesn't work in all cases. Look at ActiveMQ. Trying to bolt REST onto the queue semantics doesn't work very well. Bolting REST on top of ZooKeeper doesn't work as well either.
Are those projects capable of supporting a subset of functionality over REST? Absolutely but they weren't designed that way. That's not a criticism or slight. Both ActiveMQ and ZooKeeper were designed with a different set of assumptions.
Anytime I look for a tool to solve a problem, I always give consideration to the "But" scenarios. There is no perfect tool for a given job unless that tool was designed for that specific job (e.g. a basin wrench). Many times a tool comes close enough that you can use it with very little reengineering on your part. You should be clear on which compromises you're willing to make.
As I said in the original README for the project, in every situation I've been in where ZooKeeper was exactly what we needed, the "But" has always been one that we couldn't compromise on.
I typically set up Google Alerts for any of my current active projects. One came across pointing me to a post on the ZooKeeper developer mailing list. I subscribed to the list but was unable to find the message IDs I needed to reply properly. Honestly, I also didn't want to pollute the list with non-development chatter, so I created this addendum.
I wanted to bring up a few points:
I'm going to have to disagree with this. Redis is quite possibly the easiest third-party software package I've ever had to deal with. It has ZERO external dependencies. That's a big deal when dealing with compiled software to be distributed to servers.
Additionally, the defaults out of the box are perfectly acceptable for what I'm using it for. Tuning is very straightforward and done via a flat-text configuration file. There aren't THAT many knobs I need to tune.
Contrast this with any standard Java application. Increase min/max memory settings, try some PermGen tuning. Throw in different garbage collection schemes. Not to be down on Java applications, but when it comes to tuning and managing, I can't think of anything with which I've had more annoyances.
And that's just the JVM. Tuning JBDB is a PITA in its own right.
With Redis, at a basic level I have to set some memory allocation and possibly tune some persistence settings. Both are very straightforward and clearly documented in the configuration file.
But regardless of that, out of the box Redis is perfectly fine for the way Noah uses it. As it relates to clustering, again, Redis clustering is very straightforward should the need arise. As for Noah, it's just a matter of spinning up another instance talking to the same Redis instance.
The second message I wanted to address is the one following the item above. The author says two things:
- There's a lot of fear and hesitation around building (high performance, critical) infrastructure around Java.
- Interfaces are very important
I partially agree with the first and entirely agree with the second.
As it relates to the first one, I personally tend to approach performance tuning of any Java application with an "I hate this part of my job" attitude. It's not the right attitude by far, but tuning is one of the least enjoyable parts of the job. Tuning Java applications to be performant feels like a never-ending battle. Tune this JVM setting and wait. Nope, still having problems. Let's try this one. And this one. Now it's all running stable and we're getting PermGen errors. Let's try bumping up this Voldemort client setting.
It's a frustrating experience.
However, I know that I'm no Java developer. I trust that the developers I work with so closely are smart people. And they are, but even if the code we write and deploy is perfect, there's still all the JVM and library tuning we have to do.
As for the second point, I am entirely in agreement. I made it clear in this wiki entry as to why I started Noah. Interfaces were one of the KEY points. For better or worse, REST and JSON are the lingua franca these days. Every single language I can think of, including shell scripts, can make HTTP calls with different verbs and parse JSON. Asynchronous communication is almost right up there. It provides the greatest flexibility to distribute components of your architecture and interact with them. Want to rewrite this component in Python? Go right ahead. Grab RestClient and simplejson. Need to use a Ruby client as well? JSON ships with 1.9.2 natively.
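To make the "HTTP verbs plus JSON" point concrete, here is a minimal sketch using only the Python standard library. The base URL, the resource path and the payload and response fields are all made up for illustration; they are assumptions, not Noah's documented API. The request is built but never sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://noah.example.com"


def build_request(verb, path, payload=None):
    """Build (but do not send) an HTTP request with the given verb and an optional JSON body."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=verb)


def parse_response(body):
    """Decode a JSON response body (bytes) into plain Python objects."""
    return json.loads(body.decode("utf-8"))


# Build a PUT request against a hypothetical resource path.
# Sending it would be a single call: urllib.request.urlopen(req)
req = build_request("PUT", "/configurations/example", {"format": "string", "body": "value"})

# A made-up response body standing in for whatever a server might return.
sample = b'{"result": "success", "name": "example"}'
parsed = parse_response(sample)
```

Nothing here is specific to Python: the same two steps (pick a verb, encode or decode JSON) are what a client in any other language would do.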
As I said, I really like ZooKeeper. It's a great project/product. Almost every inch of Noah was inspired by some functionality in ZooKeeper.