
Does not work for large clusters #22

Open
ghost opened this issue Aug 1, 2013 · 4 comments

ghost commented Aug 1, 2013

I built Hannibal and changed the storage layer from H2 to MySQL. We have around 1000 regionservers with around 300k regions in total. Because so much of the logic is done at the view layer, it takes minutes to load. The response size for some of the requests is > 200 MB, and too much of the sorting and combining is done at the view layer. This makes Hannibal, which is an awesome tool, unusable at this scale.

meniku commented Aug 6, 2013

Whoah, that's quite a lot of regionservers. I am aware that Hannibal is currently not usable for such large installations. That said, you got pretty far; I would have guessed Hannibal would crash a lot earlier ;-)

A colleague and I just looked at the requests and responses, and we think they can be improved quite a lot:

  1. Enable GZIP. Unfortunately it doesn't seem to come out of the box with the Play Framework.
  2. Add a specific API for each graph to reduce the payload. As you said, too much logic is done at the view layer. This is a major refactoring, though (see the sketch below).
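
A rough sketch of what (2) could look like, written as a Play-style Scala controller; the controller name, route, and the `regionSizesFor` helper are made up for illustration and are not existing Hannibal code:

```scala
package controllers

import play.api.libs.json.Json
import play.api.mvc.{Action, Controller}

// Hypothetical per-graph endpoint: sorting and aggregation happen on the
// server, and the response carries only the data points one graph needs
// instead of the full region info for every regionserver.
object GraphApi extends Controller {

  // Made-up helper standing in for the existing model code; it would return
  // region name -> size in MB, already aggregated and sorted server-side.
  private def regionSizesFor(table: String): Map[String, Long] = Map.empty

  def regionSizes(table: String) = Action {
    Ok(Json.toJson(regionSizesFor(table)))
  }
}
```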

However, I fear that this won't be enough. The UI itself will need adjustments to be usable with such huge amounts of data. Another bottleneck could be the communication between Hannibal and the regionservers. We should also consider making the recording of metrics/compactions optional altogether, or configurable on a per-region basis.

If someone wants to start working on some of those issues, let me know if I can assist you.

ghost commented Oct 8, 2013

Sorry for the late response; I started working on it a bit. The main thing I noticed when looking at the server-side code was that for each graph we make calls out to every regionserver for data points. I created a cache that a background thread updates at an interval of, for example, every 5 to 10 minutes, and this helped a great deal with the response times for loading the graphs. The view layer was still somewhat of an issue, but the region cache improved performance quite a bit.

A crude patch would be located here:
https://github.com/churrodog/hannibal/commit/13231aa4eb7d14016ea854b74d90cca329a66546

Please forgive my Scala code, I know it's pretty horrible - just a POC.
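
The core idea, boiled down to a minimal self-contained sketch (this is not the actual patch; `fetchRegionsFromServers` and the interval are placeholders):

```scala
import java.util.concurrent.atomic.AtomicReference
import java.util.concurrent.{Executors, TimeUnit}

// Minimal sketch: a background thread refreshes the region snapshot on a fixed
// interval, so graph requests read the cached copy instead of calling out to
// every regionserver on each page load.
object RegionCache {
  private val cached = new AtomicReference[Seq[String]](Seq.empty)
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Placeholder for the expensive call that contacts all regionservers.
  private def fetchRegionsFromServers(): Seq[String] = Seq.empty

  def start(intervalMinutes: Long = 5): Unit =
    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = cached.set(fetchRegionsFromServers())
    }, 0, intervalMinutes, TimeUnit.MINUTES)

  // Graph endpoints read from here; the data may be a few minutes stale,
  // but serving it is cheap.
  def regions: Seq[String] = cached.get()

  def stop(): Unit = scheduler.shutdown()
}
```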

The view-layer refactoring seems like a daunting task, and I would probably screw things up, but I would be up for helping create APIs for each graph so the rendering-layer refactoring could be done incrementally.

meniku commented Oct 9, 2013

Thanks very much for the commit, this looks like a great improvement. I also like that you cleaned up the model a bit :-)

However, I think we'll have to reduce the default interval of 30 minutes and introduce a configuration value for it. I also have to think about how we'll sync this up with the regioninfo metrics, as it doesn't make any sense to record the same cached values over and over again. Maybe we should change the update of the regioninfo metrics so that they are recorded just after the cache gets updated.

I think we should introduce the following configuration values (dunno whether the defaults are good values, though; see the snippet after the list):

  1. regions.update-interval: default 120s: determines how often the cache gets updated and also how often metrics get recorded.
  2. compactions.update-interval: default 300s: determines how often the logfiles are fetched and compactions are recorded. 0 means recording of compactions is disabled altogether.
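
Just to make that concrete, in conf/application.conf this could look roughly like the snippet below; the key names are the ones proposed here (they may not be what ends up in the code), and expressing the intervals as plain seconds is an assumption:

```
# Proposed keys and defaults; plain seconds assumed.
regions.update-interval = 120
compactions.update-interval = 300   # 0 disables recording of compactions
```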

I hope I can implement it soon.

meniku pushed a commit that referenced this issue Oct 21, 2013
…on parameters for controlling how often metrics are fetched, refactor old configuration names and other refactorings. #22 Closes #25

meniku commented Oct 21, 2013

I incorporated most of your code and added the new configuration values (the names differ a bit from the previously proposed ones).
It's available in the next branch and I will merge it back to master soon.

meniku mentioned this issue Nov 11, 2013