This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-2335: [UI] Implement synchronization between browser user state and Hbase user state #1575

Open
wants to merge 3 commits into
base: master


Contributor

@ruffle1986 ruffle1986 commented Dec 11, 2019

Contributor Comments

Link to the original JIRA: https://issues.apache.org/jira/browse/METRON-2335

This PR changes where user-related settings are stored and removes the DataSource abstract class together with its implementation, ElasticSearchLocalstorageImpl.

We understand the idea behind having an abstract class with multiple implementations, but I believe it was misapplied here and caused several problems.

First of all, an interface would have been more appropriate than an abstract class in this case.

Secondly, the implementation class is named ElasticSearchLocalstorageImpl. It does persist data in the browser's local storage, but it has nothing to do with Elasticsearch. Also, this frontend service should not know which tool was chosen on the backend, because that is irrelevant from the frontend's perspective.

I believe the original idea was to make it easy to switch between backend storage approaches, but since the frontend should not know about these details, this concept is no longer necessary.

Also, if you look at the implementation class (ElasticSearchLocalstorageImpl), it has methods that the abstract class does not require and that communicate with an HTTP server directly to save or get alerts data. These methods are unused: we get this data independently of the data source class, even though providing it was the original reason for having that class.

Long story short, we decided to eliminate the DataSource class and its implementation completely and introduce a UserSettingsService to get and store user-specific settings on the backend.

These settings used to be stored in the browser's local storage, but that approach only works on a single computer, and we would like users to get their settings on every computer they log in from.
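
As a rough illustration of this direction (not the code in this PR), a backend-agnostic settings service might look like the sketch below. `SettingsStore`, `InMemoryStore`, and the method names are hypothetical; the real service would talk to the REST API over HTTP and would be asynchronous.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the PR's code.

// The frontend depends only on this small contract; it does not know
// (or care) which storage technology the backend uses.
interface SettingsStore {
  read(): Record<string, unknown>;
  write(settings: Record<string, unknown>): void;
}

// Stand-in for a REST-backed store so the sketch runs without a server.
class InMemoryStore implements SettingsStore {
  private data: Record<string, unknown> = {};

  read(): Record<string, unknown> {
    return { ...this.data };
  }

  write(settings: Record<string, unknown>): void {
    this.data = { ...settings };
  }
}

class UserSettingsService {
  private store: SettingsStore;
  private cache: Record<string, unknown> = {};

  constructor(store: SettingsStore) {
    this.store = store;
  }

  // Load once (for example after login), then serve reads from the cache.
  load(): void {
    this.cache = this.store.read();
  }

  // Return the stored value, or the fallback if the key was never saved.
  get<T>(key: string, fallback: T): T {
    return key in this.cache ? (this.cache[key] as T) : fallback;
  }

  // Update the cache and persist the whole settings object.
  save(key: string, value: unknown): void {
    this.cache = { ...this.cache, [key]: value };
    this.store.write(this.cache);
  }
}
```

Because components would only see `UserSettingsService`, swapping the storage behind the REST endpoint would not require any frontend change.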

The following user settings are moved from local storage to HBase:

  • time zone configuration
  • show/hide dismissed and resolved alerts
  • auto-polling settings
  • table configuration
  • saved and recently used saved searches

We decided to simply store everything in HBase because it gives us the freedom to store basically anything without any backend work (declaring Java classes, additional Java properties, etc.).
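
To illustrate the "store basically anything" idea, the sketch below treats the persisted settings as one free-form JSON document that only the frontend interprets. The `UserSettings` shape, its field names, and the default values are assumptions made up for this example, loosely following the list above; they are not taken from the PR.

```typescript
// Hypothetical shape; field names and defaults are assumptions for illustration.
interface UserSettings {
  convertTimestampsToLocal: boolean;
  hideDismissedAlerts: boolean;
  hideResolvedAlerts: boolean;
  refreshIntervalSeconds: number;
  tableColumns: string[];
  savedSearches: string[];
}

const DEFAULTS: UserSettings = {
  convertTimestampsToLocal: false,
  hideDismissedAlerts: true,
  hideResolvedAlerts: true,
  refreshIntervalSeconds: 60,
  tableColumns: [],
  savedSearches: [],
};

// The backend stores an opaque JSON string; the frontend serialises it...
function serialize(settings: UserSettings): string {
  return JSON.stringify(settings);
}

// ...and fills in defaults for anything missing (a first-time user, or a
// setting added in a later UI release), so no backend change is needed.
function deserialize(raw: string | null): UserSettings {
  const stored = raw ? (JSON.parse(raw) as Partial<UserSettings>) : {};
  return { ...DEFAULTS, ...stored };
}
```

The trade-off of this approach is that the backend cannot validate the document, so the frontend has to tolerate missing or outdated fields, which is what the defaults are for.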

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and/or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and then verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    
  • Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@ruffle1986
Contributor Author

I'm closing and re-opening the PR to force Travis to run the build again.

@ruffle1986 ruffle1986 closed this Dec 12, 2019
@ruffle1986 ruffle1986 reopened this Dec 12, 2019
@ruffle1986
Contributor Author

@sardell found a race condition in the app during testing. I had tested the changes with the dev tools open, which slows down the runtime, so I never ran into the problem myself.

The race was between the user settings service loading the data from the server and the alerts list component. Sometimes the component was rendered before the user settings service had finished loading the settings.

The main goal was to load the user settings before anything is rendered on the screen. Originally I wanted to use APP_INITIALIZER to load the settings, but the user needs to be logged in to reach HDFS via the API service, so APP_INITIALIZER could not have worked. That's why I put this mechanism into the Authentication service, but that led to the race condition.

So I came up with another strategy and implemented a resolver, which can be used like a route guard in the routing module. With the resolver in place, the user settings are guaranteed to be loaded from the server before the route component is rendered.
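
The idea can be sketched without Angular as a gate that renders a route's component only after the settings have arrived. In a real Angular app this role would be played by a class implementing `Resolve` from `@angular/router` and registered under the route's `resolve` key; everything below (`SettingsLoader`, `DeferredLoader`, `navigate`, the callback style) is a simplified, hypothetical stand-in and not the code in this PR.

```typescript
// Framework-free stand-in for a route resolver; all names are hypothetical.
type Settings = Record<string, unknown>;

interface SettingsLoader {
  // Calls `done` once the settings have been fetched from the server.
  load(done: (settings: Settings) => void): void;
}

// Simulates a slow server: the response arrives only when respond() is called.
class DeferredLoader implements SettingsLoader {
  private pending: Array<(settings: Settings) => void> = [];

  load(done: (settings: Settings) => void): void {
    this.pending.push(done);
  }

  respond(settings: Settings): void {
    const callbacks = this.pending;
    this.pending = [];
    callbacks.forEach((cb) => cb(settings));
  }
}

// Mimics the router: the component is rendered only after the "resolver"
// (here, the loader) has delivered its data, so it never sees empty settings.
function navigate(
  loader: SettingsLoader,
  render: (settings: Settings) => void
): void {
  loader.load((settings) => render(settings));
}

// Usage: nothing is rendered until the simulated server responds.
const demoLoader = new DeferredLoader();
navigate(demoLoader, (settings) => console.log("rendering with", settings));
demoLoader.respond({ timezone: "UTC" });
```

The point of the design is ordering: rendering is made a consequence of the load completing, instead of two independent steps that may finish in either order.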

Thanks for catching this issue, @sardell 👍!

@sardell
Contributor

sardell commented Dec 12, 2019

I tested all of the user settings you mentioned with full dev and everything works great from a user's perspective.

This is great work, but, of course, I have a few follow up questions/observations. 🙂

  • I noticed that every time we are making a POST request to HDFS with the settings you listed above, we are making two http requests. Why is that?
  • I noticed that when I initially load the page and click on the settings icon, I'm seeing the two http calls mentioned previously even though I haven't made any configuration changes yet. The first call returns with a 500 status and the other returns 200. This only happens the first time I click on the settings icon after a page load. Both calls are to the same endpoint with what appears to be the same request payload.

[Screenshot: post-on-settings-click]

@ruffle1986
Contributor Author

Good points, @sardell . 👍

This PR focuses on replacing the services: instead of using the data source class and saving these user settings to local storage, we now persist them in HDFS. I have only changed the service calls in the exact places where the previous service (data source) was called or where settings were written directly to local storage. So basically it's an issue on the level of the implementation of these features (like auto polling, show/hide, etc.) and not on the new service's level.

These components and services already called the persist methods multiple times when saving user settings to local storage. It was hardly noticeable then, but it is noticeable now that we are performing actual HTTP requests. So in my opinion, if it's suboptimal now, it was suboptimal before as well and I didn't want to optimise the features because it's already a big PR with lots of changes and I didn't want to increase the difficulty for the reviewer because, in my opinion, it's out of the scope of this issue.

But it's true, it's not nice. Even though there are unnecessary HTTP calls and the server throws an error (probably because we're hitting HDFS so frequently, but I'm not sure), it works fine. I'm open to introducing enhancements at the feature level to get rid of these problems.

Would you like me to do it in this PR, or should it be a separate Jira task with a separate PR?


Just for the record, here's what's happening at the feature level:

  • When you open the pane where you can change the rows per page or the refresh rate, the show/hide service persists both the show/hide dismissed alerts setting and the show/hide resolved alerts setting when the component is initialised (2 HTTP calls)

  • When you change the refresh rate, the auto-polling service and the configure-table service each persist their state (2 calls)

  • It's the same when you change the rows per page (2 calls)

  • When you switch the hide dismissed alerts, the configure-table service persists its state and the auto-polling service persists its state twice (3 calls)

  • The same goes for the hide resolved alerts switch (3 calls)

  • When you switch the "Convert timestamps to local time", it works fine (1 call)

  • When you open the table columns settings pane and click on the save button, the column-names service and the configure-table service persist their state (2 calls)

I don't want to detail the calls in the save-search module, but you get the point.
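
To make the pattern concrete, the sketch below reproduces one of the cases above (two services persisting their state in response to a single user action) and shows one possible way a follow-up could collapse them into a single request. It is not part of this PR; `CountingBackend`, `DirectWriter`, `BatchingWriter`, and `flush` are hypothetical names, and a real implementation would more likely flush on a timer or debounce than through an explicit call.

```typescript
// Hypothetical sketch; none of these names come from the Metron code base.
type SettingsPatch = Record<string, unknown>;

// Counts "HTTP requests" so the effect of batching is visible.
class CountingBackend {
  requests = 0;
  saved: SettingsPatch = {};

  post(patch: SettingsPatch): void {
    this.requests += 1;
    this.saved = { ...this.saved, ...patch };
  }
}

// Behaviour as described above: every persist is its own request.
class DirectWriter {
  private backend: CountingBackend;

  constructor(backend: CountingBackend) {
    this.backend = backend;
  }

  persist(patch: SettingsPatch): void {
    this.backend.post(patch);
  }
}

// Possible follow-up: merge patches and send them together on flush().
class BatchingWriter {
  private backend: CountingBackend;
  private pending: SettingsPatch = {};
  private dirty = false;

  constructor(backend: CountingBackend) {
    this.backend = backend;
  }

  persist(patch: SettingsPatch): void {
    this.pending = { ...this.pending, ...patch };
    this.dirty = true;
  }

  // Sends at most one request, and none if nothing changed since last flush.
  flush(): void {
    if (!this.dirty) return;
    this.backend.post(this.pending);
    this.pending = {};
    this.dirty = false;
  }
}
```

Sending fewer, merged writes might also reduce the chance of overlapping writes to the same file, although whether that is the cause of the server error is not confirmed in this thread.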

Also, as I said earlier, these changes are highly opinionated and radically change how we handle user settings, including on the backend. So I'm really curious about @mmiklavc's and @nickwallen's opinions. They might also be able to explain the error message returned by the server, which is the following:

    RestException: No lease on /user/metron/user-settings (inode 16861): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_-955476593_1, pendingcreates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3697)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3498)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3336)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3296)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:850)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:504)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

cc @tiborm

@sardell
Contributor

sardell commented Dec 13, 2019

@ruffle1986 Thanks for the thorough explanation.

So basically it's an issue on the level of the implementation of these features (like auto polling, show/hide, etc.) and not on the new service's level.

So in my opinion, if it's suboptimal now, it was suboptimal before as well and I didn't want to optimise the features because it's already a big PR with lots of changes and I didn't want to increase the difficulty for the reviewer because, in my opinion, it's out of the scope of this issue.

I completely agree with you, and I appreciate your consideration for the reviewer(s) of this PR. We can make changes to the actual component implementations in another PR to keep this scope of work focused on the task you set out to accomplish.

I'm a +1 on this, but I'm going to let it sit for another day to make sure others have a chance to weigh in if they want. Thanks for the contribution, @ruffle1986!
