Integration PR: HTTP Management Proxy Implementation #1356
Conversation
No linked issues found. Please add the corresponding issues in the pull request description.
Force-pushed from 7f8b52f to 79844b3.
Closed via #1408.
Force-pushed from 8f738f7 to e161f41.
Co-authored-by: Miles-Garnsey <miles.garnsey@datastax.com>
…ts into ICassandraManagementProxy and implement in both HTTP and JMX impls. (#1358)
Implement methods:
- getClusterName
- getLiveNodes
- clearSnapshot
- listSnapshots
- takeSnapshot
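A minimal sketch of how these methods might look on a shared management-proxy interface implemented by both the JMX and HTTP proxies. The method names come from the commit message above; the signatures, return types, and interface name used here are assumptions, not Reaper's actual API:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: method names are taken from the commit message,
// but signatures and return types are assumed, not Reaper's real interface.
public interface ManagementProxySketch {

  String getClusterName();

  List<String> getLiveNodes();

  // Snapshot lifecycle operations, backed by JMX in one impl and the mgmt-api in the other.
  void takeSnapshot(String snapshotName, String... keyspaces);

  void clearSnapshot(String snapshotName, String... keyspaces);

  // Assumed shape: snapshot name -> keyspace (the real Reaper types may differ).
  Map<String, String> listSnapshots();
}
```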
* Add stubbed polling of job details from the mgmt-api; this will not work without the actual client implementation. Implement triggerRepair, getJobDetails, and the scheduler using apiClient, and add a simple test to ensure the state is managed correctly (see the sketch after this list).
* Merge test files after the rebase.
* Add a test to verify the behavior of the notifications polling.
* Address comments.
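As a rough illustration of the polling described in the first bullet (all class and method names here are hypothetical placeholders, not the actual mgmt-api client), a scheduled task could poll job status until the repair job reaches a terminal state:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of polling job details after a repair is triggered.
// The client interface below is a placeholder, not the real mgmt-api client.
public class RepairJobPoller {

  /** Placeholder for the management API client used by the HTTP proxy. */
  public interface JobClient {
    String getJobStatus(String jobId);
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final JobClient apiClient;

  public RepairJobPoller(JobClient apiClient) {
    this.apiClient = apiClient;
  }

  /** Poll the job status every few seconds until it reaches a terminal state. */
  public void pollUntilDone(String jobId, Runnable onCompleted) {
    scheduler.scheduleAtFixedRate(() -> {
      String status = apiClient.getJobStatus(jobId);
      if ("COMPLETED".equals(status) || "ERROR".equals(status)) {
        onCompleted.run();
        scheduler.shutdown();
      }
    }, 0, 5, TimeUnit.SECONDS);
  }
}
```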
* Move HTTP repair implementation over to the new V2 endpoint.
* Strip out Cassandra <3 repair methods and fix method signatures to always use the newer RingRange.
* Fix tests.
* Remove tests for Cassandra <3.
* New cancelAllRepairs HTTP method. More tests.
* Bump the version of the management API client in pom.xml to bring in repair methods with the correct Long integer type.
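For illustration only, a cancel-all-repairs call against the mgmt-api could look roughly like the following; the endpoint path and response handling are placeholders and may not match the real v2 API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of a cancel-all-repairs call against the mgmt-api.
// The endpoint path below is a placeholder; the real v2 path may differ.
public class CancelRepairsSketch {

  private static final HttpClient CLIENT = HttpClient.newHttpClient();

  public static int cancelAllRepairs(String mgmtApiBaseUrl) throws Exception {
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(mgmtApiBaseUrl + "/api/v2/repairs")) // placeholder path
        .DELETE()
        .build();
    HttpResponse<String> response =
        CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
    return response.statusCode();
  }
}
```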
…#1376)
* Implement HttpCassandraManagementProxy.getEndpointToHostId(), HttpCassandraManagementProxy.getLocalEndpoint(), and HttpCassandraManagementProxy.getTokens().
…1373)
* Implement getTokenEndpointMap.
* Fix getTokens.
* Implement getEndpointToHostId.
Co-authored-by: Miles-Garnsey <miles.garnsey@datastax.com>
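A small sketch of how the local node's tokens could be derived from a token-to-endpoint map, assuming the map shape suggested by the method names above (string tokens keyed to endpoint addresses); the real implementation may differ:

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: derive a node's tokens from a token -> endpoint map
// returned by the management API. Map shape and types are assumptions.
public final class TokenMapSketch {

  private TokenMapSketch() {}

  /** Return the tokens owned by the given endpoint, parsed as BigIntegers. */
  public static List<BigInteger> tokensFor(Map<String, String> tokenToEndpoint, String endpoint) {
    return tokenToEndpoint.entrySet().stream()
        .filter(e -> e.getValue().equals(endpoint))
        .map(e -> new BigInteger(e.getKey()))
        .collect(Collectors.toList());
  }
}
```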
* Remove references to RunState NOT_EXISTING, since they cause spurious errors in tests and this state no longer exists.
* Remove references to JMX from ClusterFacade and make it more generic.
…e for each connectImpl call (#1413)
Comment out test steps which rely on getPendingCompactions. Comment out percent-repaired related test.
Force-pushed from d72a78c to e2d1998.
* Implement getPendingCompactions in the HttpManagementProxy. Since metrics are now exposed on a different port than the mgmt-api itself, this required going through the HttpMetricsProxy, which was partially implemented for this purpose. It can now pull metrics from the metrics endpoint and parse them into GenericMetrics.
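A hedged sketch of the kind of parsing the metrics proxy needs to do: fetch the Prometheus-style text from the metrics port and extract a pending-compactions value. The metric name pattern here is an assumption and may not match what the mgmt-api actually exposes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of extracting a pending-compactions value from the
// Prometheus-style text exposed on the metrics port. The metric name pattern
// is a placeholder; the actual name exposed by the mgmt-api may differ.
public final class PendingCompactionsSketch {

  // Matches lines like "org_apache_cassandra_metrics_compaction_pending_tasks 3.0"
  private static final Pattern PENDING = Pattern.compile(
      "^\\S*compaction\\S*pending\\S*\\s+([0-9.]+)$", Pattern.MULTILINE);

  private PendingCompactionsSketch() {}

  public static int parsePendingCompactions(String metricsBody) {
    Matcher m = PENDING.matcher(metricsBody);
    return m.find() ? (int) Double.parseDouble(m.group(1)) : 0;
  }
}
```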
* Put a hook in the docker container's cassandra-reaper.yml so that the HTTP management proxy can be enabled via an environment variable, instead of only through the config file.
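For context, this is the standard Dropwizard pattern that makes ${ENV_VAR} placeholders in a YAML config resolvable from the environment; this is a generic sketch, not necessarily Reaper's exact bootstrap code:

```java
import io.dropwizard.Configuration;
import io.dropwizard.configuration.EnvironmentVariableSubstitutor;
import io.dropwizard.configuration.SubstitutingSourceProvider;
import io.dropwizard.setup.Bootstrap;

// Generic Dropwizard sketch: wrap the configuration source provider so that
// ${ENV_VAR} placeholders in the YAML config are substituted from the environment.
public class EnvSubstitutionSketch {

  public static <T extends Configuration> void enableEnvSubstitution(Bootstrap<T> bootstrap) {
    bootstrap.setConfigurationSourceProvider(
        new SubstitutingSourceProvider(
            bootstrap.getConfigurationSourceProvider(),
            new EnvironmentVariableSubstitutor(false))); // false = don't fail on missing vars
  }
}
```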
* Update management API client and remove references to notifications in v2 repair requests.
Reporting the results of my tests:
The repair appears to be progressing normally up to this point. I then expand the cluster to 3 nodes and use cqlsh to add some data to the keyspace under repair on the original node (in a newly created table, while the new nodes are coming up). At that point I see the following:
Prior to that issue, I think Reaper was also throwing some errors around the connectAny method, saying it was unable to connect to any of the hosts (from recollection, the host it was trying to connect to was the seed service). The repair has stalled at this point, Reaper appears to be in a crash loop, and the UI is not responding (I assume the probes aren't either, hence why it is being restarted). I also see:
I am not convinced this is an issue with the new HTTP management logic, since authentication to the DB doesn't seem to be related to our changes; I will continue experimenting. One of the logs from Cassandra itself is below:
I do note that when I delete the k8ssandra cluster, secrets remain in the namespace and are not cleaned up:
So maybe there is an issue where the superuser secret is changed when the cluster is scaled, but the new secret is not remounted (or hot-reloaded) by Reaper.
When I start a repair on a 3-node cluster which has never been restarted (but contains no data), I get the following outcome:
The repair progresses. While I have not let it finish, it seems healthy. If I add data using
Which actually seems like a reasonable output. However, the repair never progresses, and the Cassandra nodes report errors too. On the node from which the SSTables were deleted:
On a node from which SSTables were not deleted:
This PR is for integration of the Reaper side of the HTTP management proxy.
Fixes issue:
Epic