
KVStore Tools #177

Open: wants to merge 19 commits into master


arcsector
Contributor

Summary

This PR adds KVStore tools that the user can configure, including:

  • Disable
  • Backup
  • Oplog and storage engine set at install time
  • KVStore Storage Engine & Server Version Upgrade
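As a rough illustration of how these options might be set per cluster, here is a group_vars fragment. It uses the variable names from this PR's defaults (splunk_enable_kvstore, splunk_kvstore_storage, splunk_kvstore_version). The file path and the specific values are assumptions for illustration only. It is also assumed, not confirmed, that "Disable" is driven by splunk_enable_kvstore.

```yaml
# group_vars/shc.yml (hypothetical path): illustrative values only
splunk_enable_kvstore: true         # assumed: set to false to disable the KV store
splunk_kvstore_storage: wiredTiger  # or "undefined" to leave the storage engine at its default
splunk_kvstore_version: 4.2         # or "undefined" to leave the server version at its default
```

Leaving both of the last two at "undefined" means no engine or version change is requested for that group.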

- backup
- upgrade
- disable
- include in vars and post-install steps
@arcsector
Contributor Author

Possible other features to include:

  • KVStore Restore from backup
  • KVStore Clean
  • KVStore Resync
  • Oplog change in SHC
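For reference, the KV store oplog size is the oplogSize setting (in MB) in the [kvstore] stanza of server.conf. Below is a minimal install-time sketch, not necessarily the method this PR uses. It assumes the role exposes splunk_home, and it uses a hypothetical splunk_kvstore_oplog_size variable and a hypothetical "restart splunk" handler. Changing the oplog size on an existing SHC member needs more steps than editing this setting, so this sketch covers only the initial value.

```yaml
# Sketch only: splunk_kvstore_oplog_size and the "restart splunk" handler are hypothetical names
- name: Set the KV store oplog size in server.conf
  community.general.ini_file:
    path: "{{ splunk_home }}/etc/system/local/server.conf"
    section: kvstore
    option: oplogSize
    value: "{{ splunk_kvstore_oplog_size }}"  # size in MB
  become: true
  when: splunk_kvstore_oplog_size is defined
  notify: restart splunk
```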

@dtwersky
Collaborator

@arcsector This is a great PR, especially the migration part. As I mentioned in the other comments, the commands need authentication, so we either need to add credentials to each of them, or start the whole task with splunk login so we don't need to set no_log on every command. I have not tested the splunk login method, but I assume it would work.
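Here is a rough sketch of the "splunk login first" idea. It is untested, like the suggestion it illustrates. The variable names splunk_admin_username, splunk_admin_password and splunk_nix_user are assumptions about what the role provides. splunk login caches a CLI session for the OS user that ran it, so later CLI calls by that same user can omit -auth until the session expires. Only the login task then needs no_log.

```yaml
# Sketch only: splunk_admin_username / splunk_admin_password / splunk_nix_user are assumed variable names
- name: Log in to the Splunk CLI once for this task file
  command: "{{ splunk_home }}/bin/splunk login -auth {{ splunk_admin_username }}:{{ splunk_admin_password }}"
  become: true
  become_user: "{{ splunk_nix_user }}"
  no_log: true          # only this task carries the credentials
  changed_when: false

- name: Later CLI calls by the same OS user reuse the cached session
  command: "{{ splunk_home }}/bin/splunk show kvstore-status"
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: kvstore_status_out
  changed_when: false   # read-only query
```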

@dtwersky dtwersky self-assigned this Mar 22, 2023
@dtwersky dtwersky added the enhancement New feature or request label Mar 22, 2023
- clean
- destructive resync
- get kvstore captain
- get shcluster captain
@arcsector
Contributor Author

Fixed all the auth issues (sorry it slipped my mind) in ce2c80a

```yaml
- name: Backup KVStore
  include_tasks: adhoc_backup_kvstore.yml
  vars:
    archive_name: "-archiveName preAnsibleVersionUpgradeBackup"
```
Collaborator


Should this maybe be customizable?

Contributor Author


Eh, not in my opinion, but it's trivial to do so.
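If the archive name were made customizable, one minimal approach is a variable that falls back to the current hard-coded name. Here, splunk_kvstore_backup_archive_name is a hypothetical variable, not part of the PR.

```yaml
# Sketch: splunk_kvstore_backup_archive_name is a hypothetical variable
- name: Backup KVStore
  include_tasks: adhoc_backup_kvstore.yml
  vars:
    archive_name: "-archiveName {{ splunk_kvstore_backup_archive_name | default('preAnsibleVersionUpgradeBackup') }}"
```

If the variable is not set, the behavior is unchanged from the snippet under review.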

dtwersky and others added 2 commits March 29, 2023 19:24
- removed unused var
- check that we can backup before we do
- checks are changed_when false
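The "check that we can backup before we do" and "changed_when false" commits could look roughly like the sketch below. It assumes that the output of splunk show kvstore-status contains "ready" for a healthy KV store, and that the CLI session is already authenticated. splunk_nix_user is an assumed variable name.

```yaml
# Sketch: assumes `splunk show kvstore-status` reports "ready" for a healthy KV store
- name: Check KV store status before backing up
  command: "{{ splunk_home }}/bin/splunk show kvstore-status"
  become: true
  become_user: "{{ splunk_nix_user }}"
  register: kvstore_status_out
  changed_when: false   # read-only check, never reports a change

- name: Backup KVStore
  include_tasks: adhoc_backup_kvstore.yml
  vars:
    archive_name: "-archiveName preAnsibleVersionUpgradeBackup"
  when: "'ready' in kvstore_status_out.stdout"
```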
```
@@ -69,6 +70,10 @@ splunk_shc_target_group: shc
splunk_shc_deployer: "{{ groups['shdeployer'] | first }}" # If you manage multiple SHCs, configure the var value in group_vars
splunk_shc_uri_list: "{% for h in groups[splunk_shc_target_group] %}https://{{ hostvars[h].ansible_fqdn }}:{{ splunkd_port }}{% if not loop.last %},{% endif %}{% endfor %}" # If you manage multiple SHCs, configure the var value in group_vars
start_splunk_handler_fired: false # Do not change; used to prevent unnecessary splunk restarts
splunk_enable_kvstore: true
splunk_kvstore_storage: undefined # Can be defined here or at the group_vars level - accepted values: "wiredTiger" or "undefined", which leaves as default
splunk_kvstore_version: undefined # Can be defined here or at the group_vars level - accepted values: 4.2 or "undefined", which leaves as default
```
Collaborator


I don't see this variable used either

Contributor Author


Oh, I see: splunk_kvstore_version is unused. I'll add it to the conditionals at the end of the upgrade procedure.
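A sketch of how splunk_kvstore_version could gate the upgrade follows. The upgrade runs only when a target is set and the detected server version is older than it. splunk_current_server_version_out is the registered CLI result mentioned later in this thread. The exact placement inside kvstore_upgrade.yml is an assumption. Note that an unquoted undefined in YAML is the string "undefined", and that 4.2 is parsed as a float, hence the | string cast before Ansible's version test.

```yaml
# Sketch: only runs when a target version is set and the running server is older than it
- name: Upgrade the KV store server version
  include_tasks: kvstore_upgrade.yml
  when:
    - splunk_kvstore_version != "undefined"
    - splunk_current_server_version_out.stdout is version(splunk_kvstore_version | string, "<")
```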

@arcsector
Contributor Author

arcsector commented Apr 3, 2023

I guess I accidentally created a merge commit, my bad. Feel free to remove it; I'm not brave enough to force-push to a fork.

@jewnix
Collaborator

jewnix commented Apr 3, 2023

After some preliminary testing, there are some issues that need to be addressed here.

  1. I think the adhoc_destructive_resync_kvstore.yml task should be removed from the PR. Some things there do not completely make sense to me. Maybe I'm misunderstanding it, but I don't think you need to remove a member from the SHC to do that.
  2. kvstore_upgrade.yml does not work on version 8.2.x because splunk_kvstore_version does not return anything, so the block-level conditional fails.
  3. Leaving splunk_kvstore_storage as undefined also will not update the engine, because of the block-level conditionals (at least on the 8.2 instance I tested).

@arcsector
Contributor Author

@jewnix My thoughts:

  1. Splunk Support has recommended a destructive resync to me multiple times to fix KV store issues, so I included it here. Its advantage over a normal clean seems to be that it pulls a fresh copy of both the SH bundle and the KV bundle.
  2. Good call, I'll put a default value in there for versions <= 8.2.
  3. That's intentional: I want people to explicitly specify that they're upgrading, so they can do it one cluster component at a time.

@dtwersky
Collaborator

dtwersky commented Apr 25, 2023

@arcsector

  • Destructive resync is something that has been recommended to me through Splunk Support multiple times in order to fix KV issues, so I included it here - it seems that the advantage to it over just a normal clean is that it pulls both a fresh copy of the SH Bundle as well as the KV bundle, so that's the benefit there.

So here is what I think: the destructive resync should be removed from this PR, because 1. this is a snowflake issue, and a destructive KVStore resync is not documented; and 2. it also destroys the SHC completely.
Even though Support sometimes tells you to do it, that does not mean it's something that should be done.

@arcsector
Contributor Author

@dtwersky @jewnix Sorry I've been inactive on this. I removed the destructive resync and added a default value, though not for splunk_kvstore_version: it's for the splunk_current_server_version_out.stdout check instead. The former is set from the default vars, while the latter comes from a CLI call.
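One way such a default might look is sketched below. Ansible's default filter with its second argument set to true also replaces an empty string, which covers the 8.2.x case where the CLI call returns nothing, as well as a skipped registered task. splunk_kvstore_version_fallback and kvstore_detected_version are hypothetical names, and the fallback value is left for you to choose.

```yaml
# Sketch: splunk_kvstore_version_fallback is a hypothetical variable; pick a value that
# matches the KV store server version you expect when the CLI call returns nothing
- name: Normalize the detected KV store server version
  set_fact:
    kvstore_detected_version: "{{ splunk_current_server_version_out.stdout | default(splunk_kvstore_version_fallback, true) | trim }}"
```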

Get SHCluster and KVstore status as JSON blobs
@arcsector
Contributor Author

arcsector commented Mar 15, 2024

Updating this with an oplog size increase, as well as some helpful tasks that get kvstore-status and shcluster-status as JSON blobs for Ansible to consume. Note that this doesn't use the oplog increase method from the docs, but rather a method Support was passing around a while ago. If you'd prefer it to follow the documented method, I can do that instead. Let me know!
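As a possible alternative to parsing CLI output, splunkd's REST API can return KV store status as JSON directly. The sketch below queries the kvstore/status endpoint with output_mode=json and lets the uri module parse the response into .json. It is a REST-based swap-in, not the CLI method this PR uses. splunk_admin_username and splunk_admin_password are assumed variable names, and splunkd_port comes from the role defaults. On SHC members, the shcluster/status endpoint could presumably be queried the same way.

```yaml
# Sketch: a REST-based alternative, not the CLI method used in this PR
- name: Get KV store status as JSON
  ansible.builtin.uri:
    url: "https://127.0.0.1:{{ splunkd_port }}/services/kvstore/status?output_mode=json"
    method: GET
    user: "{{ splunk_admin_username }}"
    password: "{{ splunk_admin_password }}"
    force_basic_auth: true
    validate_certs: false   # splunkd often runs with a self-signed certificate
    return_content: true
  register: kvstore_status_json
  no_log: true              # keeps credentials out of the task output
  changed_when: false

- name: Show the parsed response
  ansible.builtin.debug:
    var: kvstore_status_json.json
```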

@dtwersky
Collaborator

dtwersky commented Jul 8, 2024

Hi @arcsector ,

Sorry this was left dormant for so long after so much work has gone into it. For a while I have been working internally to figure all of this out on a different project, one aimed more at ephemeral Docker instances, and I discovered a lot of things related to this PR that made me look at KVStore upgrades a little differently. Upgrade paths differ a lot between Splunk versions, MongoDB versions and MongoDB storage engines. I'm not sure we should assume that people are still running version 8, and because later versions already migrate and upgrade automatically, some of these commands may only need to run in specific scenarios.

There are so many amazing things in this PR, and I don't want to close it and start fresh, but maybe it needs to be revisited to decide whether we want to make this compatible with older versions, or with major version jumps.

What are your thoughts?

@arcsector
Contributor Author

Thanks so much for the positive comments, glad you like the material here. I'm definitely open to revisiting this as a PR of optional tasks, and then making a playbook that calls all of them to do an all-in-one upgrade. Does that sound like a good plan? I could even put them in a subfolder, roles/splunk/tasks/kvstore/..., if that would help centralize this.

Do you happen to have a good map of those version transitions and what they entail as far as mongod version and engine? I'm having to go through the docs and switch back and forth between versions, as it's not clear what the approach should even be... Thanks!

Labels: enhancement (New feature or request)
Projects: None yet

3 participants