Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove calls to update vertica #39

Merged
merged 17 commits into from
Aug 25, 2021
Merged

Conversation

spilchen
Copy link
Collaborator

  • removed calls to update_vertica in the install and uninstall reconcile. This significantly speeds up scale-up operations.
  • new package (atconf) to handle modifying admintools.conf for install/uninstall
  • added a method to copy a file into a pod. It is implemented as a cat exec call.
  • had to explicitly accept the EULA now that we don't call update_vertica
  • had to explicitly check the existence and permissions of some directories in /opt/vertica/config (share and logrotate)
  • we pass the license in during the create_db command now that we don't call update_vertica anymore
  • had to change the obfuscation code to check for --password rather than -w. I added code that did a call to 'test -w file', and the obfuscation code was adding '****' for the file name.
  • moved ipv6 check function to its own package so that I can use it in the atconf package
  • moved name of vertica server container to names package. This was necessary so that we can use the const in the new atconf package

@spilchen spilchen self-assigned this Aug 23, 2021
@@ -1,14 +1,4 @@
apiVersion: v1
kind: Event
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is been a while for me so could you tell me why we do not need this anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the install is so fast, we no longer need the Event for it. So all of the events for install and uninstall were removed.

rm -rf /home/dbadmin/logrotate

sudo mkdir -p /opt/vertica/config/licensing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we need sudo anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were over zealous with the sudo before and so it wasn't needed. The long term goal is to have no reliance on sudo as that will play nice when we support OpenShift. Removing calls to update_vertica was a key first step for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok now I get it thanks.

}
}

// AddHosts will had ips to an admintools.conf. New admintools.conf, stored in
Copy link
Collaborator

@ningdeng ningdeng Aug 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor typo: had --> add.

I'm trying to get more details: is it only adding IPs to the hosts = entry in [Cluster] section?

Edit: nvm, found it's adding IPs to both hosts= entry and creating new compact node name entries in [Nodes] section.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does it check duplicate IPs?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: nvm, upon looking at the called functions I think my questions are answered.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. We handle duplicate IPs by treating it as a no-op. I will update the comment here.

return nil
}

// removeNodes will remove the nodes section for the given set of IPs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if this function is expected to remove only compat21 node entries, if yes, should we assert or check somewhere to make sure we do not touch regular database nodes, at least mention in the comment? I understand those regular database nodes will be removed and are supposed to be removed by db_remove_subcluster/db_remove_nodes before the uninstallation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. We only remove compat21 node entries. The AT -t db_remove_nodes will handle removing of regular database nodes. I will mention in a comment.

if !p.isPodRunning {
continue
}
_, _, err := pr.CopyToPod(ctx, p.name, names.ServerContainer, atConfTempFile, paths.AdminToolsConf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the copying succeed on some Pods but fail on other Pods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is possible. We make sure that all pods are running before we begin install/uninstall, but that just shortens the timing. Do you think it is necessary to have a step in the operator to ensure admintools.conf is the same on all of the pods? And copy if they are different. The same pod is always picked first, so it should be a well known location that has the correct version of admintools.conf.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is necessary to have a step in the operator to ensure admintools.conf is the same on all of the pods?
-- Having a step regularly checking admintools.conf on all Pods may be overkilling, as long as the initiator of AT commands has the latest admintools.conf it should be ok. Maybe we can keep as is at this time, if tests show any problem, then we can see what we can do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the details of err message, just want to mention that it might be helpful if we know on which nodes the copy succeeded and on which the copy failed, so if there's at least one copy succeeded, then that copy can be the source of latest admintools.conf, such information could be helpful to help debugging and fix admintools.conf

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ningdeng based on your comments here, I improved the distributeAdmintoolsConf function.

  • the first pod we copy to will always be the same. It is also the pod that we use for the base for any subsequent admintools.conf changes for install/uninstall
  • if the first pod copied was okay, copy to all other pods. We save the errors and do checking at the end so that we attempt to copy to all pods
  • if the copy failed at one of the pods, log an event saying that admintools.conf was partially copied.

You can see these changes in the commit c424386

@ningdeng
Copy link
Collaborator

This looks good to me overall, thanks!

// Copy the admintools.conf to the rest of the pods. We will do error
// checking at the end so that we try to copy it to each pod.
errs := []error{}
for _, p := range pf.Detail {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest improvements of copying admintools.conf change looks good to me. Thanks!

Just some minor notes as potential future improvements: I see this is a sequential operation as it uses a for-loop, starting from v11, AT uses a new file copying mechanism, which improves the performance of file distribution to the entire cluster a lot. So, just in case that in the future people suggest/request updating this copying to be a parallel operation on a very large cluster, AT distribute_config_files could be used to distribute the files. At this time, using CopyToPod is better IMO.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks. This is good to know and I will keep it in mind.

@spilchen spilchen merged commit 027ff64 into vertica:main Aug 25, 2021
@spilchen spilchen deleted the remove-update-vertica branch August 25, 2021 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants